Comment by xianshou

8 months ago

The path to AGI:

0. Have model A.

1. Use Monte Carlo with A to get supervised data.

2. Train model B with data from A.

3. Use Monte Carlo with B to get supervised data.

4. Train model C with data from B...
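Steps 0-4 above can be sketched as a loop. This is a toy, not anyone's actual pipeline: the "model" is just a lookup table, and `monte_carlo_search` is a stub that stands in for MCTS-improved targets. All names are illustrative.

```python
import random

def monte_carlo_search(model, state, rollouts=50):
    """Stub search: average noisy samples around the model's guess,
    standing in for search-improved supervised targets."""
    samples = [model(state) + random.gauss(0, 0.1) for _ in range(rollouts)]
    return sum(samples) / len(samples)

def train(dataset):
    """'Training' here just memorizes the search-improved targets."""
    table = dict(dataset)
    return lambda state: table.get(state, 0.0)

def bootstrap(initial_model, states, generations=3):
    model = initial_model
    for _ in range(generations):
        # steps 1/3: use search with the current model to get supervised data
        data = [(s, monte_carlo_search(model, s)) for s in states]
        # steps 2/4: train the next model on that data
        model = train(data)
    return model

model_a = lambda state: 0.0  # step 0: have model A
final = bootstrap(model_a, states=list(range(5)))
```

The point of the sketch is only the shape of the loop: each generation's training data comes from search run with the previous generation's model.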

That's basically how OpenAI works: they use training sets generated by one model to train the next model (plus other things alongside it).

But the "other stuff" is pretty important. It's what keeps the loop from simply re-amplifying the biases in the initial training data.

  • I still want to see some examples of a "mistake" in the training data getting detected or reduced.

    For example, a statement like "All cats are red" somewhere in the training data should get flagged when lots of other data in the training set contradicts it.

    And obviously it doesn't have to be simple logical statements; it could also be bigger questions like "how did the Second World War happen despite person X and person Y being on good speaking terms, as evidenced by all these letters in the archives?"

    When AI can do that, it should be able to turn our body of knowledge into a much larger and more useful one by raising questions that arise from data we already have but never noticed.
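The contradiction-detection idea above can be caricatured with a majority-vote outlier check: flag claims whose attribute disagrees with what most of the corpus says about the same subject. A real system would need semantic understanding (e.g. NLI models), not string matching; the corpus and all names below are made up for illustration.

```python
from collections import Counter

def find_contradictions(claims, min_support=3):
    """Flag (subject, attribute) claims that conflict with the
    majority attribute for that subject, given enough support."""
    by_subject = {}
    for subject, attribute in claims:
        by_subject.setdefault(subject, []).append(attribute)
    flagged = []
    for subject, attrs in by_subject.items():
        counts = Counter(attrs)
        majority, support = counts.most_common(1)[0]
        if support >= min_support:
            for attr in counts:
                if attr != majority:
                    flagged.append((subject, attr, majority))
    return flagged

corpus = [
    ("cats", "mammals"), ("cats", "mammals"), ("cats", "mammals"),
    ("cats", "red"),  # the outlier claim "All cats are red"
]
print(find_contradictions(corpus))  # → [('cats', 'red', 'mammals')]
```

This only catches surface-level disagreement; the harder cases in the comment (cross-document historical questions) would need reasoning over many sources at once.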

That is an awesome idea. I wish the authors would open source the code and weights so this can be tried.

  • They basically just described AlphaZero, the difference being that AlphaZero uses MCTS during inference too.