Comment by xianshou

8 months ago

The path to AGI:

0. Have model A.

1. Use Monte Carlo with A to get supervised data.

2. Train model B with data from A.

3. Use Monte Carlo with B to get supervised data.

4. Train model C with data from B...
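Steps 0-4 above can be sketched as a loop. This is a toy, not anyone's actual pipeline: the "model" is just a lookup table, and `monte_carlo_search` is a stub that stands in for MCTS-improved targets. All names are illustrative.

```python
import random

def monte_carlo_search(model, state, rollouts=50):
    """Stub search: average noisy samples around the model's guess,
    standing in for search-improved supervised targets."""
    samples = [model(state) + random.gauss(0, 0.1) for _ in range(rollouts)]
    return sum(samples) / len(samples)

def train(dataset):
    """'Training' here just memorizes the search-improved targets."""
    table = dict(dataset)
    return lambda state: table.get(state, 0.0)

def bootstrap(initial_model, states, generations=3):
    model = initial_model
    for _ in range(generations):
        # steps 1/3: use search with the current model to get supervised data
        data = [(s, monte_carlo_search(model, s)) for s in states]
        # steps 2/4: train the next model on that data
        model = train(data)
    return model

model_a = lambda state: 0.0  # step 0: have model A
final = bootstrap(model_a, states=list(range(5)))
```

The point of the sketch is only the shape of the loop: each generation's training data comes from search run with the previous generation's model.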

That's basically how OpenAI works: they use training sets generated by one model to train the next model (plus other things alongside it).

But the "other stuff" is pretty important. It's what keeps the loop from simply re-amplifying the biases in the initial training data.

  • I still want to see some examples of a "mistake" in the training data getting detected or reduced.

    For example, a statement like "All cats are red" somewhere in the training data should get flagged when lots of other data in the training set contradicts it.

    And obviously it doesn't have to be simple logical statements; it could also be bigger questions like "how did the Second World War happen despite person X and person Y being on good speaking terms, as evidenced by all these letters in the archives?"

    When AI can do that, it should be able to turn our body of knowledge into a much larger and more useful one by raising questions that arise from data we already have but never noticed.
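The contradiction-detection idea above can be caricatured with a majority-vote outlier check: flag claims whose attribute disagrees with what most of the corpus says about the same subject. A real system would need semantic understanding (e.g. NLI models), not string matching; the corpus and all names below are made up for illustration.

```python
from collections import Counter

def find_contradictions(claims, min_support=3):
    """Flag (subject, attribute) claims that conflict with the
    majority attribute for that subject, given enough support."""
    by_subject = {}
    for subject, attribute in claims:
        by_subject.setdefault(subject, []).append(attribute)
    flagged = []
    for subject, attrs in by_subject.items():
        counts = Counter(attrs)
        majority, support = counts.most_common(1)[0]
        if support >= min_support:
            for attr in counts:
                if attr != majority:
                    flagged.append((subject, attr, majority))
    return flagged

corpus = [
    ("cats", "mammals"), ("cats", "mammals"), ("cats", "mammals"),
    ("cats", "red"),  # the outlier claim "All cats are red"
]
print(find_contradictions(corpus))  # → [('cats', 'red', 'mammals')]
```

This only catches surface-level disagreement; the harder cases in the comment (cross-document historical questions) would need reasoning over many sources at once.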

That is an awesome idea. I wish the authors would open source the code and weights so this can be tried.

  • They basically just described AlphaZero, the difference being that AlphaZero uses MCTS during inference too.