Comment by janalsncm

8 months ago

We don’t know if it’s using implicit search either. While it would be interesting if the network was doing some internal search, it’s also possible it has just memorized the evaluations from 10M games and is performing some function of the similarity of the input to those previously seen.

Maybe they could test it on "tricky" positions where increasing the search depth on Stockfish dramatically changes the evaluation.
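
One rough way to build such a test set, as a sketch (assuming python-chess and a local Stockfish binary; none of this is from the paper): score each candidate position at a shallow and a deep depth and keep the ones where the evaluation swings.

    import chess
    import chess.engine

    def depth_swing(fen, shallow=4, deep=20, engine_path="stockfish"):
        # How much Stockfish's evaluation of this position changes between a
        # shallow and a deep search, in centipawns (mates mapped to +/-10000).
        board = chess.Board(fen)
        engine = chess.engine.SimpleEngine.popen_uci(engine_path)
        try:
            lo = engine.analyse(board, chess.engine.Limit(depth=shallow))["score"].white()
            hi = engine.analyse(board, chess.engine.Limit(depth=deep))["score"].white()
        finally:
            engine.quit()
        return abs(hi.score(mate_score=10000) - lo.score(mate_score=10000))

Positions where the swing is large, say several hundred centipawns, are the ones where shallow pattern matching and deep search disagree, so they would be the most diagnostic for the search-free model.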

Even if it's "implicit" I'm not sure if that matters that much. The point is that the model doesn't explicitly search anything, it just applies the learned transformation. If the weights of the learned transformation encode a sort of precomputed search and interpolation over the dataset, from an algorithmic perspective this still isn't search (it doesn't enumerate board states or state-action transitions).

>performing some function of the similarity of the input to those previously seen.

This is indeed what transformers do. But obviously it learns some sort of interpolation/extrapolation which lets it do well on board states/games outside the training set.

  • I agree, from a practical perspective what matters is that it is distilling Stockfish’s search to a certain extent, which could have computational efficiencies. “Without search” just means we’re not doing anything like minimax or MCTS (roughly the kind of explicit enumeration sketched below).
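
    To make that concrete, here is a rough illustration of what “search” means in this thread, as a negamax sketch using python-chess (not anything from the paper): explicit enumeration of successor board states, which is exactly what the network skips at inference time.

        import chess

        def negamax(board, depth, evaluate):
            # evaluate(board) is any static evaluation from the side to move's
            # perspective; it is a hypothetical stand-in here, not a real eval.
            if depth == 0 or board.is_game_over():
                return evaluate(board)
            best = -float("inf")
            for move in list(board.legal_moves):  # enumerate successor states
                board.push(move)
                best = max(best, -negamax(board, depth - 1, evaluate))
                board.pop()
            return best

    The “without search” model instead applies a single learned function to the position, with no loop over successor states; that is the distinction being drawn.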

If the Transformer were 'just' memorizing, you would expect width scaling to work much better than depth scaling (because width enables memorization much more efficiently), and you also wouldn't expect depth to run into problems, because it's not like memorization is that complex - but it does, which suggests that it's learning some more complicated algorithm which has issues with vanishing gradients & learning multiple serial steps. The obvious complicated algorithm to be learning in this context would be an implicit search akin to the MuZero RNN (which, incidentally, doesn't need any symbolic solver like Stockfish to learn superhuman chess from scratch by self-play).

>We don’t know if it’s using implicit search either.

Sure

>it’s also possible it has just memorized the evaluations from 10M games and is performing some function of the similarity of the input to those previously seen.

That’s not possible. The space of possible chess positions is incredibly large and it is incredibly easy to play a game that has diverged from training. A model that had just memorized all the evaluations would break within ten or so moves at most, let alone withstand robust evaluation.

However exactly this model works, and how much or how little it relies on search, is unknown, but it is no doubt a model of the world of chess. https://adamkarvonen.github.io/machine_learning/2024/01/03/c...
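
The kind of evidence behind that claim in the linked post is linear probing of the network’s activations for board state. A minimal sketch of that style of check, with an assumed data layout rather than the post’s actual code:

    from sklearn.linear_model import LogisticRegression

    def fit_square_probe(activations, piece_labels):
        # activations: (n_positions, d_model) hidden states saved from the model;
        # piece_labels: (n_positions,) integer code for the piece occupying one
        # fixed square (0 = empty, 1 = white pawn, ...). Train one probe per square.
        probe = LogisticRegression(max_iter=1000)
        probe.fit(activations, piece_labels)
        return probe

If simple linear readouts like this recover the full board on held-out games, the representation encodes the position itself rather than just surface statistics of previously seen moves.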

  • If it could reliably win a mate-in-N position without inexplicably blundering (a check like the one sketched at the end of this comment), I would be more inclined to buy your search hypothesis. But it doesn’t, which is one of the reasons the authors gave for finishing games with Stockfish. So whatever it’s doing is clearly lossy, which an actual search would not be.

    Neural nets memorize all sorts of things. They memorize ad clicks in high dimensional state spaces. Transformers trained on the whole internet can often reproduce entire texts. It’s lossy, but it’s still memorizing.

    That seems like the simplest explanation for what’s happening here. There’s some sort of lossy memorization, not a search. The fact that the thing it has memorized is the result of a search doesn’t matter.
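
    For reference, the mate-in-N check mentioned above could look something like this (a sketch assuming python-chess, with hypothetical model_move and defender_move callables, e.g. the defender playing Stockfish’s best reply):

        import chess

        def converts_mate(fen, n, model_move, defender_move):
            # Starting from a known mate-in-n position with the model to move,
            # does it actually deliver mate within n of its own moves?
            board = chess.Board(fen)
            for _ in range(n):
                board.push(model_move(board))
                if board.is_checkmate():
                    return True
                if board.is_game_over():
                    return False
                board.push(defender_move(board))
            return False

    A genuine search converts such positions essentially perfectly; inexplicable blunders are what lossy interpolation would look like.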

    • >If it could reliably win a mate in N position without inexplicably blundering, I would be more inclined to buy your search hypothesis.

      I don’t have a “search hypothesis”. I don’t know what strategy the model employs to play. I was simply pointing out that a limited search learned by the transformer is not out of the question. Stockfish finishing is not necessary to play chess well above the level at which a memorization hypothesis makes any sense. This is not the first LLM chess machine.

      >Neural nets memorize all sorts of things. They memorize ad clicks in high dimensional state spaces. Transformers trained on the whole internet can often reproduce entire texts. It’s lossy, but it’s still memorizing.

      Intelligent things memorize. Humans memorize a lot. I never said the model hasn’t memorized a fair few things. Many human chess grandmasters memorize openings. What I’m saying is that it’s not playing games via memorization any more than a human is.

      >That seems like the simplest explanation for what’s happening here. There’s some sort of lossy memorization, not a search.

      The options aren't only lossy memorization or lossless search.