Comment by paulddraper

8 months ago

There is rampant misunderstanding of some parts of this article; allow me to help :)

The "no-search" chess engine uses search (Stockfish) in two ways:

1. To score positions in the training data. This is only training data, no search is performed when actually playing.

2. To play moves when the position has many options with a 99% win rate. This is to prevent pathological behavior in already won positions, and is not "meaningful" in the objective of grandmaster-level play.
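Point 2 amounts to a simple fallback rule. Here is a minimal sketch of it, not the paper's actual code; `predict_win_prob`, the move names, and the numbers are invented for illustration, and the exact threshold logic is an assumption based on the description above:

```python
# Hypothetical sketch of the "already won position" fallback described above.
WIN_THRESHOLD = 0.99

def predict_win_prob(move):
    # Stub standing in for the trained model's predicted win percentage
    # for the position after `move` (made-up values).
    return {"Qh5#": 0.999, "Rd8+": 0.995, "a3": 0.42}[move]

def choose_move(legal_moves, stockfish_move):
    probs = {m: predict_win_prob(m) for m in legal_moves}
    winning = [m for m, p in probs.items() if p >= WIN_THRESHOLD]
    if len(winning) > 1:
        # Several moves look equally "won" to the model, so it cannot tell
        # which one makes progress; defer to Stockfish to avoid shuffling.
        return stockfish_move
    return max(probs, key=probs.get)
```

Note the fallback only triggers when more than one move clears the threshold; in ordinary positions the model's own ranking is used, which is why the authors call the play itself search-free.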

Thus, "without search" is a valid description.

In one sense, I can understand why they would choose to use Stockfish in mate-in-N positions. The fact that the model can't distinguish between mate in 5 and mate in 3 is an implementation detail. Since the vast majority of positions are not known to be wins or draws, it's still an interesting finding.

However, in reality all positions are actually wins (for Black or White) or draws. One reason they gave for why Stockfish is needed to finish the game is that their evaluation function is imperfect, which is itself a notable result.

  • Is this in comparison to some other evaluation function which is perfect? I agree that all positions should have a certain outcome of win, draw, or loss with perfect play, but no engine is close to that level of evaluation function.

    I do suspect that this pathological behavior could be trained out with additional fine-tuning, but likely not without slightly diminishing the model's overall ability.

    • It comes down to: What is the evaluation for? For a human using an engine to analyze, it is about getting to more win-likely positions. And for an engine, it really is the same; plus to guide the search. Having a perfect trinary win/draw/loss would certainly be the _truth_ about a position in some objective way, but it would almost certainly not be the optimal way to win chess games against a set opponent. 1. e4 and 1. h3 are almost certainly both draws with perfect play from both sides, but the former is much more likely to net a win, especially for a human using the engine.

Sort of but it seems a bit of a cheat.

Neural networks are universal approximators. Choose a function and get enough data from it and you can approximate it very closely and maybe exactly. If initially creating function F required algorithm Y ("search" or whatever), you can do your approximation to F and then say "Look F without Y" and for all we know, the approximation might be doing things internally that are actually nearly identical to the initial F.

  • A sufficiently large NN can learn an arbitrary function, yes. But Stockfish is also theoretically perfect given infinite computational resources.

    What is interesting is performing well under reasonable computational constraints, i.e. doing it faster/with fewer FLOPs than Stockfish.

    • Is the model more efficient than Stockfish? I think Stockfish runs on a regular CPU and I'd guess this "270M parameter transformer model" requires a GPU, but I can't find any reference to efficiency in the paper.

      Also found in the paper: "While our largest model achieves very good performance, it does not completely close the gap to Stockfish 16". It's actually inferior, but they still think it's an interesting exercise. But that's the thing: it's primarily an exercise, like calculating pi to a billion decimal places or overclocking a gaming laptop.


  • The state space of chess is so huge that even a giant training set would be a _very_ sparse sample of the Stockfish-computed value function.

    So the network still needs to do some impressive generalization in order to "interpolate" between those samples.

    I think so, anyway (didn't read the paper, but I worked on AlphaZero-like algorithms for a few years).

  • Only in the same way that DNA is a cheat for constructing organisms. After all, DNA is a universal recipe for an organism. Study enough DNA sequences, splice together a sequence, and you can theoretically design any mythical creature you can think of.

    But if someone grew an actual dragon in a lab by splicing together DNA fragments, that would still be a major feat. Similarly, training a neural net to play grandmaster level chess is simple in theory but extremely difficult in practice.

  • It doesn't get the actual optimal Q-values from Stockfish (presumably these would take infinite compute to calculate); in fact, it gets estimates from polling Stockfish for only 50ms.

    So you're estimating from data a function which is itself not necessarily optimal. Moreover, the point is more like: how far can we get using a really generic transformer architecture that is not tuned to the domain-specific details of the problem, the way Stockfish is?

  • No, arbitrarily wide neural networks are approximators of Borel-measurable functions. Big difference between that and "any function". RNNs are Turing complete, though.

    • You can say the same thing about RNNs. Technically nothing is Turing complete without infinite scratch space.

"Already won position" or "99% win rate" is a statistic given by Stockfish (or a professional chess player). It is weird to assume that the same statement is true for the trained LLM, since we are assessing the LLM itself. If it is used during the game, then it is searching, and thus the title doesn't reflect the actual work.

  • It's quite clear from the article that the 99% is the model's predicted win rate for a position, not its evaluation by Stockfish (which doesn't return evaluations in those terms).

    It's true that this is a relatively large deficiency in practice: how strong would a player be if he played the middlegame at grandmaster strength but couldn't reliably mate with king and rook?

    The authors overcame the practical problem by just punting to Stockfish in these few cases. However, I think it's clearly solvable with LLM methods too. Their model performs poorly because of an artifact in the training process where mate-in-one is valued as highly as mate-in-fifteen. Train another instance of the model purely on checkmate patterns - it can probably be done with many fewer parameters - and punt to that instead.
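The mate-in-N artifact can be illustrated directly with the value targets themselves. This is a hypothetical sketch with made-up numbers: `naive_target` mimics the described behavior where every forced mate collapses to the same label, and `mate_aware_target` is one speculative fix that shades the label by distance to mate:

```python
# Sketch of the training-target artifact described above (assumed encoding).
def naive_target(mate_in):
    # Per the comment: any forced mate maps to the same ~100% win label,
    # so the model has no signal to prefer the faster mate.
    return 1.0 if mate_in is not None else None

def mate_aware_target(mate_in, max_mate=50):
    # Speculative fix: shorter mates score strictly higher.
    # The scaling constant is purely illustrative.
    if mate_in is None:
        return None
    return 1.0 - (mate_in - 1) / (2 * max_mate)
```

Under the naive encoding, `naive_target(1) == naive_target(15)`, which is exactly why the model shuffles in won positions; the mate-aware version restores a gradient toward finishing the game.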

    • Human players have this concept of progress. I couldn't give a good succinct description of exactly what that entails, but basically if you are trading off pieces that's progress, if your king is breaking through the defensive formation of the pawn endgame that's progress. If you are pushing your passed pawn up the board that's progress. If you are slowly constricting the other king that's progress.

      When we have a won position we want to progress and convert it to an actual win.

      I think the operational definition I would use for progress is a prediction of how many more moves the game will last. A neural network can be used for that.
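That operational definition can be sketched in a few lines; `predict_moves_left` stands in for a hypothetical regression head, and all names and numbers here are invented:

```python
# Sketch of "progress" as predicted moves-remaining, per the comment above.
def predict_moves_left(position):
    # Stub for a network that estimates how many more moves the game
    # will last from this position (made-up values).
    return {"before": 12.0, "trade_queens": 7.0, "shuffle_rook": 12.5}[position]

def most_progressive(candidate_positions):
    # Among equally winning continuations, prefer the one the network
    # expects to end the game soonest.
    return min(candidate_positions, key=predict_moves_left)
```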

>> 1. To score positions in the training data. This is only training data, no search is performed when actually playing.

That's like saying you can have eggs without chickens, because when you make an omelette you don't add chickens. It's completely meaningless and a big fat lie to boot.

The truth is that the system created by DeepMind consists of two components: a search-based system used to annotate a dataset of moves and a neural-net based system that generates moves similar to the ones in the dataset. DeepMind arbitrarily draw the boundary of the system around the neural net component and pretend that because the search is external to the neural net, the neural net doesn't need the search.

And yet, without the search there is no dataset, and without the dataset there is no model. They didn't train their system by self-play and they certainly didn't hire an army of low-paid workers to annotate moves for them. They generated training moves with a search-based system and learned to reproduce them. They used chickens to make eggs.

Their approach depends entirely on there being a powerful chess search engine and they wouldn't be able to create their system without it as a main component. Their "without search" claim is just a marketing term.

  • It doesn't matter where the egg came from, just that it is an egg.

    It could have luckily coalesced from gas (a Boltzmann egg), or perhaps even more radically, been laid by a duck.

    you say

    >They didn't train their system by self-play and they certainly didn't hire an army of low-paid workers to annotate moves for them.

    So you are certainly aware that there are avenues to creating the data set. Given that, it is quite reasonable to say that search is unnecessary.

    • Neither of those has been shown to produce equivalent training data, no.

      They should do one of those instead of using search before they claim it’s possible to not use search.

      Or to borrow your analogy, you’ll need to show me a duck egg to prove you can make omelettes without chickens. Making an omelette from chicken eggs and claiming hypothetically some mystery other animal could have done it is nonsense.

    • >> So you are certainly aware that there are avenues to creating the data set. Given that, it is quite reasonable to say that search is unnecessary.

      How is it unnecessary? They used none of those methods, so they had to use search. That is search being necessary, not the opposite.

    • Bold of you to assume that low-paid (and yet somehow grandmaster level chess playing) workers have never been exposed to search in any fashion.

  • The point -- which I don't think you got -- is that extremely generic ingredients like high-quality data (which is the point of Stockfish here) and very deep Transformer-type neural networks are enough to nearly match the performance of ad-hoc, non-generalisable techniques like game-tree search algorithms.

    This has two possible applications: 1. There's far less need to invent techniques like MCTS in the first place. 2. A single AI might be able to play grandmaster level chess by accident.

    The catch is you need high quality data in large amounts.

    • I did get the point, and I'm commenting that the point is missing the point. There is nothing new in learning that a large neural net can approximate the output of a classical system. This has been done many times before. The real point is that DeepMind built a system that is half-search and pretend it's no-search. You cannot get the "high-quality data" without a classical system - not in chess.


  • Btw, just to be a bit more constructive (not by much): the proper term for what DeepMind did is "neuro-symbolic AI". But DeepMind shunned the term even for AlphaGo, a system comprising a couple of neural nets and Monte Carlo tree search.

    The whole thing is just political: DeepMind use neural nets, GOFAI is dead and that's the way to AI. That's their story and they're sticking with it.

  • It's more like saying you can make omelette without killing chickens, even though chickens were clearly involved at some point. So I see your point, that this doesn't allow grandmaster level chess play with no search at any point, but I also think it's fair to say that this approach allows you to use search to build an agent which can play grandmaster-level chess without, itself, using search.

  • > That's like saying you can have eggs without chickens, because when you make an omelette you don't add chickens.

    I just took it the same way as saying that being a vegetarian is generally better for animal welfare: you're not harming chickens as directly by eating an omelette as you would by eating their wings.

  • > That's like saying you can have eggs without chickens, because when you make an omelette you don't add chickens. It's completely meaningless and a big fat lie to boot.

    It's like saying ChatGPT isn't a human brain.

    It was trained with human brains. But it isn't a human brain.

  • Would it be more fair if they said “only using search one time to process training data” or something like that?