Grandmaster-Level Chess Without Search

8 months ago (arxiv.org)

There is rampant misunderstanding of some parts of this article; allow me to help :)

The "no-search" chess engine uses search (Stockfish) in in two ways:

1. To score positions in the training data. This is only training data, no search is performed when actually playing.

2. To play moves when the position has many options with a 99% win rate. This is to prevent pathological behavior in already won positions, and is not "meaningful" in the objective of grandmaster-level play.

Thus, "without search" is a valid description.

  • In one sense, I can understand why they would choose to use Stockfish in mate-in-N positions. The fact that the model can't distinguish between mate in 5 and mate in 3 is an implementation detail. Since the vast majority of positions are not known to be wins or draws, it's still an interesting finding.

    However, in reality all positions are actually wins (for black or white) or draws. One reason they gave for why Stockfish is needed to finish the game is that their evaluation function is imperfect, which is also a notable result.

    • Is this in comparison to some other evaluation function which is perfect? I agree that all positions should have a certainty of win, draw, or lose with perfect play, but no engine is close to that level of evaluation function.

      I do suspect that this pathological behavior could be trained out with additional fine tuning, but likely not without slightly diminishing the model's overall ability.

      1 reply →

  • Sort of but it seems a bit of a cheat.

    Neural networks are universal approximators. Choose a function and get enough data from it and you can approximate it very closely and maybe exactly. If initially creating function F required algorithm Y ("search" or whatever), you can do your approximation to F and then say "Look F without Y" and for all we know, the approximation might be doing things internally that are actually nearly identical to the initial F.

    • A sufficiently large NN can learn an arbitrary function, yes. But Stockfish is also theoretically perfect given infinite computational resources.

      What is interesting is performing well under reasonable computational constraints i.e. doing it faster/with fewer flops than stockfish.

      5 replies →

    • The state space of chess is so huge that even a giant training set would be a _very_ sparse sample of the stockfish-computed value function.

      So the network still needs to do some impressive generalization in order to "interpolate" between those samples.

      I think so, anyway (didn't read the paper, but I worked on AlphaZero-like algorithms for a few years).

    • Only in the same way that DNA is a cheat for constructing organisms. After all, DNA sequences are universal recipes for organisms. Study enough DNA sequences and splice together a sequence and you can theoretically design any mythical creature you can think of.

      But if someone grew an actual dragon in a lab by splicing together DNA fragments, that would still be a major feat. Similarly, training a neural net to play grandmaster level chess is simple in theory but extremely difficult in practice.

    • It doesn't get the actual optimal Q-values computed from Stockfish (presumably that would take infinite compute to calculate); in fact it gets estimates from polling Stockfish for only 50ms.

      So you're estimating from data a function which is itself not necessarily optimal. Moreover, the point is more like how far can we get using a really generic transformer architecture that is not tuned to domain-specific details of our problem, which Stockfish is.

    • No, arbitrarily wide neural networks are approximators of Borel Measurable functions. Big difference between that and “any function”. RNNs are Turing Complete though.

      1 reply →

  • "Aready won position" or "99% win rate" is statistics given by Stockfish (or professional chess player). It is weird to assume that the same statement is true for the trained LLM since we are assessing the LLM itself. If it is using during the game then it is searching, thus the title doesn't reflect the actual work.

    • It's quite clear from the article that the 99% is the model's predicted win rate for a position, not its evaluation by Stockfish (which doesn't return evaluations in those terms).

      It's true that this is a relatively large deficiency in practice: how strong would a player be if he played the middlegame at grandmaster strength but couldn't reliably mate with king and rook?

      The authors overcame the practical problem by just punting to Stockfish in these few cases. However, I think it's clearly solvable with LLM methods too. Their model performs poorly because of an artifact in the training process where mate-in-one is valued as highly as mate-in-fifteen. Train another instance of the model purely on checkmate patterns - it can probably be done with many fewer parameters - and punt to that instead.

      1 reply →

  • >> 1. To score positions in the training data. This is only training data, no search is performed when actually playing.

    That's like saying you can have eggs without chickens, because when you make an omelette you don't add chickens. It's completely meaningless and a big fat lie to boot.

    The truth is that the system created by DeepMind consists of two components: a search-based system used to annotate a dataset of moves and a neural-net based system that generates moves similar to the ones in the dataset. DeepMind arbitrarily draw the boundary of the system around the neural net component and pretend that because the search is external to the neural net, the neural net doesn't need the search.

    And yet, without the search there is no dataset, and without the dataset there is no model. They didn't train their system by self-play and they certainly didn't hire an army of low-paid workers to annotate moves for them. They generated training moves with a search-based system and learned to reproduce them. They used chickens to make eggs.

    Their approach depends entirely on there being a powerful chess search engine and they wouldn't be able to create their system without it as a main component. Their "without search" claim is just a marketing term.

    • It doesn't matter where the egg came from, just that it is an egg.

      It could have luckily coalesced from gas (a Boltzmann egg), or perhaps even more radically, been laid by a duck.

      you say

      >They didn't train their system by self-play and they certainly didn't hire an army of low-paid workers to annotate moves for them.

      So you are certainly aware that there are avenues to creating the data set. Given that, it is quite reasonable to say that search is unnecessary.

      3 replies →

    • The point -- which I don't think you got -- is that extremely generic ingredients like high-quality data (which is the point of Stockfish here) and very deep Transformer-type neural networks are enough to nearly match the performance of ad-hoc, non-generalisable techniques like game-tree search algorithms.

      This has two possible applications: 1. There's far less need to invent techniques like MCTS in the first place. 2. A single AI might be able to play grandmaster level chess by accident.

      The catch is you need high quality data in large amounts.

      2 replies →

    • Btw, just to be a bit more constructive (not by much): the proper term for what DeepMind did is "neuro-symbolic AI". But DeepMind shunned the term even for AlphaGo, a system comprised of a couple of neural nets and Monte-Carlo Tree Search.

      The whole thing is just political: DeepMind use neural nets, GOFAI is dead and that's the way to AI. That's their story and they're sticking with it.

      1 reply →

    • It's more like saying you can make an omelette without killing chickens, even though chickens were clearly involved at some point. So I see your point that this doesn't allow grandmaster-level chess play with no search at any point, but I also think it's fair to say that this approach allows you to use search to build an agent which can play grandmaster-level chess without, itself, using search.

    • > That's like saying you can have eggs without chickens, because when you make an omelette you don't add chickens.

      I just took it in the same way as saying that being a vegetarian is generally better for animal welfare, as you're not harming chickens as directly by eating an omelette, as you would by eating their wings.

    • > That's like saying you can have eggs without chickens, because when you make an omelette you don't add chickens. It's completely meaningless and a big fat lie to boot.

      It's like saying ChatGPT isn't a human brain.

      It was trained with human brains. But it isn't a human brain.

    • Would it be more fair if they said “only using search one time to process training data” or something like that?

A quote from discord: "apparently alpha-zero has been replicated in open source as leela-zero, and then leela-zero got a bunch of improvements so it's far ahead of alpha-zero. but leela-zero was barely mentioned at all in the paper; it was only dismissed in the introduction and not compared in the benchmarks. in the stockfish discord they are saying that leela zero can already do everything in this paper including using the transformer architecture."

  • Leela Zero was an amazing project improving on AlphaZero, showing the feasibility of large-scale training with contributed cycles, and snatching the TCEC crown in Season 16.

    It forced Stockfish to up its game, essentially by adopting neural techniques itself (though of a different type; Stockfish uses NNUE).

How much of this "grandmaster-level" play is an artifact of fast time controls? I notice they only achieve a GM-level Elo in Blitz against humans, achieve a significantly worse Elo against bots, and do not provide the "Lichess Blitz Elo" of any of their benchmark approaches.

  • All of it? Inference speed is proportional to the number of parameters, which can't vary depending on the time control. For longer time controls you'd need a larger network. For those who don't know chess: the quality of high-level play in five-minutes-each games is far lower than in 90-minutes-each games.

  • I wonder if one could make a neural net play human-like, at various levels, for instance by training smaller or larger nets. And by human-like, I don't mean Elo level, but more like a Turing test: "does this feel like playing against a human?"

    I wonder how many time-annotated chess play logs are out there. (Between humans, I mean.)

    • I suppose varying the neural net size wouldn't be the best way of doing that; very small nets can have very "unhuman-like" behaviour. I'm not an expert on reinforcement learning, but for other fields in deep learning that's typically the case.

      I think that, to simulate worse human-like players, it would be better to just increase the temperature: don't always select the best move, at every step just select one of the top 10, randomly proportional to some function of the model-predicted probability of it being "the best" move (e.g. a power of the probability; very large powers give always the best move, i.e. the strongest player, and powers close to 0 tend to choose uniformly at random, i.e. the weakest player). The only thing I'm not certain about is, if you train the original network well enough, stupid blunders (that a very bad human player like me would make) are still scored so low that there's no way this algorithm will pick them up - the only way to know would be to try.
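
      In code, the idea is roughly this (a loose sketch; 'policy' is a hypothetical mapping from candidate moves to model-predicted probabilities, not anything from the paper):

        import random

        def pick_move(policy, power=1.0, top_k=10):
            # Keep the model's top-k candidate moves.
            top = sorted(policy.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
            # Raise the probabilities to a power: large powers approach best-move play,
            # powers near 0 approach uniform random (much weaker) play.
            weights = [p ** power for _, p in top]
            return random.choices([m for m, _ in top], weights=weights, k=1)[0]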

      1 reply →

    • Maia Chess did this. It works OK for low to mid Elo levels (around 50% accuracy). But their project also didn't use any search; it just directly predicted the move. Humans actually do perform search, so a more accurate model at higher Elos will probably need to do something like that. However, humans don't do a full game-tree search like Stockfish, and we don't do full game rollouts like lc0 either.

While its performance against humans is very impressive indeed, its performance against engines is somewhat less so:

> Our agent's aggressive style is highly successful against human opponents and achieves a grandmaster-level Lichess Elo of 2895. However, we ran another instance of the bot and allowed other engines to play it. Its estimated Elo was far lower, i.e., 2299. Its aggressive playing style does not work as well against engines that are adept at tactical calculations, particularly when there is a tactical refutation to a suboptimal move.

  • I like this even more now that I've read that. It sounds like this makes the agent a much more human-like player than the perfectly calculating traditional chess engines. It may end up being more fun for humans to play against if it's strong but has holes in its play.

  • This sounds a lot like Mikhail Tal!

    • Not really. Mikhail Tal was easily one of the strongest calculators in chess history. Definitely the strongest in his time besides maybe Fischer.

      The idea that Tal mostly made dubious sacrifices is largely a myth, heavily based on a joke he himself made. In actual fact he always did deep calculation and knew that no easy refutation existed, and that he had a draw by perpetual check in hand (until beaten by Ding a few years ago, Tal actually had the record streak of unbeaten games in classical chess). He was making calculated risks knowing his opponents would not be likely to outcalculate him. He also had a very deep understanding of positional play, he just had a very different style of expressing it, relying more on positional knowledge to create sharp positions centered around material imbalance.

Well, "without explicit search" would probably be more accurate.

They do note that in the paper, though:

>Since transformers may learn to roll out iterative computation (which arises in search) across layers, deeper networks may hold the potential for deeper unrolls.

  • We don’t know if it’s using implicit search either. While it would be interesting if the network was doing some internal search, it’s also possible it has just memorized the evaluations from 10M games and is performing some function of the similarity of the input to those previously seen.

    • Maybe they could test it on "tricky" positions where increasing the search depth on Stockfish dramatically changes the evaluation.

    • Even if it's "implicit" I'm not sure if that matters that much. The point is that the model doesn't explicitly search anything, it just applies the learned transformation. If the weights of the learned transformation encode a sort of precomputed search and interpolation over the dataset, from an algorithmic perspective this still isn't search (it doesn't enumerate board states or state-action transitions).

      >performing some function of the similarity of the input to those previously seen.

      This is indeed what transformers do. But obviously it learns some sort of interpolation/extrapolation which lets it do well on board states/games outside the training set.

      1 reply →

    • If the Transformer was 'just' memorizing, you would expect width scaling to work much better than depth scaling (because width enables memorization much more efficiently), and you also wouldn't expect depth to run into problems, because it's not like memorization is that complex - but it does suggest that it's learning some more complicated algorithm which has issues with vanishing gradients & learning multiple serial steps, and the obvious complicated algorithm to be learning in this context would be an implicit search akin to the MuZero RNN (which, incidentally, doesn't need any symbolic solver like Stockfish to learn superhuman chess from scratch by self-play).

    • >We don’t know if it’s using implicit search either.

      Sure

      >it’s also possible it has just memorized the evaluations from 10M games and is performing some function of the similarity of the input to those previously seen.

      That's not possible. The space of possible chess games is incredibly large and it is incredibly easy to play a game that has diverged from the training data. A model that had just memorized all evaluations would break within ten or so moves at most, let alone withstand robust evaluations.

      However exactly this model works, and how much or how little it relies on search, is unknown, but it is no doubt a model of the world of chess. https://adamkarvonen.github.io/machine_learning/2024/01/03/c...

      2 replies →

Given that they used position evaluation from (a search chess engine[1]) Stockfish, how is this "without search"?

Edit: looking further than the abstract, this is rather an exploration of scale necessary for a strong engine. Could go without "without search" in the title I guess.

[1]: IIRC, it also uses a Leela-inspired NN for evaluation.

  • Leela without search supposedly plays around expert level, but I thought the no-search Leela approach ran out of gas around there. Without search there means evaluating 1 board position per move. The engine in the paper (per the abstract) uses a big LLM instead of a Leela-style DCNN.

  • Training uses search, but it plays without search.

    ChatGPT isn't human, but it was trained with humans.

    • So it's a space-time trade-off then? Store enough searched and weighted positions in the model and infer from them. In this way, inference replaces Stockfish search, just less accurately, but much faster, at the cost of the memory required for the model.

  • Does Stockfish really use a Leela-inspired NN? I thought the NNUE was independently developed and completely different (it's a very tiny network that runs on the CPU).

    • Yeah, NNUE is a separate invention that, unfortunately, DeepMind often get undeserved credit for inspiring. It didn't even originate in chess engines but in a shogi version of Stockfish. The architecture is completely different from the nets in Leela or AlphaZero.

      2 replies →

    • This is true, but at least for a while (I’m not sure if it’s still the case), Leela data was used (along with data generated from Stockfish self-play) to train Stockfish’s NN.

Slightly off topic, but am I the only one that approaches strategy games by making a "zeroth order approximation"? E.g. find the shortest path to victory under the (obviously faulty) assumption that my opponent does nothing and the board is unchanging except for my moves. Then find my opponent's shortest path to victory under the same assumption. Then evaluate: if we both just ignore each other and try to bum-rush the victory condition, who gets there first?

For most games, if you can see a way to an end state within 3-5 steps under these idealized conditions, there's only so much that an actual opponent can do to make the board deviate from the initial static board state that you used in your assumption. The optimal strategy will always be just a few minor corrections of edit distance from this dumb no-theory-of-mind strategy. You can always be sure that whoever has the longer path to victory has to do something to interfere with the shorter path of their opponent, and there's only ever so many pieces which can interact with that shorter path. Meaning whatever path to victory is currently shortest short circuits the search for potential moves.
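
As a toy sketch, the heuristic amounts to something like this (all of the game-API names here are made up for illustration, not from any real library):

  def zeroth_order_eval(game):
      # Plies I need to reach the victory condition if the board never reacts.
      my_race = game.shortest_win(player="me", assume_static_board=True)
      # Plies my opponent needs under the same frozen-board assumption.
      their_race = game.shortest_win(player="opponent", assume_static_board=True)
      # Positive: I win the pure race, so focus on executing my path;
      # negative: I must find moves that interfere with their shorter path.
      return their_race - my_race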

  • On a beginner level this might work, but as people get more competitive they begin to realize the benefit of not only playing their own game, but also reading the enemy's plan (e.g. scouting in StarCraft/AoE2) to counteract it as much as possible.

    Chess against humans is different. Usually there is no clear path to victory, only to a draw. People just follow strategic plans that they were told would be slightly beneficial later on. While following those plans people mess up, and the first one to realize that the opponent messed up usually wins. A material advantage of 2-3 points is usually already considered a win.

    • I agree with this. My default approach to board games is basically to maximize victory points early. This usually works; when 4 people are playing a new game for the first time, I usually win. This doesn't really work when people know how to play the game specifically, though.

      I think this algorithm is better than many other algorithms that people come up with, however.

      (As an aside, when I play a card game I sort my cards with a merge sort instead of an insertion sort. People said you would never use these algorithms in real life, but you can if you want to!)

      2 replies →

    • Common advice to beginners:

      "Improve the position of your worst placed piece."

      For better players, determine which is most effective:

      1 Improve the position of your worst placed piece.

      2 Exchange or minimize the potential of your opponent's best-placed piece (e.g. trade off their bishop on a long diagonal)

      3 Maximize the potential of your best-placed piece (sometimes by a sacrifice)

      4 Prevent your opponent from developing some pieces

      This is over-simplified, of course, but the balance of the position must be taken into account.

  • If you programmed this as a chess strategy, it would probably result in an engine that played the Scholar's Mate every game. This is actually close to what low-Elo players do in chess, but as you get closer to 800-ish Elo the probability of attempted Scholar's Mates drops dramatically (likely because the opening just isn't very good).

  • This is one of the canonical ways people learn chess. It's not that bad of a way to play, because it emphasizes thinking about good moves, and it efficiently finds mates in N (when done by a human).

    In higher-level play it usually loses to opponents that are aware they're not playing alone; at least that's the case with bots that do in fact stay unaware of their opponent.

  • A lot of decisions in chess are like "this square would be nice for that piece, how can I get there?", followed by analyzing what your opponent can do to prevent it, or what counterplay it gives them. So what you are doing makes a lot of sense.

  • This would never work in chess, because there is almost always a checkmate in a few moves under the assumption that the opponent doesn't defend it. So this engine would just play far too aggressively and over-extend its pieces.

    Take Scholar's Mate for example. You can win in 4 moves from the initial position, and it is an easy win if the opponent doesn't defend it, but against someone who knows chess it is a horrible opening because it is easy to defend and leaves you in a weak position.

  • Can't you almost always win the game in a few moves if you plan for your opponent to make really stupid responses?

  • Ehh idk. Sounds like it’s prone to the beginner strategy of assuming your opponent will occasionally do something really dumb.

I guess this current method is not applicable to having the model explain why a given move was played, as it is not planning more than one move ahead. Very cool, nonetheless.

I think this is an interesting finding from a practical perspective. A function which can reliably approximate stockfish at a certain depth could replace it, basically "compressing" search to a set depth. And unlike NNUE which is optimized for CPU, a neural network is highly parallelizable on GPU meaning you could send all possible future positions (at depth N) through the network and use the results for a primitive tree search.
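
A minimal sketch of that batched one-ply search (PyTorch-flavoured; 'model' and 'encode' are hypothetical placeholders for a learned evaluator and a board encoder, and python-chess generates the candidate positions):

  import chess
  import torch

  def best_move_by_batched_eval(board, model, encode):
      moves, encodings = [], []
      for move in board.legal_moves:
          board.push(move)
          encodings.append(encode(board))  # tensor for the resulting position
          moves.append(move)
          board.pop()
      with torch.no_grad():
          scores = model(torch.stack(encodings))  # one GPU pass over all children
      return moves[int(torch.argmax(scores))]     # depth 1 shown; deeper levels batch the same way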

They do use Stockfish for playing though …

“To prevent some of these situations, we check whether the predicted scores for all top five moves lie above a win percentage of 99% and double-check this condition with Stockfish, and if so, use Stockfish’s top move (out of these) to have consistency in strategy across time-steps.”
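
In pseudocode, that guard amounts to something like this (names invented for illustration; this is not the authors' code):

  def choose_move(model_top5, stockfish_rank):
      # model_top5: list of (move, predicted win probability), best first.
      # stockfish_rank: function giving Stockfish's preference order over moves.
      if all(win_prob > 0.99 for _, win_prob in model_top5):
          # Defer to Stockfish among the model's own top five so the bot
          # commits to one consistent winning plan across time-steps.
          return min((move for move, _ in model_top5), key=stockfish_rank)
      return model_top5[0][0]  # otherwise just play the model's best move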

  • The context of that sentence:

    > Indecisiveness in the face of overwhelming victory

    > If Stockfish detects a mate-in-k (e.g., 3 or 5) it outputs k and not a centipawn score. We map all such outputs to the maximal value bin (i.e., a win percentage of 100%). Similarly, in a very strong position, several actions may end up in the maximum value bin. Thus, across time-steps this can lead to our agent playing somewhat randomly, rather than committing to one plan that finishes the game quickly (the agent has no knowledge of its past moves). This creates the paradoxical situation that our bot, despite being in a position of overwhelming win percentage, fails to take the (virtually) guaranteed win and might draw or even end up losing since small chances of a mistake accumulate with longer games (see Figure 4). To prevent some of these situations, we check whether the predicted scores for all top five moves lie above a win percentage of 99% and double-check this condition with Stockfish, and if so, use Stockfish’s top move (out of these) to have consistency in strategy across time-steps.

    • They should try to implement some kind of resolute agent in that case. Might be hard to do if it needs to be "not technically search" though.

  • But only to complete a winning position.

    • That's a crucial part of chess that can't simply be swept under the rug. If I had won all the winning positions I've had over the years I'd be hundreds of points higher rated.

      What if a human only used Stockfish in winning positions? Is it cheating? Obviously it is.

      1 reply →

    • The process of converting a completely winning position (typically one with a large material advantage) is a phase change relative to normal play, which is the struggle to achieve such a position. In other words, you are doing something different at that point. For example, I, as a weak FIDE CM (Candidate Master), could not compete with a top grandmaster in a game of chess, but I could finish off a trivial win.

      Edit: Recently I brought some ancient (1978) chess software back to life https://github.com/billforsternz/retro-sargon. These two phases of chess, basically two different games, were quite noticeable with that program, which is chess software stripped back to the bone. Sargon 1978 could play decently well, but it absolutely did not have the technique to convert winning positions (because this is a different challenge from regular chess). For example, it could not in general mate with rook (or even queen) and king against bare king. The technique of squeezing the enemy king into a progressively smaller box was unknown to it.

    • From the abstract:

      > We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points.

      So some of the learning data comes from Stockfish.

      1 reply →

That must mean they found some similarity metric for high-level chess which is very impressive. In chess one pawn moving one square can be the difference between a won and a lost position. But knowing that usually requires lots of calculation.

Nice. I build AI-driven puzzle games where you can pick algorithms, tune parameters, or train ML models to play the game.

Chess is on my priority list for the next game. Now I know that ML strategy is on par with search algorithms.

  • > I know that ML strategy is on par with search algorithms

    It's on par with very strong humans, but not as good as a normal engine.

270M parameters at I guess FP/BF16?

I'd be interested to see how well the Elo holds up when the model is quantized.

At INT8, a small transformer like this could have pretty amazing speed and efficiency on an Edge TPU or another very low-power accelerator chip. The question then becomes: is it faster / more efficient than Stockfish 16 on a similarly powered CPU? As we've seen with LLMs, they can be extremely speedy when quantized, with all the stops pulled out on hardware, compared to raw FP16 and naive implementations.
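
For reference, a dynamic INT8 quantization experiment in PyTorch looks roughly like this (the tiny TransformerEncoder below is only a stand-in for the paper's 270M-parameter model, and the exact speed/Elo trade-off would have to be measured):

  import torch
  from torch.ao.quantization import quantize_dynamic

  # Placeholder network standing in for the real 270M-parameter policy/value model.
  model = torch.nn.TransformerEncoder(
      torch.nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
      num_layers=4,
  ).eval()

  # Quantize the Linear layers (the bulk of a transformer's weights) to INT8,
  # then re-measure playing strength and moves/second against the FP16 baseline.
  model_int8 = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)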

imho fascinating experiment, even if it didn't produce world class Elo.

For one thing, statelessness deprives it of easy solutions to repetition and endgame decisiveness.

The path to AGI:

0. Have model A.

1. Use Monte Carlo with A to get supervised data.

2. Train model B with data from A.

3. Use Monte Carlo with B to get supervised data.

4. Train model C with data from B...
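
As a toy, runnable schematic of that loop (everything here is a placeholder; only the shape of the loop matters):

  import random

  def mcts(model, state):
      # Stand-in for "search guided by the current model".
      return max(range(3), key=lambda a: model[a] + random.random())

  def generate_games(expert, n=100):
      # Stand-in for self-play data generation with the search-amplified expert.
      return [(None, expert(None)) for _ in range(n)]

  def train(dataset):
      # Stand-in for supervised training on the generated data.
      counts = [0, 0, 0]
      for _, action in dataset:
          counts[action] += 1
      return [c / len(dataset) for c in counts]

  model = [0.0, 0.0, 0.0]          # step 0: model A
  for generation in range(3):      # steps 1-4: search -> data -> train -> repeat
      data = generate_games(lambda s: mcts(model, s))
      model = train(data)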

  • That's basically how OpenAI is working. They use generated training sets from one model to train the next model (plus other stuff with it).

    But the "other stuff" is pretty important. That is what pulls it away from just constantly re-amplifying the bias in the initial training data.

    • I still want to see some examples of a "mistake" in the training data getting detected or reduced.

      For example, somewhere in the training data the string "All cats are red" should get detected when lots of other data in the training set contradicts the statement.

      And obviously it doesn't have to be simple logical statements, but also bigger questions like "how come the 2nd world war happened despite X person and Y person being on good speaking terms as evidenced by all these letters in the archives?"

      When AI can do that, it should be able to turn our body of knowledge into a much bigger/more useful one by raising questions that arise from data we already have, but never noticed.

  • That is an awesome idea. I wish the authors would open source the code and weights so this can be tried.

    • They basically just described AlphaZero, the difference being that AlphaZero uses MCTS during inference too.

I immediately saw this and knew it was BS. It's a search problem; the human brain even does a search. The model internally is scanning each position and determining the next probable position. That is a predictive search; you can't just restructure the problem.

Now arguably it’s doing it differently, maybe? But still a search

I don't follow... even if it's trained and doesn't use search, isn't the act of deciding the next move a sort of search anyway, based off its training? I've heard people describe LLMs as extremely broad search, basically attempting to build a world model and then predicting the next word based on that. Is this fundamentally different from search? Am I wrong in my assumptions here?

  • We know the model is approximating the results of a search. We don’t know whether it is actually searching.

    At the most basic level, the model is just giving probabilities for the next moves, or in the case of value approximation, guessing which bucket the value falls into.
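
    Concretely, "guessing which bucket the value falls into" means the win probability is discretised into K bins and predicted as a class label, along these lines (the bin count and edges here are illustrative, not necessarily the paper's exact setup):

      import numpy as np

      K = 128
      edges = np.linspace(0.0, 1.0, K + 1)

      def to_bin(win_prob):
          # Map a win probability in [0, 1] to one of K class labels.
          return int(np.clip(np.digitize(win_prob, edges) - 1, 0, K - 1))

      def from_bin(bin_index):
          # Map a predicted class back to its bin's centre win probability.
          return (edges[bin_index] + edges[bin_index + 1]) / 2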

Next up, claims of ...

  Text generation without human writers
  Image generation without human artists
  Video generation without production crews

The rub is that all the training data was produced by lots and lots of search, human writing, human artists, and human production crews. At every step, they curated the results and generated a beautiful latent space that the model was then trained on.

The impressive part about AlphaZero is that it did the Monte-Carlo Tree Search itself, without being trained on human decision-making.

While this ... well, this is just taking credit for all the search and work that was done by Stockfish, or humans, at a massive scale over centuries and then posted nearly for free online. It's then using all that data and generating similar stuff. Whoop de doo. Oh, and it's even used cheap labor for the last mile, too:

https://time.com/6247678/openai-chatgpt-kenya-workers/

It's not the same thing as actual search that, for example, automatically derives scientific laws (e.g. Kepler's laws of planetary motion) from raw data fed to it (e.g. observations of planetary movements). AI doing that can actually model the real space, not the latent space. It can go out and learn without humans or Stockfish massively bootstrapping its knowledge.

I mean, don't get me wrong ... learning a lot about the latent space is what students strive to do in schools and universities, and the AIs are like a very smart student. In fact, they can be huge polymaths and polyglots and therefore uncover a lot of interesting connections and logical deductions from the latent space. They can do so at a huge scale... and I have often said that swarms of AIs will be unstoppable. So at the end of the day, although this isn't very impressive when it comes to the credit of who did the search and curation, AI is going to be extremely impressive with what it can do with the results on the next N levels.

  • I would very much prefer a model trained on human data that had absorbed a part of human values, than a completely de novo intelligence that bootstrapped itself using experimentation in the physical world.

argmax and argmin are searching. Just because it isn't a search with a lot of depth that doesn't mean it isn't searching.

"We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points [...] without any domain-specific tweaks or explicit search algorithms."

?

Now do Go. :)

  • This used to be a comforting thought whenever computers beat humans in chess, but I think that time has passed. The paper mentions AlphaZero [1], which has beaten AlphaGo, which beat Lee Sedol back in 2016 [2].

    [1] https://en.wikipedia.org/wiki/AlphaZero

    [2] https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol

    • My pithy comment probably wasn't enough to express what I meant. :)

      I know that computers have already beaten humans at Go. But what's interesting is that in both the chess and Go cases, a lot of real-time compute was necessary to win the games. Now we have a potential way to build the model ahead of time such that the compute needed during interactive play is much smaller.

      This means that we can be much more portable with the solution, and it also means that for online game companies, they can spend a lot less money on gameplay, especially if gameplay is most of their compute.