
Comment by mtlmtlmtlmtl

8 months ago

From the paper:

> If Stockfish detects a mate-in-k (e.g., 3 or 5) it outputs k and not a centipawn score. We map all such outputs to the maximal value bin (i.e., a win percentage of 100%). Similarly, in a very strong position, several actions may end up in the maximum value bin. Thus, across time-steps this can lead to our agent playing somewhat randomly, rather than committing to one plan that finishes the game quickly (the agent has no knowledge of its past moves). This creates the paradoxical situation that our bot, despite being in a position of overwhelming win percentage, fails to take the (virtually) guaranteed win and might draw or even end up losing since small chances of a mistake accumulate with longer games (see Figure 4). To prevent some of these situations, we check whether the predicted scores for all top five moves lie above a win percentage of 99% and double-check this condition with Stockfish, and if so, use Stockfish’s top move (out of these) to have consistency in strategy across time-steps.

So they freely admit that their thing will draw or even lose in these positions. It's not merely making the win a little cleaner.
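
For reference, here's a minimal sketch of the fallback as the quoted passage describes it. The interfaces (`predict_win_prob` for the model's binned value estimate collapsed to a probability, `stockfish_ranking` and `stockfish_confirms_winning` for the external engine queries) are my assumptions, not the paper's actual code:

```python
def pick_move(legal_moves, predict_win_prob, stockfish_ranking,
              stockfish_confirms_winning, threshold=0.99, k=5):
    # Rank all legal moves by the model's predicted win percentage.
    ranked = sorted(legal_moves, key=predict_win_prob, reverse=True)
    top_k = ranked[:k]
    # If every top-k move lies above the 99% threshold, the binned value
    # head can no longer distinguish between them (mate-in-n already maps
    # to the maximal bin), so after double-checking the condition with
    # Stockfish, play Stockfish's best move among them to keep one
    # consistent plan across time-steps.
    if (all(predict_win_prob(m) >= threshold for m in top_k)
            and stockfish_confirms_winning()):
        for move in stockfish_ranking():  # Stockfish's moves, best first
            if move in top_k:
                return move
    # Otherwise play the model's top choice as usual.
    return ranked[0]
```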

> So they freely admit that their thing will draw or even lose in these positions.

Yeah, they didn't use Stockfish for the lols.

They created a search-less engine for chess, and then used a search-based engine to play a small minority of the game's moves.

  • Yes. So how is this irrelevant to qualifying as GM-level play, then? Being able to play out these positions is a clear prerequisite for even being in the ballpark of GM strength. If you regularly choke in completely winning endgames, you'll never get there.

    This is cheating, plain and simple. It would never fly in human play or in competitive computer play. And it's most definitely disingenuous research: they made an engine, it plays at a certain level, and then they augmented it with preexisting software they didn't even write themselves to beef up their claims about it.

    • > If you regularly choke in completely winning endgames, you'll never get there.

      Except we're talking about positions where no human player would choke, because they're basically impossible to lose except by playing at random, which is exactly what the bot does (see the sketch below).

      It makes no sense to compare to a human player in the same situation, because no human player could both reach such a position against a strong opponent and be unable to convert it once there…

      It's basically a bug, and what they did was work around this particular bug in order to have a releasable paper.
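
      Concretely, here's a toy illustration of that saturation (the probabilities and move names are made up, and the 128-bin discretization is my assumption about the setup):

      ```python
      import random

      def to_bin(win_prob, n_bins=128):
          # Collapse a win probability into one of n_bins discrete value bins.
          return min(int(win_prob * n_bins), n_bins - 1)

      # Hypothetical predictions for five candidate moves in a won position.
      candidates = {"Qg7#": 0.999, "Rd8+": 0.998, "Kf2": 0.996,
                    "a4": 0.995, "h4": 0.994}

      bins = {move: to_bin(p) for move, p in candidates.items()}
      best_bin = max(bins.values())
      tied = [move for move, b in bins.items() if b == best_bin]
      print(tied)  # all five moves land in the maximal bin
      # A stateless agent choosing among them is picking at random, so the
      # mate-in-one gets played only one time in five.
      print(random.choice(tied))
      ```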