Comment by paulddraper

8 months ago

But only to complete a winning position.

That's a crucial part of chess that can't simply be swept under the rug. If I had won all the winning positions I've had over the years I'd be hundreds of points higher rated.

What if a human only used Stockfish in winning positions? Is it cheating? Obviously it is.

  • > That's a crucial part of chess that can't simply be swept under the rug.

    Grandmasters very literally do it all the time.

    > What if a human only used Stockfish in winning positions? Is it cheating? Obviously it is.

    Yes, but this isn't that.

    This is a computer that is playing chess. And FYI (usually) without search.

The process of converting a completely winning position (typically one with a large material advantage) is a phase change relative to normal play, which is the struggle to achieve such a position. In other words, you are doing something different at that point. For example, as a weak FIDE CM (Candidate Master) I could not compete with a top grandmaster in a game of chess, but I could finish off a trivial win.

Edit: Recently I brought some ancient (1978) chess software back to life: https://github.com/billforsternz/retro-sargon. These two phases of chess, basically two different games, were quite noticeable with that program, which is chess software stripped back to the bone. Sargon 1978 could play decently well, but it absolutely did not have the technique to convert winning positions (because this is a different challenge from regular chess). For example, it could not in general mate with rook (or even queen) and king against a bare king. The technique of squeezing the enemy king into a progressively smaller box was unknown to it.

That 'only' usage in winning positions could be decisive for gaining a GM rating.

  • Positions with a 99% win percentage are not decisive for GM vs. non-GM rating.

    • From the paper:

      If Stockfish detects a mate-in-k (e.g., 3 or 5) it outputs k and not a centipawn score. We map all such outputs to the maximal value bin (i.e., a win percentage of 100%). Similarly, in a very strong position, several actions may end up in the maximum value bin. Thus, across time-steps this can lead to our agent playing somewhat randomly, rather than committing to one plan that finishes the game quickly (the agent has no knowledge of its past moves). This creates the paradoxical situation that our bot, despite being in a position of overwhelming win percentage, fails to take the (virtually) guaranteed win and might draw or even end up losing since small chances of a mistake accumulate with longer games (see Figure 4). To prevent some of these situations, we check whether the predicted scores for all top five moves lie above a win percentage of 99% and double-check this condition with Stockfish, and if so, use Stockfish’s top move (out of these) to have consistency in strategy across time-steps.

      So they freely admit that their thing will draw or even lose in these positions. It's not merely making the win a little cleaner.

      3 replies →

    • Proof?

      To win any game, at some point (at the end of the game) there will be a position with >99% winning chances. The moves that follow are decisive.

      1 reply →
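The fallback the paper describes in the quote above — if all top-five predicted moves exceed a 99% win percentage, defer to Stockfish's choice among them for plan consistency — can be sketched roughly as follows. This is a hypothetical sketch, not the paper's code; the function names and the engine interface are assumptions, and the paper additionally double-checks the 99% condition with Stockfish, which is omitted here:

```python
# Hypothetical sketch of the described fallback: when every one of the
# network's top-five moves is rated above a 99% win percentage, let
# Stockfish pick among those five so successive moves follow one
# consistent winning plan instead of drifting between equally-valued lines.

WIN_PCT_THRESHOLD = 0.99

def choose_move(predicted, stockfish_best_of):
    """predicted: list of (move, win_probability) pairs, sorted best-first.
    stockfish_best_of: callable returning Stockfish's preferred move from
    a candidate list (stands in for the actual engine call)."""
    top5 = predicted[:5]
    if len(top5) == 5 and all(p > WIN_PCT_THRESHOLD for _, p in top5):
        # Overwhelmingly winning: defer to Stockfish among the top five.
        return stockfish_best_of([m for m, _ in top5])
    # Otherwise play the network's own top choice (no search).
    return top5[0][0]
```

The point of the thread's disagreement is visible in the sketch: the network alone (the final `return`) has no memory across moves, so the fallback exists precisely to stop it from "playing somewhat randomly" in won positions.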

From the abstract:

> We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points.

So some of the learning data comes from Stockfish.

  • The original comment was "for playing."

    In training, traditional search is absolutely used to score positions.

    In playing, search is not used. (*Except to finish out an already-won position.)
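The annotation step from the abstract — scoring each board with Stockfish and mapping the result into discrete value bins, with mate-in-k scores sent to the maximal bin as the quoted passage explains — can be sketched like this. The centipawn-to-win-percentage formula below is the logistic model popularized by Lichess, and the bin count of 128 is an illustrative assumption, not a claim about the paper's exact setup:

```python
import math

def cp_to_win_pct(cp):
    """Convert a centipawn score to a win percentage using a logistic
    model (the constant is Lichess's fit; assumed here, not taken from
    the paper)."""
    return 50 + 50 * (2 / (1 + math.exp(-0.00368208 * cp)) - 1)

def score_to_bin(score, k_bins=128):
    """Map an engine score to a value bin for training labels.
    'score' is either ('cp', centipawns) or ('mate', k); per the
    quoted passage, mate-in-k maps to the maximal bin (100% win)."""
    kind, val = score
    if kind == "mate":
        return k_bins - 1  # mate-in-k -> maximal value bin
    pct = cp_to_win_pct(val)
    return min(int(pct / 100 * k_bins), k_bins - 1)
```

This also shows where the "somewhat random" endgame play comes from: once several moves land in the same maximal bin, the training labels can no longer distinguish between them.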