Comment by janalsncm

8 months ago

In one sense, I can understand why they would choose to use Stockfish in mate-in-N positions. The fact that the model can't distinguish between mate in 5 and mate in 3 is an implementation detail. Since the vast majority of positions are not known to be wins or draws, it's still an interesting finding.

However, in reality all positions are actually wins (for black or white) or draws. One reason they gave for why Stockfish is needed to finish the game is that their evaluation function is imperfect, which is also a notable result.

Is this in comparison to some other evaluation function which is perfect? I agree that every position is, with perfect play, certainly a win, draw, or loss, but no engine's evaluation function is close to that level.

I do suspect that this pathological behavior could be trained out with additional fine-tuning, but likely not without slightly diminishing the model's overall ability.

  • It comes down to: what is the evaluation for? For a human using an engine to analyze, it is about reaching positions that are more likely to be won. For an engine it is really the same, plus guiding the search. A perfect ternary win/draw/loss value would certainly be the _truth_ about a position in some objective way, but it would almost certainly not be the optimal way to win chess games against a given opponent. 1. e4 and 1. h3 are almost certainly both draws with perfect play from both sides, but the former is much more likely to net a win, especially for a human using the engine.
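
To make the distinction concrete, here is a rough Python sketch comparing a ternary game-theoretic value with a practical expected score. Everything in it is an assumption for illustration: the function names are hypothetical, and the win/draw/loss probabilities are made-up numbers, not measurements from any engine or database.

```python
# Illustrative only: the probabilities below are invented to show the idea.

def ternary_value(outcome):
    """Game-theoretic value with perfect play: win=1, draw=0.5, loss=0."""
    return {"win": 1.0, "draw": 0.5, "loss": 0.0}[outcome]

def expected_score(wdl):
    """Practical value: P(win) + 0.5 * P(draw) against a fallible opponent."""
    p_win, p_draw, _p_loss = wdl
    return p_win + 0.5 * p_draw

# Hypothetical first moves: both assumed drawn with perfect play,
# but with different (made-up) practical win/draw/loss chances.
moves = {
    "e4": {"perfect_play": "draw", "wdl": (0.40, 0.35, 0.25)},
    "h3": {"perfect_play": "draw", "wdl": (0.30, 0.35, 0.35)},
}

def rank_by_ternary(candidates):
    """Ternary values per move; drawn moves are indistinguishable here."""
    return {m: ternary_value(d["perfect_play"]) for m, d in candidates.items()}

def best_by_expected_score(candidates):
    """Pick the move with the highest practical expected score."""
    return max(candidates, key=lambda m: expected_score(candidates[m]["wdl"]))

if __name__ == "__main__":
    print(rank_by_ternary(moves))         # both moves get the same value
    print(best_by_expected_score(moves))  # the practical measure prefers one
```

Under these assumed numbers the ternary value cannot separate the two moves, while the expected score can, which is the sense in which a "perfect" evaluation is less useful for actually winning games.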