Comment by imranhou

5 hours ago

From the page: "We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct."

Why compare this to GPT-3.5-turbo-instruct? Is that near SOTA in this space?

As far as anyone knows, 3.5-turbo-instruct is the best chess-playing LLM (it certainly was at the time of the paper): roughly 1800 Elo with an illegal-move rate under 0.1%. It's unclear why it was so much better than GPT-4 (lack of RLHF? training data?), and I don't know if anyone has bothered to test GPT-4o the same way, but it was pretty big news online at the time.
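
For anyone wondering how that illegal-move figure gets measured: the community tests generally prompted the model with the game so far as PGN movetext and checked each reply for legality. Below is a minimal sketch of that legality-checking loop using python-chess; ask_model is a hypothetical stand-in for whatever completion API you'd call, and the prompt format and game counts here are assumptions, not the exact setup from those tests. (Elo is a separate measurement, since it requires playing rated opponents.)

    # Sketch: estimate an LLM's illegal-move rate with python-chess.
    # ask_model() is a hypothetical stub for your actual completion API.
    import chess

    def ask_model(movetext: str) -> str:
        """Hypothetical: given PGN movetext so far, return the next move in SAN."""
        raise NotImplementedError  # e.g. call your completions endpoint here

    def illegal_move_rate(num_games: int = 100, max_plies: int = 80) -> float:
        attempts = illegal = 0
        for _ in range(num_games):
            board = chess.Board()
            movetext = "1."  # the model continues the PGN movetext
            while not board.is_game_over() and board.ply() < max_plies:
                san = ask_model(movetext).strip()
                attempts += 1
                try:
                    move = board.parse_san(san)  # raises ValueError if illegal
                except ValueError:
                    illegal += 1
                    break  # count it and abandon the game
                board.push(move)
                movetext += f" {san}"
                if board.turn == chess.WHITE:  # black just moved; add next move number
                    movetext += f" {board.fullmove_number}."
        return illegal / attempts if attempts else 0.0

The PGN-continuation framing matters: part of why 3.5-turbo-instruct did so well is widely believed to be that completing raw PGN plays to its training data, whereas chat-formatted prompts do worse.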