Comment by joe_the_user

8 months ago

Sort of, but it seems a bit of a cheat.

Neural networks are universal approximators. Choose a function, gather enough data from it, and you can approximate it very closely, maybe even exactly. If creating the original function F required algorithm Y ("search" or whatever), you can approximate F and then say "Look, F without Y", when for all we know the approximation might be doing things internally that are nearly identical to what the original F does.
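
To make the "F without Y" point concrete, here is a toy sketch of my own (nothing from the paper; scikit-learn is used purely for brevity): the reference function is built with an explicit iterative algorithm, Newton's method for square roots, and a small network learns to reproduce its outputs with no iterations at inference time.

```python
# Toy sketch of "F without Y" (my own example, not from the paper).
# F = square root computed via Newton's method (the "algorithm Y");
# the network is trained only on (input, output) pairs sampled from F.
import numpy as np
from sklearn.neural_network import MLPRegressor

def newton_sqrt(x: float, iters: int = 20) -> float:
    """Reference function F, built with an explicit iterative algorithm."""
    guess = max(x, 1.0)
    for _ in range(iters):
        guess = 0.5 * (guess + x / guess)
    return guess

rng = np.random.default_rng(0)
X = rng.uniform(0.1, 100.0, size=(20_000, 1))
y = np.array([newton_sqrt(v) for v in X.ravel()])

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
net.fit(X, y)

# No Newton iterations at test time; the approximation may still be doing
# something internally that mimics them, which is the parent's point.
print(net.predict([[49.0]]), newton_sqrt(49.0))  # roughly 7.0 vs 7.0 if training converged
```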

A sufficiently large NN can learn an arbitrary function, yes. But Stockfish is also theoretically perfect given infinite computational resources.

What is interesting is performing well under reasonable computational constraints, i.e. doing it faster / with fewer FLOPs than Stockfish.
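
For a rough sense of scale, a common rule of thumb is that a transformer forward pass costs about 2 × parameters FLOPs per token; the sequence length below is my own guess, not a figure from the paper.

```python
# Back-of-envelope only: the 2 * params FLOPs-per-token rule and the sequence
# length are assumptions, not numbers from the paper.
params = 270e6                # the 270M parameter model
tokens_per_position = 80      # hypothetical length of a FEN-like encoding
flops_per_position = 2 * params * tokens_per_position
print(f"~{flops_per_position:.0e} FLOPs to score one position")  # ~4e+10
```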

  • Is the model more efficient than Stockfish? I think Stockfish runs on a regular CPU, and I'd guess this "270M parameter transformer model" requires a GPU, but I can't find any reference to efficiency in the paper.

    Also found in the paper: "While our largest model achieves very good performance, it does not completely close the gap to Stockfish 16". So it's actually inferior, but they still think it's an interesting exercise. And that's the thing: it's primarily an exercise, like calculating pi to a billion decimal places or overclocking a gaming laptop.

    • Well, I think it's interesting to the extent that it optimizes the solution for a different piece of hardware, the TPU. Their results are also applicable to GPUs. Since the problem is highly parallelizable, we might expect a viable model to approximate a more accurate evaluation quickly, and perhaps make up in volume for what it loses in accuracy.

The state space of chess is so huge that even a giant training set would be a _very_ sparse sample of the Stockfish-computed value function (rough numbers sketched below).

So the network still needs to do some impressive generalization in order to "interpolate" between those samples.

I think so, anyway (didn't read the paper, but I worked on AlphaZero-like algorithms for a few years).
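
A quick back-of-envelope on that sparsity claim, using my own rough numbers rather than anything from the paper:

```python
# Orders of magnitude only: ~4.8e44 is Tromp's estimate of the number of legal
# chess positions; 1e10 labeled training positions is an assumed magnitude.
legal_positions = 4.8e44
training_samples = 1e10

coverage = training_samples / legal_positions
print(f"fraction of the state space covered: {coverage:.0e}")  # ~2e-35
```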

Only in the same way that DNA is a cheat for constructing organisms. After all, DNA sequences are universal recipes for organisms: study enough of them, splice together a new sequence, and you can theoretically design any mythical creature you can think of.

But if someone grew an actual dragon in a lab by splicing together DNA fragments, that would still be a major feat. Similarly, training a neural net to play grandmaster-level chess is simple in theory but extremely difficult in practice.

It doesn't get the actual optimal Q-values from Stockfish (presumably those would take effectively infinite compute to calculate); instead it gets estimates obtained by polling Stockfish for only 50 ms per position (that kind of labeling is sketched below).

So you're estimating from data a function which is itself not necessarily optimal. Moreover, the point is more about how far we can get with a really generic transformer architecture that is not tuned to the domain-specific details of the problem, which Stockfish very much is.
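
A minimal sketch of that kind of time-limited labeling, assuming python-chess is installed and a "stockfish" binary is on PATH (the paper's actual pipeline may well differ):

```python
# Illustration only: label one position with a 50 ms Stockfish evaluation.
import chess
import chess.engine

def stockfish_estimate(fen: str, seconds: float = 0.05) -> int:
    """Centipawn estimate for the side to move under a small time budget,
    so the label is itself an approximation rather than an optimal value."""
    board = chess.Board(fen)
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    try:
        info = engine.analyse(board, chess.engine.Limit(time=seconds))
        # Mate scores are folded into a large centipawn value for simplicity.
        return info["score"].pov(board.turn).score(mate_score=100_000)
    finally:
        engine.quit()

if __name__ == "__main__":
    print(stockfish_estimate(chess.Board().fen()))
```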

No, arbitrarily wide neural networks are approximators of Borel-measurable functions (rough statement below). There's a big difference between that and "any function". RNNs are Turing-complete, though.

  • You can say the same thing about RNNs. Technically, nothing is Turing-complete without infinite scratch space.
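
For reference, the classical result being alluded to (Hornik, Stinchcombe and White), stated roughly and from memory, with σ a squashing activation and μ any finite measure on R^n:

```latex
% Rough statement from memory, not quoted from the chess paper or the original source.
\text{For every Borel measurable } f : \mathbb{R}^n \to \mathbb{R}
\text{ and every } \varepsilon > 0, \text{ there exist } N,\ \alpha_j, b_j \in \mathbb{R},\ w_j \in \mathbb{R}^n
\text{ such that } \quad
g(x) \;=\; \sum_{j=1}^{N} \alpha_j \, \sigma\!\bigl(w_j^{\top} x + b_j\bigr)
\quad \text{satisfies} \quad
\mu\bigl(\{\, x : |f(x) - g(x)| > \varepsilon \,\}\bigr) \;<\; \varepsilon .
```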