Comment by joe_the_user

8 months ago

Sort of, but it seems a bit of a cheat.

Neural networks are universal approximators. Choose a function, gather enough data from it, and you can approximate it very closely, maybe even exactly. If creating the original function F required algorithm Y ("search" or whatever), you can approximate F and then say "Look, F without Y", when for all we know the approximation might be doing things internally that are nearly identical to what the original F does.
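
To make the "F without Y" point concrete, here is a toy sketch of my own (nothing from the paper; scikit-learn is used purely for brevity): the reference function is built with an explicit iterative algorithm, Newton's method for square roots, and a small network learns to reproduce its outputs with no iterations at inference time.

```python
# Toy sketch of "F without Y" (my own example, not from the paper).
# F = square root computed via Newton's method (the "algorithm Y");
# the network is trained only on (input, output) pairs sampled from F.
import numpy as np
from sklearn.neural_network import MLPRegressor

def newton_sqrt(x: float, iters: int = 20) -> float:
    """Reference function F, built with an explicit iterative algorithm."""
    guess = max(x, 1.0)
    for _ in range(iters):
        guess = 0.5 * (guess + x / guess)
    return guess

rng = np.random.default_rng(0)
X = rng.uniform(0.1, 100.0, size=(20_000, 1))
y = np.array([newton_sqrt(v) for v in X.ravel()])

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
net.fit(X, y)

# No Newton iterations at test time; the approximation may still be doing
# something internally that mimics them, which is the parent's point.
print(net.predict([[49.0]]), newton_sqrt(49.0))  # roughly 7.0 vs 7.0 if training converged
```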

A sufficiently large NN can learn an arbitrary function, yes. But Stockfish is also theoretically perfect given infinite computational resources.

What is interesting is performing well under reasonable computational constraints, i.e. doing it faster / with fewer FLOPs than Stockfish.
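
For a rough sense of scale, a common rule of thumb is that a transformer forward pass costs about 2 × parameters FLOPs per token; the sequence length below is my own guess, not a figure from the paper.

```python
# Back-of-envelope only: the 2 * params FLOPs-per-token rule and the sequence
# length are assumptions, not numbers from the paper.
params = 270e6                # the 270M parameter model
tokens_per_position = 80      # hypothetical length of a FEN-like encoding
flops_per_position = 2 * params * tokens_per_position
print(f"~{flops_per_position:.0e} FLOPs to score one position")  # ~4e+10
```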

  • Is the model more efficient than Stockfish? I think Stockfish runs on a regular CPU, and I'd guess this "270M parameter transformer model" requires a GPU, but I can't find any reference to efficiency in the paper.

    Also found in the paper: "While our largest model achieves very good performance, it does not completely close the gap to Stockfish 16". So it's actually inferior, but they still think it's an interesting exercise. And that's the thing: it's primarily an exercise, like calculating pi to a billion decimal places or overclocking a gaming laptop.

    • Well, I think it's interesting to the extent that it optimizes the solution for a different piece of hardware, the TPU. Their results are also applicable to GPUs. Since the problem is highly parallelizable, we might expect a viable model to approximate a more accurate evaluation quickly, and perhaps make up in volume for what it loses in accuracy.

The state space of chess is so huge that even a giant training set would be a _very_ sparse sample of the Stockfish-computed value function (rough numbers sketched below).

So the network still needs to do some impressive generalization in order to "interpolate" between those samples.

I think so, anyway (didn't read the paper, but I worked on AlphaZero-like algorithms for a few years).
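
A quick back-of-envelope on that sparsity claim, using my own rough numbers rather than anything from the paper:

```python
# Orders of magnitude only: ~4.8e44 is Tromp's estimate of the number of legal
# chess positions; 1e10 labeled training positions is an assumed magnitude.
legal_positions = 4.8e44
training_samples = 1e10

coverage = training_samples / legal_positions
print(f"fraction of the state space covered: {coverage:.0e}")  # ~2e-35
```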

Only in the same way that DNA is a cheat for constructing organisms. After all, DNA sequences are universal recipes for organisms: study enough of them, splice together a new sequence, and you can theoretically design any mythical creature you can think of.

But if someone grew an actual dragon in a lab by splicing together DNA fragments, that would still be a major feat. Similarly, training a neural net to play grandmaster-level chess is simple in theory but extremely difficult in practice.

It doesn't get the actual optimal Q-values from Stockfish (presumably those would take effectively infinite compute to calculate); instead it gets estimates obtained by polling Stockfish for only 50 ms per position (that kind of labeling is sketched below).

So you're estimating from data a function which is itself not necessarily optimal. Moreover, the point is more about how far we can get with a really generic transformer architecture that is not tuned to the domain-specific details of the problem, which Stockfish very much is.
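
A minimal sketch of that kind of time-limited labeling, assuming python-chess is installed and a "stockfish" binary is on PATH (the paper's actual pipeline may well differ):

```python
# Illustration only: label one position with a 50 ms Stockfish evaluation.
import chess
import chess.engine

def stockfish_estimate(fen: str, seconds: float = 0.05) -> int:
    """Centipawn estimate for the side to move under a small time budget,
    so the label is itself an approximation rather than an optimal value."""
    board = chess.Board(fen)
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    try:
        info = engine.analyse(board, chess.engine.Limit(time=seconds))
        # Mate scores are folded into a large centipawn value for simplicity.
        return info["score"].pov(board.turn).score(mate_score=100_000)
    finally:
        engine.quit()

if __name__ == "__main__":
    print(stockfish_estimate(chess.Board().fen()))
```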

No, arbitrarily wide neural networks are approximators of Borel-measurable functions (rough statement below). There's a big difference between that and "any function". RNNs are Turing-complete, though.

  • You can say the same thing about RNNs. Technically, nothing is Turing-complete without infinite scratch space.
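
For reference, the classical result being alluded to (Hornik, Stinchcombe and White), stated roughly and from memory, with σ a squashing activation and μ any finite measure on R^n:

```latex
% Rough statement from memory, not quoted from the chess paper or the original source.
\text{For every Borel measurable } f : \mathbb{R}^n \to \mathbb{R}
\text{ and every } \varepsilon > 0, \text{ there exist } N,\ \alpha_j, b_j \in \mathbb{R},\ w_j \in \mathbb{R}^n
\text{ such that } \quad
g(x) \;=\; \sum_{j=1}^{N} \alpha_j \, \sigma\!\bigl(w_j^{\top} x + b_j\bigr)
\quad \text{satisfies} \quad
\mu\bigl(\{\, x : |f(x) - g(x)| > \varepsilon \,\}\bigr) \;<\; \varepsilon .
```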