
Comment by mk_stjames

8 months ago

270M parameters at, I'm guessing, FP16/BF16?

I'd be interested to see how well the Elo holds up when the model is quantized.

At INT8, a small transformer like this could be remarkably fast and efficient on an Edge TPU or another very-low-power accelerator chip. The question then becomes whether it's faster / more efficient than Stockfish 16 on a similarly powered CPU. As we've seen with LLMs, they can be extremely fast when quantized and with all the stops pulled out on hardware to infer them efficiently, compared to raw FP16 and naive implementations.
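
A minimal sketch of how one might probe that question: apply PyTorch's dynamic INT8 quantization to a stand-in transformer and compare latency against the FP32/FP16 baseline before re-measuring Elo. The architecture, dimensions, and input shape below are hypothetical placeholders, not the actual 270M chess model.

```python
import time
import torch
import torch.nn as nn

# Hypothetical stand-in for a small decoder-style policy network
# (NOT the actual 270M-parameter model from the paper).
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=1024, nhead=16, batch_first=True),
    num_layers=16,
).eval()

# Dynamic quantization: Linear weights stored as INT8,
# activations quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64, 1024)  # dummy "board token" sequence

def bench(m: nn.Module, reps: int = 20) -> float:
    # Average forward-pass latency in milliseconds on CPU.
    with torch.no_grad():
        m(x)  # warm-up
        t0 = time.perf_counter()
        for _ in range(reps):
            m(x)
    return (time.perf_counter() - t0) / reps * 1e3

print(f"fp32 latency: {bench(model):.1f} ms")
print(f"int8 latency: {bench(quantized):.1f} ms")
```

The interesting part would be repeating the paper's Elo evaluation with the quantized weights, since the latency win only matters if playing strength survives the precision drop.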