Comment by stavros

18 hours ago

This makes no sense. A thing's roots don't change: either it started there or it didn't.

It didn't.

At least, the Transformer didn't. The abstract idea of a language model goes way back, though, within the field of linguistics. People were building simplistic "N-gram" models long before anyone used neural nets for the task, then used other types of neural net such as LSTMs and CNNs(!) before Google invented the Transformer (primarily with the goal of fully utilizing the parallelism available from GPUs, which couldn't be done with a recurrent model like the LSTM).
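
For anyone curious what "simplistic" means here: an N-gram model is basically just a table of next-word counts. A minimal bigram (N=2) sketch in Python, with a toy corpus and function names made up purely for illustration:

```python
# Toy bigram language model: count which word follows which,
# then predict the most frequently seen continuation.
from collections import Counter, defaultdict

def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count how often each word follows each other word."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts: dict[str, Counter], word: str) -> str | None:
    """Return the most frequent continuation seen in training, if any."""
    following = counts.get(word.lower())
    if not following:
        return None
    return following.most_common(1)[0][0]

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]
model = train_bigram(corpus)
print(predict_next(model, "sat"))  # "on"
print(predict_next(model, "the"))  # e.g. "cat" (ties broken by insertion order)
```

Real N-gram systems added smoothing, back-off, etc., but that's the core idea: pure counting over fixed-length contexts, no learned representations at all.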