Comment by ang_cire

15 days ago

Well, probably by being much more selective about what we put in, rather than just training on the cheapest and largest corpus available, which is the internet.

This is not a technical limitation at all; it is purely about cost and time, and companies wanting to save on both.

There are also methods like RAG (retrieval-augmented generation) that try to give models access to a fixed dataset at query time, rather than relying only on the learned representation of their training data.
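
To make the RAG idea concrete, here is a toy sketch of the retrieval half. Everything in it is made up for illustration: the `DOCS` list stands in for a curated dataset, the word-overlap scoring stands in for a real embedding search, and `build_prompt` just shows where the retrieved text would go before being handed to whatever model you use.

```python
import re
from collections import Counter

# Hypothetical curated dataset; a real system would use a vetted document store.
DOCS = [
    "The Eiffel Tower is about 330 metres tall.",
    "Mount Everest is about 8849 metres high.",
    "The Nile flows through Egypt.",
]

def tokens(text):
    """Lowercase bag of words for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, docs, k=1):
    """Return the k docs sharing the most words with the query (toy scoring)."""
    q = tokens(query)
    ranked = sorted(docs, key=lambda d: sum((q & tokens(d)).values()), reverse=True)
    return ranked[:k]

def build_prompt(query, docs, k=1):
    """Put the retrieved text in front of the question for the model to read."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How tall is the Eiffel Tower?", DOCS))
```

The point is that the model answers from text you chose and can audit, not only from whatever it absorbed during training.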