Comment by joaogui1

12 days ago

Depends on a ton of stuff really: the size of the model, how long you want to train it, and what exactly you mean by "like Hacker News or Wikipedia". Both Wikipedia and Hacker News are pretty small by current LLM training-set standards, so if you train only on a combination of those two, you would likely end up with a model that lacks most of the capabilities we associate with large language models nowadays.
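
For a rough sense of scale, here's a minimal back-of-the-envelope sketch. All the token counts are ballpark assumptions on my part (English Wikipedia is often cited at a few billion tokens of text, the full Hacker News corpus at well under a billion, and recent frontier models are reported to pretrain on trillions):

    # Back-of-the-envelope corpus-size comparison, in tokens.
    # Every figure below is a rough assumption, not a measured value.
    WIKIPEDIA_TOKENS = 5e9          # English Wikipedia text: ~a few billion tokens
    HACKER_NEWS_TOKENS = 5e8       # all HN stories + comments: well under a billion
    MODERN_PRETRAIN_TOKENS = 15e12  # recent large models: reportedly ~10-15 trillion

    combined = WIKIPEDIA_TOKENS + HACKER_NEWS_TOKENS
    print(f"Wikipedia + HN: {combined:.1e} tokens")
    print(f"Fraction of a modern pretraining set: {combined / MODERN_PRETRAIN_TOKENS:.4%}")
    # => roughly 0.04% of a trillion-token-scale corpus

Even under generous assumptions, the combined corpus is three to four orders of magnitude smaller than what today's large models see during pretraining, which is why the resulting model would be missing most of the capabilities people expect.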