Comment by joaogui1
12 days ago
Depends on a ton of stuff really: the size of the model, how long you want to train it, and what exactly you mean by "like Hacker News or Wikipedia". Both Wikipedia and Hacker News are pretty small by current LLM training-set standards, so if you train only on, for example, a combination of these two, you would likely end up with a model that lacks most of the capabilities we associate with large language models nowadays.
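For a sense of scale, here is a rough back-of-the-envelope comparison. All token counts below are order-of-magnitude assumptions, not measured figures:

```python
# Assumed, order-of-magnitude token counts -- for illustration only.
WIKIPEDIA_TOKENS = 4e9      # English Wikipedia: roughly a few billion tokens
HN_TOKENS = 1e9             # Hacker News comments: assumed ~1 billion tokens
FRONTIER_LLM_TOKENS = 10e12 # recent large models train on ~10T+ tokens

combined = WIKIPEDIA_TOKENS + HN_TOKENS
ratio = FRONTIER_LLM_TOKENS / combined
print(f"Combined corpus: {combined / 1e9:.0f}B tokens")
print(f"A frontier-scale training set is ~{ratio:.0f}x larger")
```

Even with generous estimates, the two sources together come in thousands of times smaller than what large models are typically trained on.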