Comment by trsohmers

15 hours ago

Do you think the 16k GPUs get used once and then thrown away? Llama 405B was trained over 56 days on the 16k GPUs; if I round that up to 60 days and assume the current mainstream rate of $2 per H100 per hour from the Neoclouds (which are obviously making a margin), that comes out to a total cost of ~$47M. Meta is training a lot of models on that GPU equipment and would expect it to be in service for at least 3 years, and its actual cost is surely lower than public cloud pricing.
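For anyone who wants to check the arithmetic, here is a rough sketch. It assumes the GPUs are billed around the clock for the full 60 days, and it reads "16k" as either 16,000 or 16,384 (the latter is my assumption, and it is the reading that lands on ~$47M). The function name and the 3-year service life figure are just for illustration.

```python
def rental_cost(gpus: int, days: int, usd_per_gpu_hour: float) -> float:
    """Cost of renting `gpus` GPUs around the clock for `days` days."""
    gpu_hours = gpus * days * 24
    return gpu_hours * usd_per_gpu_hour

# Two readings of "16k": a round 16,000 or 16,384 (assumed, 2**14).
for gpus in (16_000, 16_384):
    cost = rental_cost(gpus, days=60, usd_per_gpu_hour=2.0)
    print(f"{gpus:>6} GPUs: ${cost / 1e6:.1f}M")
# prints:
#  16000 GPUs: $46.1M
#  16384 GPUs: $47.2M

# Share of an assumed 3-year service life taken up by one 60-day run.
share_of_life = 60 / (3 * 365)
print(f"{share_of_life:.1%} of a 3-year service life")
# prints: 5.5% of a 3-year service life
```

So even at retail cloud pricing, one run uses roughly 5.5% of the hardware's assumed service life.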

Meta also uses a lot of GPUs for offline ML and for online ML features on Instagram, FB, etc. So nothing is "wasted".