Comment by sigmoid10

16 days ago

First, they need to work with kernel devs to finally fix their drivers. Like, Nvidia used to be a "pain in the ass" here as well (that's a literal quote from Torvalds), so simply by contributing more than nothing, they could have taken the lead. But they definitely screwed this one up.

Second, they need to fix their userspace stack. ROCm being open source and all is great in principle, but simply dropping your source to the masses doesn't make it magically work. They need to stop letting it linger by either working with the open source community (huge time investment) or doing it themselves (huge money investment).

The code is all on GitHub, the ISA docs are public, the driver is in upstream Linux with the work in progress on Gitlab. You can build whatever you want on AMD's hardware with total disregard for their software if you're so inclined. One or two companies seem to be doing so.

This has been true since roughly the OpenCL days, when the community could have chosen open standards over subservience to team green. Then again for the HSA movement, a really solid heterogeneous programming model initially supported by a bunch of companies. Also broadly ignored.

Today the runtime code is shipping in Linux distributions. There's a decent chance your laptop has an AMD CPU in it, and that'll have a built-in GPU that can run ROCm with the kernel you're already using and packages your distribution ships.

I'm not sure what more AMD could be doing here. What more do you want them to do?

  • > the community could have chosen open standards over subservience to team green

    I think most people would rather have proprietary software that works than open source that doesn't

  • >The code is all on GitHub, the ISA docs are public, the driver is in upstream Linux with the work in progress on Gitlab

    That's exactly what I meant by dumping the source and hoping that someone else turns it into plug-and-play magic - for free. This simply doesn't work.

    • The code is there and they're stoically implementing everything themselves.

      The current ML ecosystem is that people write papers and frameworks using CUDA, then complain that AMD hasn't implemented them all on ROCm, without really acknowledging that Nvidia didn't implement them either. All the code is out there, so people could implement their work on AMD and then complain at Nvidia that it's missing from their ecosystem, but that's not the done thing.

      What would you have AMD do differently here?