Comment by sigmoid10

12 days ago

Omg. I know this is mostly marketing speak, but this is her reply when asked about AMD's reticence toward software:

> Well, let me be clear, there’s no reticence at all. [...] I think we’ve always believed in the importance of the hardware-software linkage and really, the key thing about software is, we’re supposed to make it easy for customers to use all of the incredible capability that we’re putting in these chips, there is complete clarity on that.

I'm baffled how clueless these CEOs sometimes seem about their own product. Like, do you even realize that this is the reason why Nvidia is mopping the floor with your stuff? Have you ever talked to a developer who had to work with your drivers and stack? If you don't start massively investing on that side, Nvidia will keep dominating despite their outrageous pricing. I really want AMD to succeed here, but with management like that I'm not surprised that they can't keep up. Props to the interviewer for not letting her off the hook on this one after she almost dodged it.

What is she supposed to say? Perhaps "our products have bad software, don't buy them, go buy Nvidia instead"?

  • She could admit that they fell behind on this one and really need to focus on closing the gap now. But instead she says it's all business as usual, which assures me that I won't give their hardware another shot for quite a while.

    • I wouldn't blame that on the CEO, that's just regular media training and it's their job to keep the stock market happy.

      Companies really only admit to failure if there's no other option and the pressure is too high ("Antenna gate" and others come to mind).

  • She could say "We are in a compute gold rush and yet AMD's stock didn't gain anything in the last 6 months, so I hereby submit my resignation". That would work.

    • She managed to pull them out of the garbage bin after Bulldozer but I guess she hasn’t managed to hook them up to the AI bubble yet.

I'd love to know if any domain experts have a write-up on the talent, time, and financial investment it would take for AMD to come up with something that is a worthy rival to CUDA. Very curious to understand what the obstacles are.

  • ~5 years. Medium-sized team in-house + hordes (hundreds, thousands) of engineers in the field helping clients on-site, writing code for them directly upstreamed to drivers, core libs, etc. (iteratively optimized in-house, ship feature, rinse and repeat). Story of the PlayStation SDKs, of DX too, but above all CUDA (they really outdid this strategy), now for cuDNN and so much more.

    It takes incompressible time because you have to explore the whole space, cover most bases; and it takes an industry several years (about one "gen" / hardware cycle) to do that meaningfully. It helps when your platform is disruptive and customers move fast.

    Maybe 3 years at best if you start on a new ideal platform designed for it from scratch and can throw an ungodly amount of money at it fast (think 5K low-level engineers roaming your installed base).

    Maybe 10+ yrs (or never) if you're alone, poor, and Radeon (j/k, but the point is that it's non-trivial).

  • I’d say it mainly needs persistence and good execution (library support). NVIDIA has co-developed CUDA with their hardware, and largely stayed compatible with it, since around 2009, and around 2012 it first started taking off in the HPC space. Years later this enabled first their boom in crypto and then an even bigger one in AI. I don’t think this amount of R&D would be out of reach of today’s AMD (as NVIDIA wasn’t any bigger back then), but the backing of it needs to come from the very top.

  • First, they need to work with kernel devs to finally fix their drivers. Like, Nvidia used to be a "pain in the ass" here as well (that's a literal quote from Torvalds), so simply by contributing more than nothing, they could have taken the lead. But they definitely screwed this one up.

    Second, they need to fix their userspace stack. ROCm being open source and all is great in principle, but simply dropping your source to the masses doesn't make it magically work. They need to stop letting it linger, either by working with the open source community (huge time investment) or by doing it themselves (huge money investment).

    • The code is all on GitHub, the ISA docs are public, and the driver is in upstream Linux with the work in progress on GitLab. You can build whatever you want on AMD's hardware with total disregard for their software if you're so inclined. One or two companies seem to be doing so.

      This has been true since roughly the OpenCL days, when the community could have chosen open standards over subservience to team green. Then again with the HSA movement, a really solid heterogeneous programming model initially supported by a bunch of companies. Also broadly ignored.

      Today the runtime code is shipping in Linux distributions. Decent chance your laptop has an AMD CPU in it; that'll have a built-in GPU that can run ROCm with the kernel you're already using and the packages your distribution ships.

      I'm not sure what more AMD could be doing here. What more do you want them to do?

  • I wonder if they really need a CUDA rival.

    This AI stuff has progressed a bit. Intel has been working on interesting stuff with oneAPI. It might be the case that the primitives are now well enough understood that you need something more like a good library rather than a good compiler.

    In the end, more people seem to love BLAS than Fortran, after all.

    • That library (Triton) sits on top of the compiler and drivers (ROCm). If the driver kernel panics, no high-level library can fix that.

  • I don't want a CUDA rival. I want to take the entire pile of CUDA code that is already written and run it on AMD GPUs without any kind of tweak or rewrite, and have it just work every time.

    Compatibility with existing code is very important. People can't afford to rewrite their stuff just to support AMD, and thus they don't.

    AMD is kind of trying to do this with ROCm and HIP, but whatever they are doing, it's not enough.

  • My theory is that someone came up with the bright idea of allowing more open source in the stack and that that would allow them to get it all done via crowd sourcing and on the cheap. But if true it was a quite naive view of how it might work.

    If instead they had said "let's take the money we should have invested in internal development and build an open developer community that will leverage our hardware to build a world-class software stack", it might have been a little better.

    • AMD has just never had good developer software. For ages the best BLAS on AMD was… Intel MKL, as long as you figured out how to dispatch the right kernels.

      Actually, it could be really cool if everybody acted like AMD. The fact that Intel and Nvidia put out the best number libraries for free means you can’t sell a number crunching library!

  • I spotted this recent post https://www.reddit.com/r/LocalLLaMA/comments/1deqahr/comment... that was pretty interesting:

    > When I was working on TVM at Qualcomm to port it to Hexagon a few years ago we had 12 developers working on it and it was still a multiyear long and difficult process.

    > This is also ignoring the other 20 or so developers we had working on Hexagon for LLVM, which did all of the actual hardware enablement; we just had to generate good LLVM IR. You have conveniently left out all of the LLVM support that this all requires as AMD also uses LLVM to support their GPU architectures.

    > Funny enough, about a half dozen of my ex coworkers left Qualcomm to go do ML compilers at AMD and they're all really good at it; way better than I am, and they haven't magically fixed every issue

    > It's more like "hire 100 additional developers to work on the ROCM stack for a few years"

    This last statement sounds about right. Note that ROCm has over 250 repos on GitHub, a lot of them pretty active: https://github.com/orgs/ROCm/repositories?type=all - I'm sure an enterprising analyst who was really interested could look at the projects active over the past year and count unique committers. I'd guess it's in the hundreds already.

    I think if you click through the ROCm docs https://rocm.docs.amd.com/en/latest/ (and maybe compare to the CUDA docs https://docs.nvidia.com/cuda/ ) you might get a good idea of the differences. ROCm has made huge strides over the past year, but to me, the biggest fundamental problem is still that CUDA basically runs OOTB on every GPU that Nvidia makes (with impressive backwards and in some cases even forwards compatibility to boot https://docs.nvidia.com/deploy/cuda-compatibility/ ) on both Linux and Windows, and... ROCm simply doesn't.

    I think AMD's NPUs complicate things a bit as well. It looks like they're currently running on their own ONNX/Vitis (Xilinx) stack https://github.com/amd/RyzenAI-SW , and really that should either get folded into ROCm or be covered by a new SYCL/oneAPI-ish layer that spans everything.
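
For anyone who wants to try the unique-committer count suggested above, here is a rough sketch. It assumes you have already cloned the repos you care about locally and that author email is a good enough identity key, which it often isn't (people commit under several addresses, and not every committer is an AMD employee). The function names are made up for illustration; the only external tool used is plain `git log` with `--since` and `--format=%ae`.

```python
import subprocess
from collections import Counter


def count_unique_committers(author_emails):
    """Tally commits per author email (lowercased, blanks skipped)."""
    counts = Counter()
    for line in author_emails:
        email = line.strip().lower()
        if email:
            counts[email] += 1
    return counts


def committers_since(repo_path, since="1 year ago"):
    """Run `git log` in an existing local clone and tally author emails.

    Assumption: repo_path points at a clone you made yourself.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}", "--format=%ae"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_unique_committers(result.stdout.splitlines())


def merge_counts(per_repo_counts):
    """Combine per-repo tallies so each email is counted once overall."""
    total = Counter()
    for counts in per_repo_counts:
        total.update(counts)
    return total
```

Calling `committers_since` on each clone and passing the results to `merge_counts` gives one tally; `len()` of that is the number of distinct emails. Treat it as an upper-ish bound on headcount, not a measurement.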

  • > is a worthy rival to CUDA

    Vulkan Compute already exists

    But as long as developers keep buying NVIDIA for CUDA because applications only target CUDA, it is a chicken-and-egg scenario, similar to Linux vs. Windows.

> Like, do you even realize that this the reason why Nvidia is mopping the floor with your stuff?

Are they?

In my experience, AMD products generally just work, even if they're a bit buggy sometimes, while Nvidia struggles to get a video signal at all.

Seems perfectly reasonable to focus on what matters, while Nvidia is distracted by the AI fad.

  • I'm not talking about gaming, I'm talking about general purpose computing (although even for gaming your statement is pretty bold). Since she's CEO of a publicly traded company, it seems pretty weird that she would ignore the fields where the money is at, while Nvidia becomes the most valuable company in the world. So she's not just ignoring developers' wants but also her stockholders'.