Comment by sigmoid10

12 days ago

Omg. I know this is mostly marketing speak, but this is her reply when asked about AMD's reticence toward software:

> Well, let me be clear, there’s no reticence at all. [...] I think we’ve always believed in the importance of the hardware-software linkage and really, the key thing about software is, we’re supposed to make it easy for customers to use all of the incredible capability that we’re putting in these chips, there is complete clarity on that.

I'm baffled by how clueless these CEOs sometimes seem about their own product. Like, do you even realize that this is the reason why Nvidia is mopping the floor with your stuff? Have you ever talked to a developer who had to work with your drivers and stack? If you don't start massively investing on that side, Nvidia will keep dominating despite their outrageous pricing. I really want AMD to succeed here, but with management like that I'm not surprised that they can't keep up. Props to the interviewer for not letting her off the hook on this one after she almost dodged it.



cherryteastain 12 days ago

What is she supposed to say? Perhaps "our products have bad software, don't buy them, go buy Nvidia instead"?

sigmoid10 12 days ago
She could admit that they fell behind on this one and really need to focus on closing the gap now. But instead she says it's all business as usual, which assures me that I won't give their hardware another shot for quite a while.
- dewey 12 days ago
  
  I wouldn't blame that on the CEO; that's just regular media training, and their job is to keep the stock market happy.
  Companies really only admit to failure if there's no other option and the pressure is too high ("Antennagate" and others come to mind).
  
slowmotiony 12 days ago
She could say "We are in a compute gold rush and yet AMD's stock didn't gain anything in the last 6 months, so I hereby submit my resignation". That would work.
- bee_rider 12 days ago
  
  She managed to pull them out of the garbage bin after Bulldozer but I guess she hasn’t managed to hook them up to the AI bubble yet.

pyaamb 12 days ago

I'd love to know if any domain experts have a write-up on what talent, time, and financial investment it would take for AMD to come up with something that is a worthy rival to CUDA. Very curious to understand what the obstacles are.

K0SM0S 12 days ago

~5 years. Medium-sized team in-house + hordes (hundreds, thousands) of engineers in the field helping clients on-site, writing code for them directly upstreamed to drivers, core libs, etc. (iteratively optimized in-house, ship feature, rinse and repeat). Story of the PlayStation SDKs, of DX too, but above all CUDA (they really outdid this strategy), now for cuDNN and so much more.
It takes incompressible time because you have to explore the whole space, cover most bases; and it takes an industry several years (about one "gen" / hardware cycle) to do that meaningfully. It helps when your platform is disruptive and customers move fast.
Maybe 3 years at best if you start on a new ideal platform designed for it from scratch, and can throw an ungodly amount of money at it fast (think 5K low-level engineers roaming your installed base).
Maybe 10+ yrs (or never) if you're alone, poor, and Radeon (j/k, but the point is that it's non-trivial).
m_mueller 12 days ago

I’d say it mainly needs persistence and good execution (library support). NVIDIA has co-developed CUDA with their hardware, and largely stayed compatible with it, since around 2009, and around 2012 it first started taking off in the HPC space. Years later this enabled first their boom in crypto and then an even bigger one in AI. I don’t think this amount of R&D would be out of reach of today’s AMD (as NVIDIA wasn’t any bigger back then), but the backing of it needs to come from the very top.
sigmoid10 12 days ago
First, they need to work with kernel devs to finally fix their drivers. Like, Nvidia used to be a "pain in the ass" here as well (that's a literal quote from Torvalds), so simply by contributing more than nothing, they could have taken the lead. But they definitely screwed this one up.
Second, they need to fix their userspace stack. ROCm being open source and all is great in principle, but simply dropping your source to the masses doesn't make it magically work. They need to stop letting it linger, either by working with the open source community (huge time investment) or by doing it themselves (huge money investment).
- JonChesterfield 12 days ago
  
  The code is all on GitHub, the ISA docs are public, the driver is in upstream Linux with the work in progress on GitLab. You can build whatever you want on AMD's hardware with total disregard for their software if you're so inclined. One or two companies seem to be doing so.
  This has been true since roughly the OpenCL days, when the community could have chosen open standards over subservience to team green. Then again for the HSA movement, a really solid heterogeneous programming model initially supported by a bunch of companies. Also broadly ignored.
  Today the runtime code is shipping in Linux distributions. Decent chance your laptop has an AMD CPU in it, that'll have a built in GPU that can run ROCm with the kernel you're already using and packages your distribution ships.
  I'm not sure what more AMD could be doing here. What more do you want them to do?
  
bee_rider 12 days ago
I wonder if they really need a CUDA rival.
This AI stuff has progressed a bit. Intel has been working on interesting stuff with oneAPI. It might be the case that things have progressed to the point where the primitives are well enough understood that you need something more like a good library rather than a good compiler.
In the end, more people seem to love BLAS than Fortran, after all.
- wmf 12 days ago
  
  That library (Triton) sits on top of the compiler and drivers (ROCm). If the driver kernel panics, no high-level library can fix that.
  
nextaccountic 11 days ago

I don't want a CUDA rival. I want to take the entire pile of CUDA code that is already written and run it on AMD GPUs without any kind of tweak or rewrite, and have it just work every time.
Compatibility with existing code is very important. People can't afford to rewrite their stuff just to support AMD, and thus they don't.
AMD is kind of trying to do this with ROCm and HIP, but whatever they are doing, it's not enough.
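To make the HIP point concrete: much of the porting story is a source-level rename of CUDA runtime calls to their HIP counterparts. Below is a toy sketch of that idea in Python. It is not AMD's actual hipify tooling; `toy_hipify` and `unported_names` are hypothetical helpers, and the mapping is a small hand-picked subset of real CUDA/HIP runtime names.

```python
import re

# Hand-picked subset of CUDA runtime names and their HIP counterparts.
# The real API surface is far larger; this table is only illustrative.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# Match whole identifiers only, trying longer names first.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def toy_hipify(source: str) -> str:
    """Rename the known CUDA identifiers in `source` to their HIP names."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)


def unported_names(source: str) -> list[str]:
    """List cuda* identifiers that are still present after translation."""
    return sorted(set(re.findall(r"\bcuda\w+", source)))


if __name__ == "__main__":
    cuda_src = (
        "#include <cuda_runtime.h>\n"
        "cudaMalloc(&d, n); cudaMemcpy(d, h, n, cudaMemcpyHostToDevice);\n"
        "cudaStreamCreate(&s);\n"
    )
    hip_src = toy_hipify(cuda_src)
    print(hip_src)
    print("left untranslated:", unported_names(hip_src))
```

Anything missing from the table (here `cudaStreamCreate`) passes through untouched, which is roughly where "just work every time" breaks down: renaming is the easy part, and coverage plus matching behaviour is the hard part.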
Guthur 12 days ago
My theory is that someone came up with the bright idea of allowing more open source in the stack, thinking that would let them get it all done via crowdsourcing and on the cheap. But if true, it was quite a naive view of how it might work.
If instead they had said "let's take the money we should have invested in internal development and build an open developer community that will leverage our hardware to build a world-class software stack", it might have gone a little better.
- bee_rider 12 days ago
  
  AMD has just never had good developer software. For ages the best BLAS on AMD was… Intel MKL, as long as you figured out how to dispatch the right kernels.
  Actually, it could be really cool if everybody acted like AMD. The fact that Intel and Nvidia put out the best numerical libraries for free means you can’t sell a number-crunching library!
lhl 12 days ago

I spotted this recent post https://www.reddit.com/r/LocalLLaMA/comments/1deqahr/comment... that was pretty interesting:
> When I was working on TVM at Qualcomm to port it to Hexagon a few years ago we had 12 developers working on it and it was still a multiyear long and difficult process.
> This is also ignoring the other 20 or so developers we had working on Hexagon for LLVM, which did all of the actual hardware enablement; we just had to generate good LLVM IR. You have conveniently left out all of the LLVM support that this all requires as AMD also uses LLVM to support their GPU architectures.
> Funny enough, about a half dozen of my ex coworkers left Qualcomm to go do ML compilers at AMD and they're all really good at it; way better than I am, and they haven't magically fixed every issue
> It's more like "hire 100 additional developers to work on the ROCM stack for a few years"
This last statement sounds about right. Note that ROCm has over 250 repos on GitHub, a lot of them pretty active: https://github.com/orgs/ROCm/repositories?type=all - I'm sure an enterprising analyst who was really interested could look at the projects active over the past year and count unique committers. I'd guess it's in the hundreds already.
I think if you click through the ROCm docs https://rocm.docs.amd.com/en/latest/ (and maybe compare to the CUDA docs https://docs.nvidia.com/cuda/ ) you might get a good idea of the differences. ROCm has made huge strides over the past year, but to me, the biggest fundamental problem is still that CUDA basically runs OOTB on every GPU that Nvidia makes (with impressive backwards and in some cases even forwards compatibility to boot https://docs.nvidia.com/deploy/cuda-compatibility/ ) on both Linux and Windows, and... ROCm simply doesn't.
I think AMD's NPUs complicate things a bit as well. It looks like they're currently running on their own ONNX/Vitis (Xilinx) stack https://github.com/amd/RyzenAI-SW , and really that should either get folded into ROCm, or a new SYCL/oneAPI-ish layer needs to be adopted to cover everything.
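For anyone who wants to try the unique-committer count mentioned above, here is a rough sketch. It assumes the repos of interest are already cloned locally and that `git` is on the PATH; `parse_emails`, `author_emails`, and `unique_authors` are hypothetical helper names, and counting by email is only an approximation, since one person can commit under several addresses.

```python
import subprocess


def parse_emails(log_output: str) -> set[str]:
    """Turn one-email-per-line `git log` output into a set of lowercased emails."""
    return {line.strip().lower() for line in log_output.splitlines() if line.strip()}


def author_emails(repo_path: str, since: str = "1 year ago") -> set[str]:
    """Collect commit author emails from one local clone since the given date."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}", "--format=%ae"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_emails(result.stdout)


def unique_authors(repo_paths: list[str], since: str = "1 year ago") -> set[str]:
    """Union of author emails across several local clones."""
    emails: set[str] = set()
    for path in repo_paths:
        emails |= author_emails(path, since)
    return emails


if __name__ == "__main__":
    # Placeholder: fill in paths to your own local clones.
    repos: list[str] = []
    print(len(unique_authors(repos)), "unique author emails in the last year")
```

Note that `%ae` is the author email rather than the committer email (`%ce`), which is usually the more useful number for projects where a small set of maintainers merges everyone's patches.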
preisschild 12 days ago

> is a worthy rival to CUDA
Vulkan Compute already exists
But as long as developers keep buying NVIDIA for CUDA because applications only target CUDA, it's a chicken-and-egg scenario, similar to Linux vs Windows.

Nullabillity 12 days ago

> Like, do you even realize that this is the reason why Nvidia is mopping the floor with your stuff?

Are they?

My experience has been that AMD products generally just work, even if they're a bit buggy sometimes, while Nvidia struggles to get a video signal at all.

Seems perfectly reasonable to focus on what matters, while Nvidia is distracted by the AI fad.

sigmoid10 12 days ago
I'm not talking about gaming, I'm talking about general purpose computing (although even for gaming your statement is pretty bold). Since she's CEO of a publicly traded company, it seems pretty weird that she would ignore the fields where the money is at, while Nvidia becomes the most valuable company in the world. So she's not just ignoring developers' wants but also her stockholders'.
- Nullabillity 12 days ago
  
  GPGPU is largely a non-presence outside of a few niche fields like video encoding (which, uh, seems to work fine enough for me?).
  
rangestransform 12 days ago
Even if the LLM thing is an "AI fad", there are many other things ML is used for that matter (to the people spending real money on GPUs - think A6000, H100, not gamer cards).
- Nullabillity 12 days ago
  
  Such as…?
  