
Comment by paulmd

14 days ago

> For turn key proprietary stuff where you really like the happy path foreseen by your vendor

There really was no way for AMD to foresee that people might want to run GPGPU workloads on their Polaris cards? Isn't that a little counterfactual, given that the whole OpenCL and HSA Framework push predates those cards?

Example: it's not that things like Bolt didn't exist to try to compete with Thrust... it's that Thrust has had three updates in the last month while Bolt was last updated 10 years ago.

You're literally reframing "having a working runtime and framework support for your hardware" as some proprietary turnkey luxury for users, and as an unforeseeable eventuality for AMD. It may not have been a development priority, but users do like to actually build code that works.

That's why you got kicked to the curb by Blender - your OpenCL wasn't stable even after years of work from them and from you. That's why you got kicked to the curb by Octane - your Vulkan Compute support wasn't stable enough to even compile their code successfully. That's the story related by richg42 about your OpenGL driver implementation too - that it's paper features and resume-driven development by developers ten years departed, all the way down.

The issues discussed by geohotz aren't new, and they aren't limited to ROCm or deep learning in general. This is, broadly speaking, the same level of quality that AMD has applied to all its software for decades. And the social-media "red team" loyalism strategy doesn't really work here; you can't whip up "AMD drivers have been good for like 10 years now!!!" fervor when the understanding of the problems is that broad and that collectively shared. Every GPGPU developer who's tried has bounced off the AMD experience for literally a generation now. The shared collective experience is that AMD is not serious in this field, and it's difficult to believe this is a good-faith change of heart and an interest in advancing the field rather than just a cash grab.

It's also completely foreseeable that users want broad, official support for all their architectures, not just one or two specific models. These aren't mysteries that AMD just accidentally forgot about. They're basic asks that you are framing as "turnkey proprietary stuff", like a working OpenCL runtime or a working SPIR-V compiler.

What was it Linus said about the experience of working with NVIDIA? That's been the experience of the GPGPU community working with AMD, for decades. Shit is broken and doesn't work, and there's no interest in making it otherwise. The only thing that changed it is a cash grab, and a working compiler/runtime is still "turnkey proprietary stuff" you have to be arm-twisted into doing by Literally Being Put On Blast By Geohotz Until It's Fixed. "Fuck you, AMD" is a sentiment there are very valid reasons to feel, given the amount of needless suffering you have generated - but we just don't do that to red team, do we?

But you guys have been more intransigent about just supporting GPGPU - any framework, please just pick one, get serious, and start working already - than NVIDIA ever was about Wayland. You've blown decades refusing to either shit or get off the pot (without even giving enough documentation for the community to just do it themselves). And that's not an exaggeration - I bounced off the AMD stack in 2012, and it wasn't a new problem then either. It's too late for "we didn't know people wanted a working runtime or to develop on gaming cards" to work as an excuse; after decades of overt, willing neglect it's just patronizing.

Again, sorry, this is ranty, and it's not that I'm upset at you personally - but my advice, as far as corporate posture goes, is: don't go looking for a ticker-tape parade for finally delivering a working runtime that you've literally been advertising support for for more than a decade, as if it's some favor to the community. These aren't "proprietary turnkey features"; they're literally the basics of the specs you're advertising compliance with, and it's not even just one - it's like 4+ different APIs that have this problem with you guys, a problem that has been widely known and discussed in tech blogs for more than a decade (richg42). I've been saying it for a long time, and so has everyone else who's ever interacted with AMD hardware in the GPGPU space. Nobody there cared until it became a cash grab; actually, half the time you get an AMD fan telling you the drivers have been good for a decade now (AMD cannot fail, only be failed). It's frustrating. You've poisoned the well with generations of developers, with decades of corporate obstinacy that would make NVIDIA blush; please at least have a little contrition about the whole experience and the feelings on the other side here.

You're saying interesting things here. It's not my perspective, but I can see how you'd arrive at it. Worth noting that I'm an engineer writing from personal experience; the corporate posture might be quite divergent from this.

I think Cuda's GPU offloading model is very boring. An x64 thread occasionally pushes a large blob of work into a stream and sometime later finds out if it worked. That does, however, work robustly, provided you don't do anything strange from within the kernel. In particular, allocating memory on the host from within the kernel deadlocks the kernel unless you do awkward things with shuffling streams. More ambitious things like spawning a kernel from a kernel just aren't available - there's only a hobbled nested-lifetime thing. The Volta threading model is not boring, but it is terrible; see https://stackoverflow.com/questions/64775620/cuda-sync-funct...
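For concreteness, here's a minimal sketch of that offloading model - the kernel and sizes are made up, but the shape (push work into a stream, synchronize later to learn the outcome) is the whole story:

```c++
#include <cuda_runtime.h>
#include <cstdio>

// A trivial stand-in for the "large blob of work" handed to the device.
__global__ void scale(float *x, float s, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= s;
}

int main() {
  const int n = 1 << 20;
  float *d = nullptr;
  cudaMalloc(&d, n * sizeof(float));

  cudaStream_t stream;
  cudaStreamCreate(&stream);

  // The x64 thread pushes work into the stream and carries on...
  scale<<<(n + 255) / 256, 256, 0, stream>>>(d, 2.0f, n);

  // ...and only sometime later finds out whether it worked.
  cudaError_t err = cudaStreamSynchronize(stream);
  printf("kernel status: %s\n", cudaGetErrorString(err));

  cudaStreamDestroy(stream);
  cudaFree(d);
  return 0;
}
```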

HSA puts the x64 cores and the GPU cores on close to equal footing. Spawning a kernel from a kernel is totally fine and looks very much like spawning one from the host. Everything is correctly thread-safe, so calling mmap from within a kernel doesn't deadlock things. You can program the machine as a large cluster of independent cores passing messages to one another. For the raw plumbing, I wrote https://github.com/jonchesterfield/hostrpc. That can do things like have an NVIDIA card call a function on an AMD one. That's the GPU programming model I care about - not passing blobs of floating-point math onto some accelerator card; I want distributed graph algorithms where the same C++ runs on different architectures transparently to the application. HSA lends itself to that better than Cuda does. But it is rather bring-your-own-code.
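The "same C++ on both architectures" part can be sketched with nothing fancier than HIP's host/device annotations - purely illustrative (the function names are mine, and hostrpc is what actually does the message-passing plumbing):

```c++
#include <hip/hip_runtime.h>
#include <cstdio>

// One routine, compiled for the x64 host and the GPU alike.
__host__ __device__ int traverse_step(int node) {
  return node * 2 + 1;  // placeholder for shared graph-walking logic
}

__global__ void walk_on_gpu(int *out, int start) {
  out[threadIdx.x] = traverse_step(start + threadIdx.x);
}

int main() {
  // The same function runs on the host...
  printf("host: %d\n", traverse_step(3));

  // ...and inside a kernel, transparently to the algorithm.
  int *d = nullptr;
  hipMalloc(&d, 64 * sizeof(int));
  hipLaunchKernelGGL(walk_on_gpu, dim3(1), dim3(64), 0, 0, d, 3);
  hipDeviceSynchronize();
  hipFree(d);
  return 0;
}
```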

That is, I think the more general architecture amdgpu is shipping is better than the specialised one cuda implements, despite the developer experience being rather gnarlier. I can't express the things I want to on nvptx at all so it doesn't matter much that simpler things would work more reliably.

Maybe more relevant to your experience, I can offer some insight into the state of play at AMD recently and some educated guesses at the earlier state. ATI didn't do compute as far as I know. Cuda was announced in 2006, the same year AMD acquired ATI. Intel Core 2 was also 2006, and I remember that one as the event that stopped everyone buying AMD processors. Must have been an interesting year to be in semiconductors; it was before my time. So in the year Cuda appears, ATI is struggling enough to be acquired, AMD mortgages itself to the limit to make the acquisition, and Intel obsoletes AMD's main product.

I would guess that ~2007 marked the beginning of the really bad times for AMD. Even if they could have guessed what Cuda would become, they were in no position to do anything about it. There is scar tissue still evident from that experience. In particular, the games consoles being the breadwinner for years can be seen in some of the hardware decisions, and I've had an argument with someone whose stance was that semi-custom doesn't need a feature, so we shouldn't do it.

What turned the corner was the DoE labs being badly burned by reliance on a single vendor for HPC. AMD proposed a machine that looks suspiciously like a lot of games consoles with the power budget turned way up and won the Frontier bid with it. That came with a bunch of money to write some software to run on it, which in a literal sense created the job opening I filled five years back. Intel also proposed a machine, which they've done a hilariously poor job of shipping. So AMD has now built a software stack that was razor-focused on getting the DoE labs to sign the cheques: functionally adequate on the HPC machines. That's probably the root of things like the approved hardware list for ROCm containing the cards sold to supercomputers and not so much the other ones.

It turns out there's a huge market opportunity for generative AI. That's not exactly what the architecture was meant for, but whatever: generative AI likes memory bandwidth, and the amdgpu arch does do memory bandwidth properly. The rough play for that seems to be to hire a bunch of engineers, buy a bunch of compiler consultancies, and hope working software emerges from that process, which in fairness does seem to be happening. The ROCm stack is irritating today, but it's a whole different level of QoI relative to before the Frontier bring-up.

Note that there's no apology or contrition here. AMD was in a fight to survive for ages and rightly believed that R&D on GPU compute was a luxury expense. When a budget to make it work on HPC appeared, it was spent on said HPC, out of reasonable fear that they wouldn't make the stage gates otherwise. I think they've done the right thing from a top-level commercial perspective for a long time - the ATI and Xilinx mergers in particular look great.

Most of my colleagues think the ROCm stack works well. They use the approved Ubuntu kernel and a set of ROCm libraries that passed release testing to iterate on their part of the stack. I suspect most people who treat the kernel version and driver installation directions as important have a good experience. I'm closer to the HN stereotype in that I stubbornly ignore the binary ROCm release and work with llvm upstream and the linux driver in whatever state they happen to be in, using gaming cards which usually aren't on the supported list. I don't usually have a good time but it has definitely got better over the years.

I'm happy with my bet on AMD over Nvidia, despite the current stock price behaviour making it a serious financial misstep. I believe Lisa knows what she's doing and that the software stack is moving in the right direction at a sufficient pace.

  • > I think Cuda's GPU offloading model is very boring.

    > That is, I think the more general architecture amdgpu is shipping is better than the specialised one cuda implements, despite the developer experience being rather gnarlier.

    This reminded me of that Twitter thread that was linked on HN yesterday, specifically the part about AMD's "true" dual core compared to Intel's "fake" dual core.

    > We did launch a “true” dual core, but nobody cared. By then Intel’s “fake” dual core already had AR/PR love. We then started working on a “true” quad core, but AGAIN, Intel just slapped 2 dual cores together & called it a quad-core. How did we miss that playbook?! AMD always launched w/ better CPUs but always late to mkt. Customers didn’t grok what is fake vs real dual/quad core. If you do cat /proc/cpu and see cpu{0-3} you were happy.

    https://news.ycombinator.com/item?id=40696384

    What is currently the best available way to write GPGPU code so that you can ship a single install.exe to end users, containing compiled code that runs on their consumer-class AMD, Nvidia, and Intel graphics cards? Would AdaptiveCpp work?

    • Shipping compiled code works fine if you have a finite set of kernels. Just build everything for every target, gzip the result, and send it out (there's a rough sketch of the runtime-selection half of that below). People are a bit reluctant to do that because there are lots of copies of essentially the same information in the result.

      I suspect every solution you'll find which involves sending a single copy of the code will have a patched copy of llvm embedded in said install.exe, which ideally compiles the kernels to whatever is around locally at install time, but otherwise does so at application run time. It's not loads of fun deriving a program from llvm but it has been done a lot of times now.
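      A hedged sketch of the "build for every target, pick at run time" approach, HIP/AMD side only (the file names, the `scale` kernel name, and the `--genco` build line are my assumptions, not something the parent comment prescribed):

      ```c++
      #include <hip/hip_runtime.h>
      #include <cstdio>
      #include <cstring>

      // Assumption: one code object per target was built offline, e.g.
      //   hipcc --genco --offload-arch=gfx1030 kernels.hip -o kernels-gfx1030.co
      // repeated per arch, and all of them shipped inside the installer.
      int main() {
        hipDeviceProp_t prop;
        if (hipGetDeviceProperties(&prop, 0) != hipSuccess) {
          fprintf(stderr, "no usable HIP device\n");
          return 1;
        }

        // gcnArchName is e.g. "gfx1030", possibly with :feature suffixes.
        char arch[64];
        snprintf(arch, sizeof(arch), "%s", prop.gcnArchName);
        if (char *colon = strchr(arch, ':')) *colon = '\0';

        char path[128];
        snprintf(path, sizeof(path), "kernels-%s.co", arch);

        hipModule_t mod;
        if (hipModuleLoad(&mod, path) != hipSuccess) {
          fprintf(stderr, "no kernels shipped for %s\n", arch);
          return 1;
        }
        hipFunction_t fn;
        hipModuleGetFunction(&fn, mod, "scale");  // look up a kernel by name
        // ... launch it with hipModuleLaunchKernel as usual ...
        return 0;
      }
      ```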


  • I mean, you say there was "just no money", but AMD signed a deal over three years ago to acquire Xilinx for $50b. They've been on an acquisition spree, in fact - just not anything related to GPGPU, because that wasn't a priority.

    Yes, after you spend all your money there's nothing left. Just like after refusing the merger with NVIDIA and then spending all the cash buying ATI, there was nothing left. Times were very tough; ATI and the consoles kept the company afloat - but spending all your money overpaying for ATI is what put you there in the first place. You should have done the merger with NVIDIA and not depleted your cash, imo.

    More recently, you could easily have spent 0.5% of the money you spent on Xilinx and 10x'd your spend on GPGPU development for 10 years instead. That was 2020-2021 - it's literally been 5+ years since things were good enough to spend $50 billion on a single acquisition.

    You also spent $4b on stock buybacks in 2021... and $8 billion in 2022... and geohotz pointed out your runtime still crashed on the sample programs on officially-supported hardware/software in 2023, right?

    The assertion that a single dime spent in any other fashion than the way it happened would inevitably have led to AMD going under - while you spend an average of tens of billions of dollars a year on corporate acquisitions - is silly. Maybe you legitimately believe that (and I have no doubt times were very, very bad), but I suggest that you're not seeing the forest for the trees there. Software has never been a priority, and it suffered from the same deprioritization as the dGPU division and Radeon generally (in the financial catastrophe in the wake of the ATI debacle). Raja said it all - GPUs were going away, so why spend money on any of it? You need some low-end stuff for APUs; they pulled the plug on everything else. And that was a rational, albeit shortsighted, decision to keep the company afloat. But that doesn't mean it was the only course that could have done that; that's a fallacy/false logic.

    https://www.youtube.com/watch?v=I7aGC6Sp8zQ

    I also frankly think there is a real, concerning problem with AMD and locus of control; it's a very clear PTSD symptom, both for the company and the fans. Some spat with Intel 20 years ago didn't make AMD spend nearly a hundred billion dollars on acquisitions and stock buybacks instead of $100m on software. Everything constantly has to tie back to someone else rather than to decisions being made inside the company - you guys are so battered and broken that (a) you can't see that you're masters of your own destiny now, and (b) you can't see that times are different now: you both have money to spend and need to spend it. You are the corporate equivalent of a grandma eating rotten food despite having adequate savings and income, because that's how things were during her formative years. You have money now; stop eating rotten food, and stop insisting that eating rotten food is the only way to survive. Maybe 20 years ago, but not today.

    I mean, it's literally been over 20 years now. At what point is it fair to expect AMD leadership to stand by their own decisions in their own right? Will we see decisions made in 2029 justified with "but 25 years ago..."? 30 years? More? It's a problem with you guys: if the way you see it is that nothing is ever your responsibility or fault, then why would you ever change course? Which is exactly what Lisa Su is saying there. I don't expect a deeply introspective postmortem of why they lost this one, but at least a "software is our priority going forward" would be important signaling to the market. Her answer isn't that; her answer is that everything is going great, so why stop when they're winning. Except they're not.

    • It's also worth pointing out that you have abdicated driver support on those currently-sold Zen2/3 APUs with Vega as well... they are essentially legacy-support/security-update-only. And again, I'm sure you see it as "2017 hardware", but you launched hardware with it going into 2021, that hardware is still for sale, and in fact you continue to sell quite a few Zen2/3 APUs in other markets as well.

      If you want to get traction and start taking ground, you have to actually support the hardware that's in people's PCs, is what I'm saying. The "we support CDNA because it is a direct sale to big customers who pay us money to support it" model is good for the books, but it leads to exactly this place you've found yourselves in, in terms of the overall ecosystem. You will never gain traction if you don't have the CUDA-style support model, for both hardware support/compatibility and software support/compatibility.

      It is telling that Intel, which is currently in equally dire financial straits, is continuing to double down on its software spending. At one point they were running -200% operating margins on the dGPU division, because they understand the importance. Apple understands that a functional runtime and a functional library/ecosystem are table stakes too. It literally, truly is just an AMD problem, which brings us back to the vision and locus-of-control problems with the leadership. You could definitely have done this instead of $12 billion of stock buybacks in 2021/2022 if you wanted to, if absolutely nothing else.

      (and again, I disagree with the notion that every single other dollar was maximized and AMD could not have stretched themselves a dollar further in any other way - they just didn't want to do that for something that was seen as unimportant.)