Comment by cepth

16 days ago

(Part 2 of 2)

> Please show me an AMD GPU with even eight years of support. Back to focus, ROCm isn't even that old and AMD is infamous for removing support for GPUs, often within five years if not less.

As you yourself noted, CDNA vs RDNA makes things more complicated in AMD land. I also think it’s unfair to ask about “eight years of support” when the first RDNA card didn’t launch until 2019, and the first CDNA “accelerator” in 2020.

Vega and earlier generations are so fundamentally different that maintaining compatibility would have been an even bigger lift for the already small ROCm team.

If we start seeing ROCm removing support for RDNA1 and CDNA1 cards soon, then I’ll share your outrage. But I think ROCm 6 removing support for Radeon VII was entirely understandable.

> Generally agree but back to focus and discipline it's a shame that it took a massive "AI" goldrush over the past ~18 months for them to finally take it vaguely seriously. Now you throw in the fact that Nvidia has absurdly more resources, their 30% R&D spend on software is going to continue to rocket CUDA ahead of ROCm.

> For Frontier and elsewhere I really want AMD to succeed, I just don't think it does them (or anyone) any favors by pretending that all is fine in ROCm land.

The fact is that the bulk of AMD's profits still comes from CPUs, as it always has. AMD's wafer allotment at TSMC has to go first toward keeping its hyperscaler CPU customers happy. If you promise AWS/Azure/GCP hundreds of thousands of EPYC CPUs, you better deliver.

I question how useful it is to dogpile (not you personally, but generally) on AMD, when the investments in people and dollars are trending in the right direction. PyTorch and TensorFlow were broken on ROCm until relatively recently. Now that they work, you (not unreasonably) ask where the other stuff is.

The reality is that NVIDIA will likely forever be the leader with CUDA. I doubt we’ll ever see PhD students and university labs making ROCm their first choice when having to decide where to conduct career-making/breaking research.

But I don’t think it’s really debatable that AMD is closing the relative gap, given that the ROCm ecosystem didn’t exist at all until relatively recently. I’m guessing the very credible list of software partners now at least trying ROCm (https://www.amd.com/en/corporate/events/advancing-ai.html#ec...) are not committing time + resources to an ecosystem that they see as hopeless.

---

Final thoughts:

A) It was completely rational for AMD to devote the vast majority of its R&D spend to CPUs (particularly server/EPYC), especially after the success of Zen. From the day Lisa Su took over (Oct 8, 2014), the stock is up 50x+ (it was even higher earlier in 2024), not that share price reflects value in the short term. AMD's revenue for calendar year 2014 was $5.5B, with operating income of negative $155 million. Revenue for 2023 was $22.68B, with operating income of $401 million. Operating income was substantially higher in 2022 ($1.2B) and 2021 ($3.6B), but AMD has poured that money into R&D spending (https://www.statista.com/statistics/267873/amds-expenditure-...), as well as the Xilinx acquisition.

B) It was completely rational for NVIDIA to build out CUDA, as a way to make possible what it initially called "scientific computing" and eventually "GPU-accelerated computing". There's also the reality that Jensen, the consummate hype man, had to sell investors a growth story. Gaming, after all, will always be a relatively niche market, and cloud gaming (GeForce Now) never lived up to revenue expectations.

C) It’s difficult for me to identify any obvious “points of divergence” that, in an alternate history, would’ve led to better outcomes for AMD. Without the benefit of “future knowledge”, at what point should AMD have ramped up ROCm investment? As I noted above, even in the months before ChatGPT went viral, Jensen’s GTC keynote gave LLMs only a passing mention.

D) If anything, the company that missed out was Intel. Beyond floundering on the transition from 14nm to 10nm (allowing TSMC, and thus AMD, to surpass it), Intel wasted its CPU-monopoly years and the associated profits. Projects like Larrabee (https://www.anandtech.com/show/3738/intel-kills-larrabee-gpu...) and Xe HP (doomed in part by internal turf wars) (https://www.tomshardware.com/news/intel-axes-xe-hp-gpus-for-...) were killed off. Share buybacks were actually comparable to R&D spending in 2011 ($14.1B in buybacks vs $8.3B in R&D), 2014 ($10.7B vs $11.1B), 2018 ($10.8B vs $13.5B), 2019 ($13.5B vs $13.3B), and 2020 ($14.1B vs $13.55B). (See https://www.intc.com/stock-info/dividends-and-buybacks and https://www.macrotrends.net/stocks/charts/INTC/intel/researc...)

lol AMD flogged its floundering foundry waaay before Intel ran into any problems.

in fact most of your points about AMD's lack of dough can be traced back to that disaster. The company wasn't hit by some meteorite. It screwed up all by itself.

Then lucky it had that duopolistic X86 licence to lean on or it would have gone the way of Zilog or Motorola. 'Cos it sure can't rely on its janky compute offering.

  • Assuming you're not just here to troll (doubtful given your comment history, but hey I'm feeling generous):

    > lol AMD flogged its floundering foundry waaay before Intel ran into any problems.

    Not wanting/being able to spend to compete on leading-edge nodes is an interesting definition of "floundering". Today there is exactly one foundry in the world on the leading edge: TSMC. We'll see how Intel Foundry works out, but it's years behind its revenue/ramp targets at this point.

    It's fairly well known that Brian Krzanich proposed spinning out Intel's foundry operations, but the board said no.

    The irony is that trailing-edge fabs are wildly profitable, since the capex is fully amortized. GloFo made $1 billion in net income in FY2023.

    > in fact most of your points about AMD's lack of dough can be traced back to that disaster. The company wasn't hit by some meteorite. It screwed up all by itself

    Bulldozer through Excavator were terrible architectures. What does this have to do with what's now known as Global Foundries?

    GloFo got spun out with Emirati money in March 2009. Bulldozer launched in Q4 2011. What's the connection?

    AMD continued to lose market share (and was unprofitable) for years after the foundry was spun out. Bad architectural choices, and bad management, sure. Overpaying for ATI, yep. "Traced back" to GloFo? How?

    > Then lucky it had that duopolistic X86 licence to lean on or it would have gone the way of Zilog or Motorola. 'Cos it sure can't rely on its janky compute offering.

    "Janky" when? "Rely" implies present tense. You're saying AMD compute offerings are janky today?