Comment by JonChesterfield

15 days ago

Shipping compiled code works fine if you have a finite set of kernels. Just build everything for every target, gzip the result and send it out. People are a bit reluctant to do that because there are lots of copies of essentially the same information in the result.
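The loader side of that fat-binary approach fits in a few lines: one precompiled code object per target, and a lookup against whatever GPU is present locally. A minimal sketch (the arch names and blob contents are made up for illustration):

```python
# Fat-binary loader sketch: the bundle ships one code object per target,
# and the host picks the one matching the locally detected GPU.
KERNEL_BLOBS = {
    "gfx906": b"<code object for gfx906>",
    "gfx90a": b"<code object for gfx90a>",
    "sm_80":  b"<code object for sm_80>",
}

def select_kernel(local_arch: str) -> bytes:
    """Return the precompiled kernel for this GPU, or fail loudly."""
    try:
        return KERNEL_BLOBS[local_arch]
    except KeyError:
        # The failure mode: a GPU newer than anything in the bundle.
        raise RuntimeError(f"no code object for {local_arch}; rebuild and reship")
```

The size complaint is exactly KERNEL_BLOBS growing with a mostly-identical entry for every target.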

I suspect every solution you'll find which involves sending a single copy of the code will have a patched copy of llvm embedded in said install.exe, which ideally compiles the kernels to whatever is around locally at install time, but otherwise does so at application run time. It's not loads of fun deriving a program from llvm but it has been done a lot of times now.

> Shipping compiled code works fine if you have a finite set of kernels. Just build everything for every target, gzip the result and send it out. People are a bit reluctant to do that because there are lots of copies of essentially the same information in the result.

That's kind of the point, you have to build everything for a lot of different targets. And what happens a year from now when the client has bought the latest GPU and wants to run the same program on it? Not having an intermediate compile target like PTX is a big downside, although I guess it didn't matter for Frontier.

I can't find any good solution. AdaptiveCpp seems like the best option, but they say Windows support is highly experimental because it depends on a patched llvm, and they only mention OpenMP and Cuda backends anyway. Seems like Cuda is still the best Windows option.

  • There's a degree of moving the goalposts there.

    Shipping some machine code today to run on a GPU released tomorrow doesn't work anywhere. Cuda looks like it does, provided someone upgrades the cuda installation on the machine after the new GPU is released, because the ptx is handled by the cuda runtime. HSAIL was meant to do that on amdgpu but people didn't like it.

    That same trick would work on amdgpu - compile to spir-v, wait for a new GPU, upgrade the compiler on the local machine, and now you can run that spir-v. The key part is installing a new JIT which knows what the new hardware is, even if you're not willing to update the program itself. Except that compiling to spir-v is slightly off in the weeds for compute kernels at present.
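The shape of that trick - ship portable IR once, let an upgradable local JIT absorb new hardware - can be sketched as a toy model (not any real driver API; arch names are invented):

```python
# "Ship IR, upgrade the JIT" sketch: the application binary never changes;
# support for a new GPU arrives by upgrading the locally installed JIT.

def jit_compile(ir: bytes, arch: str) -> bytes:
    # Stand-in for the driver's JIT; a real one emits machine code.
    return b"ISA[" + arch.encode() + b"]:" + ir

def load_program(ir: bytes, local_arch: str, jit_known_archs: set) -> bytes:
    if local_arch not in jit_known_archs:
        # The fix is upgrading the JIT/driver, not the shipped program.
        raise RuntimeError(f"installed JIT predates {local_arch}")
    return jit_compile(ir, local_arch)
```

A program shipped before the GPU existed starts working the moment the installed JIT learns the new arch name, with the program itself untouched.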

    It's tempting to view that as a non-issue. If someone can upgrade the cuda install on the machine, they could upgrade whatever program was running on cuda as well. In practice this seems to annoy people though, which is why there's a gradual move toward spir-v, or toward shipping llvm IR and rolling the dice on the auto-upgrade machinery handling it. Alternatively, ship source code and compile it on site; that'll work for people who are willing to apt install a new clang even if they aren't willing to update your program.
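The compile-on-site fallback amounts to probing for a local toolchain at install time and only invoking it if found. A rough sketch (the clang flags are illustrative, and the no-compiler branch is where you'd fall back to prebuilt blobs; `search_path` exists only so the probe can be overridden):

```python
import shutil
import subprocess

def compile_on_site(src_path: str, out_path: str, search_path=None) -> bool:
    """Build the kernel with whatever clang is installed locally.

    Returns False when no compiler is found, so the caller can fall
    back to prebuilt code objects (or refuse to install).
    """
    clang = shutil.which("clang", path=search_path)
    if clang is None:
        return False
    subprocess.run([clang, "-O2", "-c", src_path, "-o", out_path], check=True)
    return True
```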

    • > Cuda looks like it does

      And that's what matters. It might look like moving the goalposts to someone who knows how it works in the background and what kind of work is necessary to support a new architecture, but that's irrelevant to end users. Just like end users didn't care about "true" or "fake" dual cores.

      > If someone can upgrade the cuda install on the machine, they could upgrade whatever program was running on cuda as well.

      No, because that would mean every GPGPU developer has to update their code to support the new hardware, instead of the runtime just taking care of it. I think you're more focused on HPC, data centers, and specialized software with active development and a small user base, but how would that work if I wanted to run an image processing program, video encoder, game, photogrammetry tool, etc., and the developer has lost interest in it years ago? Or if I've written some software and don't want to have to update it just because there's a new GPU out? And isn't the cuda runtime installed by default with the driver, which auto-updates?

      > there's a gradual move toward spir-v

      It was introduced 9 years ago, what's taking so long?

      > Alternatively ship source code and compile it on site, that'll work for people who are willing to apt install a new clang

      Doesn't seem to work well on Windows since you need to use a patched Clang, and some developers have a thing about shipping their source code.

      On the whole, both the developer experience and the Windows user experience are still very unergonomic, and I really expected the field to have progressed further by now. Nvidia is rightly reaping the rewards of their technical investments, but I still hope for a future where I can easily run the same code on any GPU. Then again, I hoped for that 15 years ago too.