Compatibility of older CUDA versions with RTX 5090 (Blackwell)

I recently saw the benchmarks for the new RTX 5090, and the Blackwell architecture looks impressive. However, I have a question regarding backward compatibility for AI workloads.

In real-world production environments, there are still many models and frameworks that were built on older CUDA versions. For example, on the H100, running code that depends on CUDA 11.3 or older often leads to build or runtime errors due to architectural incompatibility.

I’m wondering if something similar will happen on the RTX 5090 — specifically, whether older CUDA-based code (e.g., CUDA 11.x) can still run properly, or if recompilation with a newer toolkit (e.g., CUDA 12.5+) is mandatory.

If backward compatibility is an issue, are there any recommended workarounds (such as Docker images, compatibility flags, or PTX-level recompilation)?

Thanks in advance for your insights!


It depends on how the code was built, and on what your expectations are. This topic is not new with the RTX 5090 or the Blackwell generation, so you can find various writeups about it.

The CUDA compiler driver, nvcc, has a notion of GPU architecture targets. When building any code that uses CUDA, you can direct nvcc to compile to SASS (the actual machine code for a specific GPU architecture), to PTX (an intermediate representation that can be compiled again later), or to both.
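As a rough illustration (the source file and output names here are hypothetical), the SASS/PTX distinction shows up in the `-gencode` flags passed to nvcc:

```shell
# SASS only for sm_80 (Ampere): runs natively on sm_80 GPUs,
# but offers no forward compatibility to newer architectures
nvcc -gencode arch=compute_80,code=sm_80 my_kernel.cu -o app_sass_only

# PTX only: JIT-compiled by the driver at run time,
# forward compatible with newer GPUs
nvcc -gencode arch=compute_80,code=compute_80 my_kernel.cu -o app_ptx_only

# Both SASS and PTX (a common practice): native speed on sm_80,
# plus a JIT fallback on architectures newer than the toolkit knows about
nvcc -gencode arch=compute_80,code=sm_80 \
     -gencode arch=compute_80,code=compute_80 my_kernel.cu -o app_both
```

In `arch=compute_XX,code=...`, a `code=sm_XX` entry embeds SASS and a `code=compute_XX` entry embeds PTX; a binary may carry several of each.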

If the built code includes PTX, it is generally forward compatible, meaning code built with CUDA 11.3 and targeting some older architecture (say, the A100), but including PTX, will generally be runnable on newer architectures. This is a basic CUDA compatibility mechanism that has been in place throughout CUDA history.

Most CUDA libraries I’m aware of, such as cuBLAS and cuDNN, are built this way as well, so that higher-level code using these libraries can also work on newer architectures.

This forward compatibility path involves JIT recompilation “on the fly”, i.e. at the moment you run the code on a newer architecture. This JIT recompilation can sometimes take a long time (although lazy loading, introduced relatively recently, can improve matters). That noticeable recompilation lag gets mentioned in forum questions from time to time as well.
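A few environment variables influence this JIT behavior; a sketch (the cache size shown is an arbitrary example, and availability and defaults vary by CUDA version):

```shell
# Defer JIT compilation of each kernel until its first use,
# rather than compiling everything at startup (recent CUDA versions)
export CUDA_MODULE_LOADING=LAZY

# The driver caches JIT results on disk; a larger cache avoids
# repeated recompilation across runs (size is in bytes)
export CUDA_CACHE_MAXSIZE=4294967296

# Default cache location on Linux; can be relocated if desired
export CUDA_CACHE_PATH="$HOME/.nv/ComputeCache"
```

With the JIT cache warm, the recompilation lag is typically only paid on the first run after a change.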

If PTX is not included in the binary object, this kind of forward compatibility is quite limited or nonexistent, and such code will report an error if you run it on an architecture it does not support.
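You can check what a given binary or library actually contains with the cuobjdump tool from the CUDA toolkit (the binary name `app` and the library path below are just examples):

```shell
# List the embedded SASS (ELF) images and their target architectures
cuobjdump --list-elf app

# List the embedded PTX images; if no PTX entries appear, there is
# nothing for the driver to JIT-compile, and no forward compatibility
cuobjdump --list-ptx app

# Shared libraries can be inspected the same way
cuobjdump --list-ptx /usr/local/cuda/lib64/libcublas.so
```

This is a quick way to predict whether an existing binary has any chance of running on a newer GPU without a rebuild.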

This is a rather involved topic; you can find many posts around the internet discussing various aspects of it.

Yes, something similar will happen on the RTX 5090 if prior code was built without PTX forward compatibility coverage. That includes the case where your own code was built with PTX but links to a library that was not built with PTX. In that event, there are no docker images or “flags” to fix or work around this; the code (or library) has to be rebuilt with a toolkit that supports the new architecture.