I have a deep learning library that I'm building with the old CUDA Toolkit 9.2, and I am constrained in that I cannot change the toolkit version I'm using.
I now need to run my CUDA application on a very new GPU. It's new enough that its architecture is not supported by NVCC from CUDA 9.2. So I end up going through JIT compilation, which takes a really long time.
Is there a workaround that I can use to include the SASS for the new GPU in the fat binary?
The different GPU architectures are not binary compatible. That means you cannot generate SASS (machine code) for a new architecture with an old toolchain. The designated workaround is the one you are already using: generate PTX code for the newest architecture the toolchain supports, and have the compiler backend that ships with the driver package JIT-compile that PTX to SASS. This obviously requires a recent enough driver.
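To make the compatibility rules concrete, here is a rough host-side C++ sketch of how the choice between embedded SASS and embedded PTX plays out. This is a simplified model, not the driver's actual loader code: the names `Arch`, `FatBinary`, `LoadPath` and `selectLoadPath` are made up for illustration, and the model ignores special cases such as Tegra parts. It only encodes the two documented rules: SASS built for compute capability X.y runs on devices X.z with z >= y, and PTX built for X.y can be JIT-compiled for any device of equal or higher compute capability.

```cpp
#include <vector>

// Compute capability, e.g. {7, 0} for sm_70 / compute_70.
struct Arch {
    int ccMajor;
    int ccMinor;
};

// Hypothetical description of what a fat binary embeds.
struct FatBinary {
    std::vector<Arch> sass;  // machine code, from code=sm_XY
    std::vector<Arch> ptx;   // intermediate code, from code=compute_XY
};

enum class LoadPath { NativeSass, JitFromPtx, NoUsableImage };

// SASS is binary compatible only within one major version,
// and only toward equal or higher minor versions.
bool sassRunsOn(const Arch& built, const Arch& device) {
    return built.ccMajor == device.ccMajor && built.ccMinor <= device.ccMinor;
}

// PTX can be JIT-compiled for any device of equal or higher capability.
bool ptxJitsFor(const Arch& built, const Arch& device) {
    return built.ccMajor < device.ccMajor ||
           (built.ccMajor == device.ccMajor && built.ccMinor <= device.ccMinor);
}

// Prefer matching SASS; fall back to JIT from PTX; otherwise nothing can run.
LoadPath selectLoadPath(const FatBinary& bin, const Arch& device) {
    for (const Arch& a : bin.sass) {
        if (sassRunsOn(a, device)) return LoadPath::NativeSass;
    }
    for (const Arch& a : bin.ptx) {
        if (ptxJitsFor(a, device)) return LoadPath::JitFromPtx;
    }
    return LoadPath::NoUsableImage;
}
```

Assuming, purely for illustration, that the newest target in your build is compute capability 7.0 with both SASS and PTX embedded, this model takes the SASS path on a 7.x device and the JIT path on an 8.x device. If the PTX were missing, the 8.x device would have nothing it could run at all, which is why the PTX has to be in the fat binary.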
I would recommend the PTX-JIT-compile approach only for a transitional period while deployment of the latest toolchain is in progress. What is the specific reason you cannot upgrade to a newer CUDA version? 9.2 is quite old, from 2018. You could always install multiple CUDA versions and switch between them as needed.
Your guess is as good as mine. I would think there is a reason the toolchain is distributed as part of a toolkit and not as a standalone component that can be mixed and matched at will with the rest of the toolkit components. Even if it seemed to work, such a setup might break at any time, and you would have no support for such a Frankenbuild.
Why is it that you are stuck with CUDA 9.2? That might be an easier problem to solve.