I have a deep learning library that I’m building with the old CUDA Toolkit 9.2, and I am constrained to that toolkit version; I cannot upgrade it.
I now need to run my CUDA application on a very new GPU, one whose architecture is not supported by the NVCC that ships with CUDA 9.2. Since the fat binary contains no SASS for that GPU, the driver falls back to JIT-compiling the embedded PTX, which takes a very long time.
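To make the situation concrete, here is a minimal sketch of how the JIT fallback can be confirmed on a given machine. It assumes the usual binary-compatibility rule (SASS built for sm_XY runs on devices with the same major version and a minor version of at least Y). The `kEmbeddedSass` list and the `has_native_sass` helper are hypothetical names introduced only for illustration, and the listed architectures are placeholders rather than the real build's targets.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical list of sm_XX targets embedded as SASS in the fat binary
// (i.e. the code=sm_XX values passed to nvcc). Placeholder values only.
static const int kEmbeddedSass[] = {60, 61, 70};

// Assumed rule: SASS for sm_XY is usable on a device whose major version
// matches X and whose minor version is >= Y.
bool has_native_sass(int device_cc, const int* archs, int n) {
    for (int i = 0; i < n; ++i) {
        bool same_major = (archs[i] / 10) == (device_cc / 10);
        bool minor_ok = (archs[i] % 10) <= (device_cc % 10);
        if (same_major && minor_ok) return true;
    }
    return false;
}

int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // device 0
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %s\n",
                    cudaGetErrorString(err));
        return 1;
    }

    // Encode compute capability as major * 10 + minor, e.g. 8.6 -> 86.
    int cc = prop.major * 10 + prop.minor;
    int n = sizeof(kEmbeddedSass) / sizeof(kEmbeddedSass[0]);

    std::printf("%s: compute capability %d.%d\n",
                prop.name, prop.major, prop.minor);
    if (has_native_sass(cc, kEmbeddedSass, n)) {
        std::puts("matching SASS is embedded; no JIT expected");
    } else {
        // The fallback only works if PTX was embedded as well.
        std::puts("no matching SASS; the driver must JIT-compile embedded PTX");
    }
    return 0;
}
```

On the new GPU this reports that no matching SASS is embedded, which is the case I am trying to avoid.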
Is there a workaround that I can use to include the SASS for the new GPU in the fat binary?