I have a system running Red Hat Enterprise Linux 7.7. The system is equipped with an RTX 2080 Ti GPU and has CUDA 10.1 installed. I am trying to run a program that uses NVRTC for run-time compilation. The program queries the compute capability of the device (7.2) and uses that to construct an -arch=compute_72 option for the call to nvrtcCompileProgram(). Unfortunately, this fails with an error message:
nvrtc: error: failed to load builtins for compute_72.
If I modify the program to use -arch=compute_70, then things work, but I’m wary of whether this could have performance implications.
What could be going wrong?
You can use the gencode argument to use the Compute 7.0 architecture and to generate SASS assembly code specifically for the 7.2 version of it.
I haven’t really played with the NVRTC interface yet, so I do not quite know what is possible with it and what isn’t.
The RTX 2080 Ti is compute capability 7.5, not 7.2.
Based on what you have described here, you shouldn’t be using compute capability 7.2 at all.
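For what it's worth, here is a minimal sketch of how the option string could be built from the queried capability. The helper name make_arch_option is hypothetical and not part of NVRTC. The sketch assumes the major and minor values come from the CUDA runtime, for example cudaDeviceGetAttribute() with cudaDevAttrComputeCapabilityMajor and cudaDevAttrComputeCapabilityMinor, which should report 7 and 5 on an RTX 2080 Ti.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an NVRTC function): turns a compute capability
// such as 7.5 into the option string "-arch=compute_75".
// Assumption: major/minor were obtained from the CUDA runtime, e.g.
//   cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, device);
//   cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, device);
std::string make_arch_option(int major, int minor) {
    // Sanity-check the inputs so a bad query result fails loudly
    // instead of producing a bogus architecture name.
    assert(major > 0 && minor >= 0 && minor < 10);
    return "-arch=compute_" + std::to_string(major) + std::to_string(minor);
}
```

The resulting string would then be passed in the options array given to nvrtcCompileProgram(). Logging the string before compiling makes it easy to spot a case like this one, where the program ends up asking for an architecture the device does not have.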
You’re right, the program was wrong to pick compute_72.
(Now, for mysterious reasons, using compute_75 makes it run much slower than when using compute_60 - but that’s a different concern.)