I have a system running Red Hat Enterprise Linux 7.7 with an RTX 2080 Ti GPU and CUDA 10.1 installed. I am trying to run a program that uses NVRTC for run-time compilation. The program queries the compute capability of the device (7.2) and uses it to construct an -arch=compute_72 option for the call to nvrtcCompileProgram(). Unfortunately, the compilation fails with this error message:
nvrtc: error: failed to load builtins for compute_72.
If I modify the program to use -arch=compute_70 instead, compilation succeeds, but I'm concerned that targeting an older architecture than the device supports could have performance implications.