NVRTC is not a program that can be called from the terminal like nvcc, but a library to be linked, right?
Yes, NVRTC is a library with a few entry points with which you compile CUDA device source code to PTX device source code inside your application at runtime.
(I don’t know how some of the CUDA toolkit files are named under Linux, so the following describes how it looks under Windows.)
On the host side you need to include the `nvrtc.h` header (inside your `CUDA/<version>/include` folder) and link against the CUDA export library named `nvrtc.lib` (inside the `CUDA/<version>/lib/x64` folder) to be able to compile an application that makes NVRTC API calls.
The calls you need can be found in the OptiX SDK example framework by searching for nvrtc.
That export library only contains the interface of the dynamic link libraries which implement the actual NVRTC compiler and a precompiled standard library it needs.
These are located inside the `CUDA/<version>/bin` folder and are named with an nvrtc prefix and the CUDA version, e.g. for CUDA 10.1 on Windows they are named `nvrtc64_101_0.dll` and `nvrtc-builtins64_101.dll`.
These need to be redistributed along with the application.
As explained in the links I posted above, all headers you need to compile the CUDA code are also required on the target machine (and since the license terms forbid shipping these with your application, the end user would need to install the CUDA and OptiX SDKs on their own).
Since NVRTC can only compile device code, care needs to be taken to never include any host compiler includes inadvertently (also described inside the linked threads), because you cannot expect a target system to have any compiler installed, at least under Windows.
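To make the sequence of NVRTC calls more concrete, here is a minimal sketch of compiling a CUDA source string to PTX. The `nvrtc*` functions and the two compiler options are the real NVRTC API. The helper names `buildNvrtcOptions` and `compileToPtx`, the `USE_NVRTC` macro, and whatever include directories and architecture string you pass in are my own assumptions for illustration, not what the OptiX SDK framework uses.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: builds the NVRTC option list. The include directories
// are the CUDA and OptiX SDK include folders found on the end user's machine.
std::vector<std::string> buildNvrtcOptions(const std::vector<std::string>& includeDirs,
                                           const std::string& arch)
{
  std::vector<std::string> options;
  options.push_back("--gpu-architecture=" + arch); // e.g. "compute_60"
  for (const std::string& dir : includeDirs)
    options.push_back("--include-path=" + dir);
  return options;
}

// USE_NVRTC is a hypothetical macro; define it in builds that link against NVRTC.
#ifdef USE_NVRTC
#include <nvrtc.h>

// Hypothetical helper: returns the PTX, or an empty string if compilation failed.
std::string compileToPtx(const std::string& source,
                         const std::string& name,
                         const std::vector<std::string>& options)
{
  // NVRTC expects the options as an array of C strings.
  std::vector<const char*> optionPtrs;
  for (const std::string& option : options)
    optionPtrs.push_back(option.c_str());

  nvrtcProgram program = nullptr;
  if (nvrtcCreateProgram(&program, source.c_str(), name.c_str(),
                         0, nullptr, nullptr) != NVRTC_SUCCESS)
    return std::string();

  const nvrtcResult result = nvrtcCompileProgram(
      program, static_cast<int>(optionPtrs.size()), optionPtrs.data());

  // Print the compile log (warnings and errors), if there is one.
  size_t logSize = 0;
  nvrtcGetProgramLogSize(program, &logSize);
  if (logSize > 1)
  {
    std::string log(logSize, '\0');
    nvrtcGetProgramLog(program, &log[0]);
    std::fprintf(stderr, "%s\n", log.c_str());
  }

  std::string ptx;
  if (result == NVRTC_SUCCESS)
  {
    size_t ptxSize = 0; // size reported by NVRTC includes the trailing NUL
    nvrtcGetPTXSize(program, &ptxSize);
    ptx.resize(ptxSize);
    nvrtcGetPTX(program, &ptx[0]);
  }

  nvrtcDestroyProgram(&program);
  return ptx;
}
#endif // USE_NVRTC
```

Note that every header the device code includes must be reachable through the include paths you pass in, which is exactly why the SDK headers have to exist on the target machine.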
So when compiling OptiX device code, I still invoke nvcc but link it with -lnvrtc?
Not sure I understand the question. If all your CUDA code is translated to PTX with NVRTC, you wouldn’t need NVCC, and vice versa. You can also compile all CUDA device code which never changes with NVCC at build time of your project and only translate dynamically generated CUDA sources to PTX with NVRTC at runtime.
If you do not have any need to generate CUDA device code at runtime, there is also no need to use NVRTC at all.
You should simply build everything with NVCC and ship the translated PTX code with your application.
That is what most applications do and what all OptiX SDK examples do when you disable NVRTC inside the CMake settings.
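For the build-time route, the application only has to read the shipped PTX file into a string at runtime. A minimal sketch of that step, assuming the PTX was written to a file by NVCC during the build (the function name `loadPtxFile` is hypothetical):

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads a precompiled PTX file shipped with the
// application and returns its contents as a string.
std::string loadPtxFile(const std::string& path)
{
  std::ifstream file(path, std::ios::binary);
  if (!file)
    throw std::runtime_error("Cannot open PTX file: " + path);

  std::ostringstream contents;
  contents << file.rdbuf(); // copy the whole file into the string stream
  return contents.str();
}
```

The returned string is what you would then hand to the PTX-based program or module creation call of the OptiX version you are using. No CUDA or OptiX headers are needed on the target machine in this case.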