My program consists of two parts: one part includes some fixed functions stored in the static.cu file, and the other part needs to be dynamically generated based on the task. The generated content is a __global__ function that needs to call functions defined in static.cu. To achieve this, I compiled static.cu offline, then dynamically generated dyn.cu according to the task requirements and used nvRTC to compile it. Finally, I used nvJitLink to link the two parts into a cubin. However, the calculation results are incorrect, and I'm not sure whether my compilation and linking methods are correct. Here are the specific steps I followed:
- Compile the static.cu file offline into an object file:
nvcc -gencode=arch=compute_52,code=[compute_52,lto_52] -O3 -std=c++17 -rdc=true -Xcompiler -fPIC -MD -MT static.o -o static.o static.cu
- Compile the dynamically generated dyn.cu file using nvRTC:
nvrtcProgram prog;
nvrtcCreateProgram(&prog, content_of_dyn_cu, nullptr, 0, nullptr, nullptr); // content_of_dyn_cu is a const char * holding the source of dyn.cu
// Call nvrtcCompileProgram with options "-dlto", "-rdc=true", "--device-int128", "-arch=compute_52"
// Call nvrtcGetLTOIR to get the LTO IR of dyn.cu
- Use nvJitLink to link the offline-compiled static.o with the LTO IR of the dynamically compiled dyn.cu to generate a cubin:
// Call nvJitLinkCreate with options "-lto", "-arch=sm_86"
// Call nvJitLinkAddData to add the LTO IR of dyn.cu and static.o
// Call nvJitLinkComplete
- Call nvJitLinkGetLinkedCubin to get the cubin.
The program produced by the above steps yields incorrect results. However, if both parts are compiled and linked using nvcc, the calculation results are correct. Below are the commands used for compilation and linking:
nvcc -dc -gencode=arch=compute_52,code=[compute_52,lto_52] -std=c++17 -O3 -o static.o static.cu
nvcc -dc -gencode=arch=compute_52,code=[compute_52,lto_52] -std=c++17 -O3 -o dyn.o dyn.cu
nvcc -cubin -dlink -arch=compute_52 -code=sm_86 -o final.cubin dyn.o static.o
Here are my questions:
- When performing offline compilation using only nvcc, are the compilation options correct (i.e., is LTO enabled)?
- What are the differences between performing a full offline compilation using only nvcc and a semi-offline, semi-real-time compilation using nvcc, nvRTC, and nvJitLink? Why does the program compile successfully with nvcc alone, but encounter issues when using nvcc, nvRTC, and nvJitLink for the semi-offline, semi-real-time compilation?
CUDA: Cuda compilation tools, release 12.6, V12.6.20
OS: Ubuntu 20.04.5 LTS
GPU: GeForce RTX 3080