(This is a cross-post of a Stack Overflow question.)
I have a program containing separately-compiled CUDA and Thrust code (thrust_search.cu), built as follows:
nvcc -c -I/path/to/thrust/ ./src/thrust_search.cu
pgcpp -acc -Minfo -I/path/to/thrust/ -I./ -lrt -I/opt/pgi/linux86-64/2014/cuda/6.5/include/ -L/opt/pgi/linux86-64/2014/cuda/6.5/lib64/ -lcurand -lcudart -o main main.cpp thrust_search.o
The program builds and runs fine, but I’d like to enable Dynamic Parallelism. This requires relocatable device code, sm_35 and the cudadevrt library. Furthermore, using relocatable device code requires that the device code be compiled and linked in two separate steps. I therefore changed to the following build commands:
nvcc --gpu-architecture=sm_35 --device-c -I/path/to/thrust/ ./src/thrust_search.cu
nvcc --gpu-architecture=sm_35 --device-link thrust_search.o --output-file link.o -lcudadevrt
pgcpp -acc -Minfo -I/path/to/thrust/ -I./ -lrt -I/opt/pgi/linux86-64/2014/cuda/6.5/include/ -L/opt/pgi/linux86-64/2014/cuda/6.5/lib64/ -lcurand -lcudart -lcudadevrt -o main main.cpp thrust_search.o link.o
I’m now getting the following warnings and errors when building:
nvlink warning : SM Arch ('sm_20') not found in 'thrust_search.o'
nvlink warning : SM Arch ('sm_30') not found in 'thrust_search.o'
link.o: In function `__cudaRegisterLinkedBinary_66_tmpxft_00007dce_00000000_12_cuda_device_runtime_compute_50_cpp1_ii_5f6993ef':
link.stub:(.text+0x98): undefined reference to `__fatbinwrap_66_tmpxft_00007dce_00000000_12_cuda_device_runtime_compute_50_cpp1_ii_5f6993ef'
pgacclnk: child process exit status 1: /usr/bin/ld
Similar problems I was able to find elsewhere (1, 2, 3, 4, 5) all seem to have been fixed by linking the cudadevrt or cudart library, specifying the sm_35 architecture and compiling and linking the device code in two steps as I’m already doing.
My LD_LIBRARY_PATH contains /usr/local/cuda/lib64, the directory holding libcudadevrt.a, so I believe the library is being found; it looks as though it simply isn’t getting linked in. The error arises only at the pgcpp step, not during the nvcc compile or device-link steps. I suspect the problem has to do with confusion between the PGI CUDA libraries in /opt/pgi/linux86-64/2014/cuda/6.5/lib64/ and the NVIDIA CUDA libraries in /usr/local/cuda/lib64/, both of which contain a libcudadevrt.a file.
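To check the suspicion about the two library directories, something like the following could be used to compare the two copies of libcudadevrt.a. This is only a sketch: list_lib_copies is a made-up helper name, not part of any toolchain, and it assumes a POSIX shell with cksum available. The two directories are the ones mentioned above.

```shell
# Hypothetical helper: report every copy of a static library found in the
# given directories, with a checksum so that differing copies stand out.
list_lib_copies() {
    lib_name="$1"
    shift
    for dir in "$@"; do
        if [ -f "$dir/$lib_name" ]; then
            # cksum prints: checksum, size in bytes, path
            cksum "$dir/$lib_name"
        else
            echo "missing: $dir/$lib_name"
        fi
    done
}

# Compare the PGI-bundled copy with the NVIDIA toolkit copy.
list_lib_copies libcudadevrt.a \
    /opt/pgi/linux86-64/2014/cuda/6.5/lib64 \
    /usr/local/cuda/lib64
```

Differing checksums or sizes would only show that the two files do not have the same contents; they would not by themselves prove which copy pgcpp picks up at link time.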