CUDA Dynamic Parallelism undefined reference to __fatbinwrap

LO_UZH · April 27, 2015, 1:23pm

(this is a cross-post from a stackoverflow question)

I have a program containing separately-compiled CUDA and Thrust code (thrust_search.cu), built as follows:

nvcc -c -I/path/to/thrust/ ./src/thrust_search.cu

pgcpp -acc -Minfo -I/path/to/thrust/ -I./ -lrt -I/opt/pgi/linux86-64/2014/cuda/6.5/include/ -L/opt/pgi/linux86-64/2014/cuda/6.5/lib64/ -lcurand -lcudart -o main main.cpp thrust_search.o

The program builds and runs fine, but I’d like to activate Dynamic Parallelism. This requires relocatable device code, sm_35 and the cudadevrt library. Furthermore, using relocatable device code requires that the device code be compiled and linked in two separate steps. I therefore changed to the following build commands:

nvcc --gpu-architecture=sm_35 --device-c -I/path/to/thrust/ ./src/thrust_search.cu
nvcc --gpu-architecture=sm_35 --device-link thrust_search.o --output-file link.o -lcudadevrt 

pgcpp -acc -Minfo -I/path/to/thrust/ -I./ -lrt -I/opt/pgi/linux86-64/2014/cuda/6.5/include/ -L/opt/pgi/linux86-64/2014/cuda/6.5/lib64/ -lcurand -lcudart -lcudadevrt -o main main.cpp thrust_search.o link.o
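For reference, the two nvcc stages above can be sketched as a small dry-run helper that only prints the commands, so it can be read or run without a CUDA toolchain installed. The function name print_rdc_steps and its arguments are made up for illustration, and the include flags are left out for brevity:

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the two nvcc stages used above.
#   $1 = CUDA source file, $2 = target architecture (defaults to sm_35)
print_rdc_steps() {
    src=$1
    arch=${2:-sm_35}
    # nvcc writes the object into the current directory, named after the source
    obj=$(basename "$src" .cu).o
    # Stage 1: compile to an object containing relocatable device code
    echo "nvcc --gpu-architecture=$arch --device-c $src"
    # Stage 2: device-link that object against the device runtime library
    echo "nvcc --gpu-architecture=$arch --device-link $obj --output-file link.o -lcudadevrt"
}

# Example: print_rdc_steps ./src/thrust_search.cu
```

The host link step then has to be given both thrust_search.o and link.o, as in the pgcpp command above.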

I’m now getting the following errors on compilation:

nvlink warning : SM Arch ('sm_20') not found in 'thrust_search.o'
nvlink warning : SM Arch ('sm_30') not found in 'thrust_search.o'
link.o: In function `__cudaRegisterLinkedBinary_66_tmpxft_00007dce_00000000_12_cuda_device_runtime_compute_50_cpp1_ii_5f6993ef':
link.stub:(.text+0x98): undefined reference to `__fatbinwrap_66_tmpxft_00007dce_00000000_12_cuda_device_runtime_compute_50_cpp1_ii_5f6993ef'
pgacclnk: child process exit status 1: /usr/bin/ld
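One way to see which references are left open is to look at the symbol table of link.o with nm, which marks undefined symbols with a "U". The filter below is only a sketch; the function name undefined_fatbinwrap is made up, and it assumes the default nm output format:

```shell
#!/bin/sh
# Hypothetical filter: reads `nm` output on stdin and prints the names of
# undefined symbols (type "U") that contain __fatbinwrap.
undefined_fatbinwrap() {
    # Undefined entries have no address column, so the type letter is field 1
    awk '$1 == "U" && $2 ~ /__fatbinwrap/ { print $2 }'
}

# Example usage (requires binutils and the object file from the build):
#   nm link.o | undefined_fatbinwrap
```

Any name it prints has to be defined by some other object or library on the final link line.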

Similar problems I was able to find elsewhere (1, 2, 3, 4, 5) all seem to have been fixed by linking the cudadevrt or cudart library, specifying the sm_35 architecture and compiling and linking the device code in two steps as I’m already doing.

My LD_LIBRARY_PATH contains the path to the libcudadevrt.a file, /usr/local/cuda/lib64, so I do believe that the library is being found. It’s like the library isn’t actually getting linked in. By the way, the error arises only at the pgcpp command stage, not during nvcc compilation or linkage. I’m thinking the problem might have something to do with confusion between PGI CUDA libraries in /opt/pgi/linux86-64/2014/cuda/6.5/lib64/ and the NVIDIA CUDA libraries in /usr/local/cuda/lib64/ which both contain the libcudadevrt.a file.
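A note on the search paths: LD_LIBRARY_PATH is read by the dynamic loader at run time, whereas a static archive such as libcudadevrt.a is located at link time through the -L directories and the linker's default paths. To see which copies exist in the directories in question, something like the following sketch could help; the function name list_lib_copies is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: list_lib_copies NAME DIR...
# Prints DIR/NAME for every directory that contains the file, in the order given.
list_lib_copies() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
}

# Example with the two directories mentioned above:
#   list_lib_copies libcudadevrt.a \
#       /opt/pgi/linux86-64/2014/cuda/6.5/lib64 /usr/local/cuda/lib64
```

If both directories show a copy, the one that gets linked is presumably decided by the -L order on the pgcpp command line plus whatever paths the compiler driver adds itself.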

MatColgrove · April 27, 2015, 6:22pm

Hi LO_UZH,

PGI uses RDC by default, so linking shouldn’t be a problem. However, also by default, we generate binaries for several compute capabilities. To specifically target compute capability 3.5, add the flag “-ta=tesla:cc35”. This is similar to specifying “--gpu-architecture=sm_35”.

pgcpp -acc -ta=tesla:cc35 -Minfo -I/path/to/thrust/ -I./ -lrt -I/opt/pgi/linux86-64/2014/cuda/6.5/include/ -L/opt/pgi/linux86-64/2014/cuda/6.5/lib64/ -lcurand -lcudart -o main main.cpp thrust_search.o

Please let us know if this works for you.

Best Regards,
Mat

LO_UZH · April 27, 2015, 7:43pm

Hi Mat,

That didn’t fix it unfortunately. The problem doesn’t seem to be with the specification of the sm_35 architecture, but actually the nvcc linking stage. If I remove the linking stage and do not specify device relocatable code, compilation works fine:

nvcc --gpu-architecture=sm_35 -c -I/home/cef13_pp/thrust-v1.8/ ./src/thrust_search.cu -lcudadevrt

pgcpp -acc -ta=tesla:cc35 -Minfo -I/path/to/thrust/ -I./ -lrt -I/opt/pgi/linux86-64/2014/cuda/6.5/include/ -L/opt/pgi/linux86-64/2014/cuda/6.5/lib64/ -lcurand -lcudart -lcudadevrt -o main main.cpp thrust_search.o

MatColgrove · April 27, 2015, 7:55pm

Hi LO_UZH,

Could you please send an example to PGI Customer Service (trs@pgroup.com) and ask them to send it to me?

I’d like to try to reproduce the problem here and see exactly what’s going on.

Thanks,
Mat

LO_UZH · April 28, 2015, 7:54am

Hi Mat,

I just sent it in. Hope we can figure out what’s wrong!

Thanks!

MatColgrove · April 28, 2015, 6:53pm

Hi Laurent,

It took me a bit, but I was able to recreate the error. At first I was using PGI with CUDA 6.5 and it linked fine (though the executable got a runtime error). Finally, I moved to CUDA 7.0 and replicated the error. The fix was to just add “-Mcuda -pgf90libs”.

Note that we just released CUDA 7.0 support in the 15.4 compilers. I wasn’t sure which compiler and CUDA version you are using but I could only reproduce the link error in 15.4 with CUDA 7.0.

Here’s my output:

% make -f makefile_error
nvcc --gpu-architecture=sm_35 --device-c -I/proj/qa/support/LO_UZH/thrust ./src/thrust_search.cu
nvcc --gpu-architecture=sm_35 --device-link thrust_search.o --output-file link.o -lcudadevrt
pgc++ -w -V15.4 -acc -ta=tesla:cc35,cuda7.0 -L/proj/pgi/linux86-64/2015/cuda/7.0/lib64 -I./ -I/proj/pgi/linux86-64/2015/cuda/7.0/include  -lrt -lcurand -lcudart -lcudadevrt -o wrapper wrapper.cpp ./src/demand.cpp ./src/excessdemand.cpp ./src/marketclearing.cpp ./src/raberto01.cpp ./src/standard_deviation.cpp ./src/supply.cpp thrust_search.o link.o
wrapper.cpp:
./src/demand.cpp:
./src/excessdemand.cpp:
./src/marketclearing.cpp:
./src/raberto01.cpp:
./src/standard_deviation.cpp:
./src/supply.cpp:
link.o: In function `__cudaRegisterLinkedBinary_66_tmpxft_000073f4_00000000_16_cuda_device_runtime_compute_52_cpp1_ii_8b1a5d37':
link.stub:(.text+0x98): undefined reference to `__fatbinwrap_66_tmpxft_000073f4_00000000_16_cuda_device_runtime_compute_52_cpp1_ii_8b1a5d37'
pgacclnk: child process exit status 1: /usr/bin/ld
make: *** [wrapper] Error 2

% make -f makefile_error
nvcc --gpu-architecture=sm_35 --device-c -I/proj/qa/support/LO_UZH/thrust ./src/thrust_search.cu
nvcc --gpu-architecture=sm_35 --device-link thrust_search.o --output-file link.o -lcudadevrt
pgc++ -w -V15.4 -acc -ta=tesla:cc35,cuda7.0 -Mcuda -pgf90libs -L/proj/pgi/linux86-64/2015/cuda/7.0/lib64 -I./ -I/proj/pgi/linux86-64/2015/cuda/7.0/include  -lrt -lcurand -lcudart -lcudadevrt -o wrapper wrapper.cpp ./src/demand.cpp ./src/excessdemand.cpp ./src/marketclearing.cpp ./src/raberto01.cpp ./src/standard_deviation.cpp ./src/supply.cpp thrust_search.o link.o
wrapper.cpp:
./src/demand.cpp:
./src/excessdemand.cpp:
./src/marketclearing.cpp:
./src/raberto01.cpp:
./src/standard_deviation.cpp:
./src/supply.cpp:
% wrapper
25.4666

Mat
