nvlink does not seem to link CUDA libraries when cross compiling with --cpu-arch=AARCH64

CUDA separable compilation does not seem to work when cross compiling. The test code I am using is available here:

https://github.com/siposcsaba89/cuda_separable_compilation_test.git

I get linker errors:
nvlink error : Undefined reference to 'cudaGetParameterBufferV2' in 'CMakeFiles/cuda_separable_compilation_test.dir/main.cu.o'
nvlink error : Undefined reference to 'cudaLaunchDeviceV2' in 'CMakeFiles/cuda_separable_compilation_test.dir/main.cu.o'

nvcc command:
/usr/local/cuda/bin/nvcc -ccbin=/usr/bin/aarch64-linux-gnu-g++ -arch=sm_50 -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink CMakeFiles/cuda_separable_compilation_test.dir/main.cu.o -o CMakeFiles/cuda_separable_compilation_test.dir/cmake_device_link.o

Verbose output:

#$ SPACE=
#$ CUDART=cudart
#$ HERE=/usr/local/cuda/bin
#$ THERE=/usr/local/cuda/bin
#$ TARGET_SIZE=
#$ TARGET_DIR=
#$ TARGET_DIR=targets/aarch64-linux
#$ TOP=/usr/local/cuda/bin/…
#$ NVVMIR_LIBRARY_DIR=/usr/local/cuda/bin/…/nvvm/libdevice
#$ LD_LIBRARY_PATH=/usr/local/cuda/bin/…/lib:
#$ PATH=/usr/local/cuda/bin/…/nvvm/bin:/usr/local/cuda/bin:/home/csaba/bin:/home/csaba/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda/bin/
#$ INCLUDES="-I/usr/local/cuda/bin/…/targets/aarch64-linux/include"
#$ LIBRARIES= "-L/usr/local/cuda/bin/…/targets/aarch64-linux/lib/stubs" "-L/usr/local/cuda/bin/…/targets/aarch64-linux/lib"
#$ CUDAFE_FLAGS=
#$ PTXAS_FLAGS=
#$ nvlink --arch=sm_50 --register-link-binaries="/tmp/tmpxft_0001adb7_00000000-2_cmake_device_link.reg.c" -m64 "-L/usr/local/cuda/bin/…/targets/aarch64-linux/lib/stubs" "-L/usr/local/cuda/bin/…/targets/aarch64-linux/lib" -cpu-arch=AARCH64 "CMakeFiles/cuda_separable_compilation_test.dir/main.cu.o" -lcudadevrt -o "/tmp/tmpxft_0001adb7_00000000-4_cmake_device_link.sm_50.cubin"
nvlink error : Undefined reference to 'cudaGetParameterBufferV2' in 'CMakeFiles/cuda_separable_compilation_test.dir/main.cu.o'
nvlink error : Undefined reference to 'cudaLaunchDeviceV2' in 'CMakeFiles/cuda_separable_compilation_test.dir/main.cu.o'
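
The nvlink line in the verbose output is given two -L directories plus -lcudadevrt. One thing that may be worth checking is whether libcudadevrt.a is actually present in those target directories. Here is a minimal sketch of such a check. The function name find_cudadevrt is hypothetical, and the directories passed to it are assumed to be whatever -L paths your own nvcc run prints:

```shell
#!/bin/sh
# Print the path of every libcudadevrt.a found in the given directories.
# Returns 0 if at least one copy was found, 1 otherwise.
# find_cudadevrt is a hypothetical helper name used only for illustration.
find_cudadevrt() {
    status=1
    for dir in "$@"; do
        if [ -f "$dir/libcudadevrt.a" ]; then
            echo "$dir/libcudadevrt.a"
            status=0
        fi
    done
    return $status
}
```

It could be called as find_cudadevrt /usr/local/cuda/targets/aarch64-linux/lib, followed by any other library directories of interest. A missing file would point to an incomplete target installation, while a present file would suggest the problem lies elsewhere.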

nvcc version: Cuda compilation tools, release 9.2, V9.2.88

When not cross compiling, everything works fine.

I am on Ubuntu 16.04 with CUDA 9.2 installed from the Drive PX2 PDK5, which contains target files for the x86_64 and aarch64 architectures.
The repo contains a CMake toolchain file that I used for the CMake configure step, like this:
cmake … -DCMAKE_TOOLCHAIN_FILE=$PWD/…/aarch64-linux-gnu.cmake

Could you please help me resolve this issue? What am I missing?

Thanks in advance,
Csaba

Hi @csaba.sipos, did you ever solve this problem?

Dear @ericnathanmiller @csaba.sipos
Do you have any issues with CUDA cross compilation? If so, please include your code as a CUDA sample inside /usr/local/cuda/samples and adjust the Makefile accordingly. Please check cross compilation using the make TARGET_ARCH=aarch64 command (please see CUDA Samples :: CUDA Toolkit Documentation for more details).

Hi @SivaRamaKrishnaNV,

I tried to cross compile the samples, and I am getting the same error as reported by @csaba.sipos.

Any updates on this? I am trying to cross compile some custom dynamic parallelism code, to no avail.

#Cross compilation step for *.cu
/usr/local/cuda/bin/nvcc -m64 --compiler-bindir /usr/bin/aarch64-linux-gnu-g++-7 /usr/local/cuda/targets/aarch64-linux/lib/libcudadevrt.a -I/usr/local/include/ --gpu-architecture=sm_50 -rdc=true custom_code_dynamic_parallelism.cu -I/usr/local/cuda/include/ -Iinclude -Xlinker -L/usr/local/cuda/targets/aarch64-linux/lib/ -Xlinker -lcudadevrt

The above part properly generates the object files.

#Linking step for .o
/usr/local/cuda/bin/nvcc -m64 --compiler-bindir /usr/bin/aarch64-linux-gnu-g++-7 --gpu-architecture=sm_61 -dlink gpu_link.o custom_code_dynamic_parallelism.o -Xlinker -L/usr/local/cuda/targets/aarch64-linux/lib/ -lcudadevrt -lcudart
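
The compile command above uses --gpu-architecture=sm_50 while the link command uses sm_61. Whether that difference contributes to the error is not established in this thread, but keeping the architecture in one place rules it out. Here is a minimal sketch that only prints a two-step build, so it can be inspected before running anything. The function name print_rdc_build, the output name gpu_link.o as a link target, and all argument values are assumptions for illustration:

```shell
#!/bin/sh
# Print (do not run) a two-step relocatable-device-code build that uses
# one GPU architecture for both steps.
# print_rdc_build is a hypothetical helper; its arguments are placeholders.
print_rdc_build() {
    arch=$1     # GPU architecture, e.g. sm_61
    ccbin=$2    # cross host compiler passed to nvcc via -ccbin
    src=$3      # .cu source file
    obj="${src%.cu}.o"
    # Step 1: compile to an object containing relocatable device code (-dc).
    echo "nvcc -ccbin $ccbin -arch=$arch -dc $src -o $obj"
    # Step 2: device link (-dlink) against the device runtime library.
    echo "nvcc -ccbin $ccbin -arch=$arch -dlink $obj -lcudadevrt -o gpu_link.o"
}
```

For example, print_rdc_build sm_61 /usr/bin/aarch64-linux-gnu-g++-7 custom_code_dynamic_parallelism.cu prints both commands with the same -arch value.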

However, the nvlink step fails with this output:

@E@nvlink error   : Undefined reference to 'cudaGetParameterBufferV2' in 'cuda_quad_tree.o'
@E@nvlink error   : Undefined reference to 'cudaLaunchDeviceV2' in 'cuda_quad_tree.o'

I ran nm -A over all the CUDA libraries, but I could not find any library containing the cudaLaunchDeviceV2 or cudaGetParameterBufferV2 symbols.
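
One possible reason nm finds nothing is that these are device-side symbols, which may live in the embedded GPU code rather than in the host symbol table that nm reads. Here is a minimal sketch of a search loop that uses cuobjdump -symbols instead. It assumes cuobjdump from the CUDA toolkit is on PATH and can read the archives. The function name find_device_symbol and the DUMP_CMD override are hypothetical, introduced only for illustration:

```shell
#!/bin/sh
# List every static library in a directory whose symbol dump mentions
# the given symbol name.
# DUMP_CMD is a hypothetical override; by default the sketch runs
# "cuobjdump -symbols", which dumps ELF symbol names of embedded device code.
find_device_symbol() {
    lib_dir=$1
    symbol=$2
    for lib in "$lib_dir"/*.a; do
        # Skip the unexpanded pattern when the directory has no .a files.
        [ -e "$lib" ] || continue
        if ${DUMP_CMD:-cuobjdump -symbols} "$lib" 2>/dev/null | grep -q -- "$symbol"; then
            echo "$lib"
        fi
    done
}
```

It could be called as find_device_symbol /usr/local/cuda/targets/aarch64-linux/lib cudaLaunchDeviceV2. No output means no archive in that directory matched, or the dump tool could not read the files.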

Whenever I compile and link without dynamic parallelism, it works.

I am using CUDA 10.2.

I am not sure what I am missing here.

Any help would be greatly appreciated.

Dear @lucasm70i1,
Please file a new topic with details about the issue. Also, please check if you are able to cross compile the shipped CUDA samples first.

I already tried building the samples. I will try again and open a new topic with my results.

Cheers.

@SivaRamaKrishnaNV

Follow-up topic with my cross compilation results from your samples:

Let me know if I am doing something wrong that causes these linking errors.

Thanks!
Cheers.