Cross-compiling dynamic parallelism for Jetson aarch64 (CUDA 10.2): nvlink error

Hi,

I have been trying to cross-compile dynamic parallelism code for Jetson (aarch64) with CUDA 10.2 installed.

Following an earlier suggestion here to build the CUDA samples, I tried, and I am still getting this error:

root@7f32bae9284e:/cuda-samples/Samples/cdpQuadtree# make TARGET_ARCH=aarch64
>>> GCC Version is greater or equal to 5.0.0 <<<
/usr/local/cuda/bin/nvcc -ccbin aarch64-linux-gnu-g++ -I../../Common  -m64    -dc --std=c++14 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o cdpQuadtree.o -c cdpQuadtree.cu
/usr/local/cuda/bin/nvcc -ccbin aarch64-linux-gnu-g++   -m64      -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o cdpQuadtree cdpQuadtree.o  -lcudadevrt
nvlink error   : Undefined reference to 'cudaGetParameterBufferV2' in 'cdpQuadtree.o' (target: sm_35)
nvlink error   : Undefined reference to 'cudaLaunchDeviceV2' in 'cdpQuadtree.o' (target: sm_35)
Makefile:359: recipe for target 'cdpQuadtree' failed
make: *** [cdpQuadtree] Error 255

I am not sure why nvlink is getting these errors:

nvlink error   : Undefined reference to 'cudaGetParameterBufferV2' in 'cdpQuadtree.o' (target: sm_35)
nvlink error   : Undefined reference to 'cudaLaunchDeviceV2' in 'cdpQuadtree.o' (target: sm_35)

When I build these samples on x86_64 with CUDA 10.2, there are no issues.
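For context, these two symbols belong to the CUDA device runtime: any kernel that launches another kernel compiles into calls to cudaGetParameterBufferV2 and cudaLaunchDeviceV2, which nvlink must resolve from libcudadevrt for each target architecture. A minimal sketch of the kind of code that pulls them in (file name hypothetical):

```cuda
// minimal_cdp.cu -- minimal dynamic-parallelism reproducer (hypothetical)
#include <cstdio>

__global__ void child() {
    printf("child block %d\n", blockIdx.x);
}

__global__ void parent() {
    // This device-side launch is what generates the
    // cudaGetParameterBufferV2 / cudaLaunchDeviceV2 calls
    // that nvlink must resolve from libcudadevrt.
    child<<<2, 1>>>();
}

int main() {
    parent<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Something like `nvcc -arch=sm_72 -rdc=true minimal_cdp.cu -o minimal_cdp -lcudadevrt` should device-link this cleanly, assuming the device runtime library for the target architecture can be found.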

Let me know if you have any ideas.

Thanks.

Hi,

Could you share how you installed the CUDA package on the host?

Please note that building the ARM binary requires the CUDA libraries that include the cross-compiling package.
You can find these packages directly in the same JetPack installer.

Thanks.

Hi @AastaLLL,

Sure,

I am using an NVIDIA Docker image:

FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
ENV ARCH=aarch64 \
    HOSTCC=gcc \
    TARGET=ARMV8

This provides the default /usr/local/cuda-10.2/ folder.

I then install the cross-compilation toolkit in my Docker image:

wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-cross-aarch64_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-cudart-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-cufft-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-cupti-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-curand-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-cusolver-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-cusparse-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-driver-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-misc-headers-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-npp-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-nsight-compute-addon-l4t-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-nvgraph-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-nvml-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-nvrtc-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cublas/libcublas-cross-aarch64_10.2.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/n/nsight-compute/nsight-compute-addon-l4t-2019.5.0_2019.5.0.14-1_all.deb && \
    dpkg -i --force-all  *.deb && \
    rm *.deb && \
    apt-get update && \
    apt-get install -y -f && \
    apt-get install -y cuda-cross-aarch64 cuda-cross-aarch64-10-2 

That’s all I do to install both CUDA 10.2 and the aarch64 10.2 cross toolchain.

This works for building non-dynamic-parallelism code; it’s only when I build dynamic parallelism code that I get the nvlink errors.
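One sanity check worth running here (the path is an assumption, based on where the cuda-cross packages normally install) is to confirm that the aarch64 device runtime static library actually landed in the target directory, since that is the library nvlink needs to resolve these symbols:

```shell
# The aarch64 device runtime that -lcudadevrt should resolve against
# (install prefix is an assumption for the cuda-cross-aarch64 packages):
ls -l /usr/local/cuda-10.2/targets/aarch64-linux/lib/libcudadevrt.a
```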

Let me know if there’s an anomaly.

Cheers.

Interestingly, I tried building this sample natively on the Jetson NX (installed with JetPack):

~$ make
>>> GCC Version is greater or equal to 5.1.0 <<<
/usr/local/cuda/bin/nvcc -ccbin g++ -I../../Common  -m64   -Xcompiler -I. -dc --std=c++14 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o cdpQuadtree.o -c cdpQuadtree.cu
/usr/local/cuda/bin/nvcc -ccbin g++   -m64   -Xcompiler -I.   -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o cdpQuadtree cdpQuadtree.o  -lcudadevrt
nvlink error   : Undefined reference to 'cudaGetParameterBufferV2' in 'cdpQuadtree.o' (target: sm_50)
nvlink error   : Undefined reference to 'cudaLaunchDeviceV2' in 'cdpQuadtree.o' (target: sm_50)
Makefile:359: recipe for target 'cdpQuadtree' failed
make: *** [cdpQuadtree] Error 255

I am getting the same linking issue.

However, when I set the SMS variable to 70, to match the Volta architecture of the Jetson NX, the nvlink error disappears.

I tried that in my Docker image as well, but no luck.

It seems the error is caused by the SMS setting: for some target architectures, nvlink cannot find the right device runtime library.
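For anyone else hitting this: SMS is the make variable the CUDA samples Makefiles use for the list of SM architectures, and it can be overridden on the command line without editing the Makefile. A sketch (the exact values are just the ones that worked in this thread):

```shell
# Native build on the Jetson NX, targeting only Volta:
make SMS="70"

# Cross build from x86_64, restricted the same way:
make TARGET_ARCH=aarch64 SMS="70 75"
```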

Never mind, I fixed the issue.

I was building for Volta, which required building only with SMS 70 or 75, as well as specifying the cross-compilation root.
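For completeness, the working invocation looked roughly like the following (TARGET_FS is the variable the CUDA samples Makefiles use for the cross-compilation root filesystem; the mount path here is a placeholder):

```shell
# Restrict SMS to the working architectures and point the build
# at a mounted copy of the Jetson root filesystem (path is a placeholder):
make TARGET_ARCH=aarch64 SMS="70 75" TARGET_FS=/path/to/jetson/rootfs
```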

Cheers.
