Cross-compiling dynamic parallelism for Jetson aarch64 (CUDA 10.2): nvlink error

Hi,

I have been trying to cross-compile dynamic parallelism code for Jetson (aarch64) with CUDA 10.2 installed.

Following an earlier suggestion here to build the CUDA samples, I tried, and I am still getting this error:

root@7f32bae9284e:/cuda-samples/Samples/cdpQuadtree# make TARGET_ARCH=aarch64
>>> GCC Version is greater or equal to 5.0.0 <<<
/usr/local/cuda/bin/nvcc -ccbin aarch64-linux-gnu-g++ -I../../Common  -m64    -dc --std=c++14 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o cdpQuadtree.o -c cdpQuadtree.cu
/usr/local/cuda/bin/nvcc -ccbin aarch64-linux-gnu-g++   -m64      -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o cdpQuadtree cdpQuadtree.o  -lcudadevrt
nvlink error   : Undefined reference to 'cudaGetParameterBufferV2' in 'cdpQuadtree.o' (target: sm_35)
nvlink error   : Undefined reference to 'cudaLaunchDeviceV2' in 'cdpQuadtree.o' (target: sm_35)
Makefile:359: recipe for target 'cdpQuadtree' failed
make: *** [cdpQuadtree] Error 255

I am not sure why nvlink is getting these errors:

nvlink error   : Undefined reference to 'cudaGetParameterBufferV2' in 'cdpQuadtree.o' (target: sm_35)
nvlink error   : Undefined reference to 'cudaLaunchDeviceV2' in 'cdpQuadtree.o' (target: sm_35)

When I build these samples on x86_64 with CUDA 10.2, there are no issues.
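For context, these two symbols belong to the CUDA device runtime: any kernel that launches another kernel compiles into calls to cudaGetParameterBufferV2 and cudaLaunchDeviceV2, which nvlink must resolve from libcudadevrt for each target architecture. A minimal sketch of the kind of code that pulls them in (file name hypothetical):

```cuda
// minimal_cdp.cu -- minimal dynamic-parallelism reproducer (hypothetical)
#include <cstdio>

__global__ void child() {
    printf("child block %d\n", blockIdx.x);
}

__global__ void parent() {
    // This device-side launch is what generates the
    // cudaGetParameterBufferV2 / cudaLaunchDeviceV2 calls
    // that nvlink must resolve from libcudadevrt.
    child<<<2, 1>>>();
}

int main() {
    parent<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Something like `nvcc -arch=sm_72 -rdc=true minimal_cdp.cu -o minimal_cdp -lcudadevrt` should device-link this cleanly, assuming the device runtime library for the target architecture can be found.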

Let me know if you have any ideas.

Thanks.

Hi,

Could you share how you installed the CUDA package on the host?

Please note that building the ARM binary requires the CUDA libraries that include the cross-compiling package.
You can find these packages directly in the same JetPack installer.

Thanks.

Hi @AastaLLL,

Sure,

I am using an NVIDIA Docker image:

FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
ENV ARCH=aarch64 \
    HOSTCC=gcc \
    TARGET=ARMV8

This provides the default /usr/local/cuda-10.2/ folder.

I then install the cross-compilation toolkit in my Docker image:

wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-cross-aarch64_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-cudart-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-cufft-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-cupti-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-curand-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-cusolver-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-cusparse-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-driver-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-misc-headers-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-npp-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-nsight-compute-addon-l4t-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-nvgraph-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-nvml-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cuda/cuda-nvrtc-cross-aarch64-10-2_10.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/c/cublas/libcublas-cross-aarch64_10.2.2.89-1_all.deb && \
    wget https://repo.download.nvidia.com/jetson/x86_64/pool/r32.4/n/nsight-compute/nsight-compute-addon-l4t-2019.5.0_2019.5.0.14-1_all.deb && \
    dpkg -i --force-all  *.deb && \
    rm *.deb && \
    apt-get update && \
    apt-get install -y -f && \
    apt-get install -y cuda-cross-aarch64 cuda-cross-aarch64-10-2 

That’s all I do to install both CUDA 10.2 and the aarch64 10.2 cross toolchain.

This works for building non-dynamic-parallelism code; it’s only when I build dynamic parallelism code that I get the nvlink errors.
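One sanity check worth running here (the path is an assumption, based on where the cuda-cross packages normally install) is to confirm that the aarch64 device runtime static library actually landed in the target directory, since that is the library nvlink needs to resolve these symbols:

```shell
# The aarch64 device runtime that -lcudadevrt should resolve against
# (install prefix is an assumption for the cuda-cross-aarch64 packages):
ls -l /usr/local/cuda-10.2/targets/aarch64-linux/lib/libcudadevrt.a
```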

Let me know if there’s an anomaly.

Cheers.

Interestingly, I tried building this sample natively on the Jetson NX (installed with JetPack):

~$ make
>>> GCC Version is greater or equal to 5.1.0 <<<
/usr/local/cuda/bin/nvcc -ccbin g++ -I../../Common  -m64   -Xcompiler -I. -dc --std=c++14 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o cdpQuadtree.o -c cdpQuadtree.cu
/usr/local/cuda/bin/nvcc -ccbin g++   -m64   -Xcompiler -I.   -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o cdpQuadtree cdpQuadtree.o  -lcudadevrt
nvlink error   : Undefined reference to 'cudaGetParameterBufferV2' in 'cdpQuadtree.o' (target: sm_50)
nvlink error   : Undefined reference to 'cudaLaunchDeviceV2' in 'cdpQuadtree.o' (target: sm_50)
Makefile:359: recipe for target 'cdpQuadtree' failed
make: *** [cdpQuadtree] Error 255

I am getting the same linking issue.

However, when I set the SMS variable to 70, to match the Volta architecture of the Jetson NX, the nvlink error disappears.

I tried that in my Docker image as well, but no luck.

It seems the error is caused by the SMS setting: for some target architectures, nvlink cannot find the right device runtime library.
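For anyone else hitting this: SMS is the make variable the CUDA samples Makefiles use for the list of SM architectures, and it can be overridden on the command line without editing the Makefile. A sketch (the exact values are just the ones that worked in this thread):

```shell
# Native build on the Jetson NX, targeting only Volta:
make SMS="70"

# Cross build from x86_64, restricted the same way:
make TARGET_ARCH=aarch64 SMS="70 75"
```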

Never mind, I fixed the issue.

I was building for Volta, which required building only with SMS 70 or 75, as well as specifying the cross-compilation root.
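For completeness, the working invocation looked roughly like the following (TARGET_FS is the variable the CUDA samples Makefiles use for the cross-compilation root filesystem; the mount path here is a placeholder):

```shell
# Restrict SMS to the working architectures and point the build
# at a mounted copy of the Jetson root filesystem (path is a placeholder):
make TARGET_ARCH=aarch64 SMS="70 75" TARGET_FS=/path/to/jetson/rootfs
```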

Cheers.
