CUDA/C++ dynamic parallelism compile issue on aarch64/arm64

SophieJ · September 2, 2020, 12:03pm

Hi!

I’m having problems compiling and linking a C++ program against a CUDA file that uses dynamic parallelism. The program itself is quite extensive, so for testing’s sake I simplified it to a very basic main.cpp file and wrapper.cu file that reproduce the error. I’m compiling these files on a Jetson Nano (aarch64/arm64, compute capability 5.3), which I suspect may be causing the problem.

Here are my programs:

main.cpp:

extern void wrapperfunction();

int main(){
	wrapperfunction();
}

wrapper.cu:

#include <cuda.h>
#include <cuda_runtime.h>

using namespace std;

__global__ void update_upper_image(){
	int x = 1;
}

__global__ void event_kernel(){
    update_upper_image<<<512,1>>>();
}

void wrapperfunction(){
	event_kernel<<<4,1>>>();
}

The commands I’m using to compile are:

nvcc -arch=sm_53 -rdc=true -c wrapper.cu
nvcc -arch=sm_53 -dlink -o file_link.o wrapper.o -lcudart -lcudadevrt
g++ wrapper.o file_link.o main.cpp -L/usr/local/cuda/lib64 -lcudart -lcudadevrt

The first two commands work fine, but at the g++ compilation stage I get the following error:

file_link.o: In function `__cudaRegisterLinkedBinary_38_cuda_device_runtime_compute_75_cpp1_ii_8b1a5d37':
link.stub:(.text+0xcc): undefined reference to `__fatbinwrap_38_cuda_device_runtime_compute_75_cpp1_ii_8b1a5d37'
link.stub:(.text+0xd0): undefined reference to `__fatbinwrap_38_cuda_device_runtime_compute_75_cpp1_ii_8b1a5d37'
collect2: error: ld returned 1 exit status
rufus@rufus-desktop:~/Documents/CUDA_CODE/cm

I’ve been working on this problem for a few days now without any luck, so any help would be greatly appreciated!

Thanks,
Sophie

njuffa · September 2, 2020, 6:42pm

You are likely to get better and faster answers to questions about Jetson and its software stack in the subforums dedicated to that platform.
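
For comparison, the dynamic parallelism examples in the CUDA documentation let nvcc perform the final link itself, which includes the device link step, rather than handing a separately device-linked object to g++. A minimal sketch of that variant for the two files above follows. The output name `app` is a placeholder, and this has not been verified on a Jetson Nano or confirmed as the fix for the error in this thread, so treat it as something to try rather than a known solution.

```shell
#!/bin/sh
# Sketch: let nvcc drive both the device link and the host link in one step.
# ARCH matches the original commands; OUT is a hypothetical output name.
ARCH=sm_53
OUT=app

# -rdc=true enables relocatable device code (required for dynamic parallelism);
# -lcudadevrt links the CUDA device runtime library.
CMD="nvcc -arch=$ARCH -rdc=true wrapper.cu main.cpp -o $OUT -lcudadevrt"

if command -v nvcc >/dev/null 2>&1; then
    # Run the build only when the CUDA toolkit is on PATH.
    $CMD
else
    # Otherwise just show what would be run.
    echo "nvcc not found; would run: $CMD"
fi
```

If separate compilation is still wanted, the same idea applies to the last step: keep `nvcc -arch=sm_53 -rdc=true -c wrapper.cu`, then link with `nvcc -arch=sm_53 wrapper.o main.cpp -o app -lcudadevrt`, so that nvcc performs the device link implicitly.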
