CUDA/C++ dynamic parallelism compile issue on aarch64/arm64


I’m having problems compiling and linking a C++ program against a CUDA file that uses dynamic parallelism. The program itself is quite large, so for testing I reduced it to a very basic main.cpp file and a CUDA file that reproduce the error. I’m also compiling these files on a Jetson Nano (aarch64/arm64, compute capability 5.3), which I suspect may be causing the problem.

Here are my programs:


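main.cpp:

extern void wrapperfunction();

int main(){
	// (body not preserved in the original post)
}

CUDA file:

#include <cuda.h>
#include <cuda_runtime.h>

using namespace std;

__global__ void update_upper_image(){
	int x = 1;
	// (rest of body not preserved in the original post)
}

__global__ void event_kernel(){
	// (body not preserved in the original post)
}

void wrapperfunction(){
	// (body not preserved in the original post)
}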
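For comparison, here is a minimal sketch of a CUDA file that exercises dynamic parallelism with the function names from the post. It assumes event_kernel is the parent kernel that launches update_upper_image as a child. The kernel bodies, the launch configurations, the child_threads_run counter and the launch_parent_and_count helper are all hypothetical, not the original code. Like the CUDA file in the post, it has no main(), so it pairs with a main.cpp that calls wrapperfunction().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device-side counter so the host can confirm the child grid ran.
__device__ int child_threads_run = 0;

// Child kernel (hypothetical body): each thread increments the counter once.
__global__ void update_upper_image() {
    atomicAdd(&child_threads_run, 1);
}

// Parent kernel: launches the child grid from device code (dynamic parallelism).
// The guard ensures only one child launch even if the parent uses more threads.
__global__ void event_kernel() {
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        update_upper_image<<<1, 32>>>();
    }
}

// Hypothetical helper: runs parent + child and returns how many child threads
// executed, or -1 on any CUDA error.
int launch_parent_and_count() {
    int zero = 0;
    if (cudaMemcpyToSymbol(child_threads_run, &zero, sizeof(int)) != cudaSuccess) {
        return -1;
    }

    event_kernel<<<1, 1>>>();
    cudaError_t err = cudaGetLastError();          // launch/configuration errors
    if (err == cudaSuccess) {
        // A parent grid is not complete until its child grids complete,
        // so this host-side sync also waits for update_upper_image.
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }

    int count = 0;
    if (cudaMemcpyFromSymbol(&count, child_threads_run, sizeof(int)) != cudaSuccess) {
        return -1;
    }
    return count;
}

// Host entry point with the same name and signature as in the post.
void wrapperfunction() {
    std::printf("child threads run: %d\n", launch_parent_and_count());
}
```

Building this file together with main.cpp in a single nvcc invocation (for example `nvcc -arch=sm_53 -rdc=true dp_sketch.cu main.cpp -lcudadevrt`, where dp_sketch.cu is a hypothetical file name) lets nvcc perform both the device link and the host link. That can help isolate whether the undefined reference comes from the manual g++ step. If the build and launch succeed, it should print `child threads run: 32`. The sketch deliberately avoids a device-side cudaDeviceSynchronize(), which is deprecated in newer CUDA releases. The host-side synchronization is sufficient here.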

The commands I’m using to compile are:

nvcc -arch=sm_53 -rdc=true -c
nvcc -arch=sm_53 -dlink -o file_link.o wrapper.o -lcudart -lcudadevrt
g++ wrapper.o file_link.o main.cpp -L/usr/local/cuda/lib64 -lcudart -lcudadevrt

The first two commands work fine, but at the final g++ step (compiling main.cpp and linking everything) I get the following error:

file_link.o: In function `__cudaRegisterLinkedBinary_38_cuda_device_runtime_compute_75_cpp1_ii_8b1a5d37':
link.stub:(.text+0xcc): undefined reference to `__fatbinwrap_38_cuda_device_runtime_compute_75_cpp1_ii_8b1a5d37'
link.stub:(.text+0xd0): undefined reference to `__fatbinwrap_38_cuda_device_runtime_compute_75_cpp1_ii_8b1a5d37'
collect2: error: ld returned 1 exit status

I’ve been working on this problem for a few days now without any luck, so any help would be greatly appreciated!


You are likely to get better and faster answers to questions about Jetson and its software stack in the subforums dedicated to that platform:
