dynamic parallelism. link error: undefined reference to 'main'

My CUDA code worked fine before I started using dynamic parallelism. To allow a parent kernel to launch a child kernel, I modified my makefile as follows:

#CC=gcc                          # used C-compiler
CFLAGS=-O3 -fopenmp                    # release C-Compiler flags
#CFLAGS=   -g -Wall -pedantic  # debugging C-Compiler flags
LD=g++      # used linker
LFLAGS=-O3   -lstdc++ -std=c++11  -fopenmp -lpthread        # linker flags
LIBS=-L. -lm   -lz                 # used libraries

all: mven

	rm -f *.o
	rm -f mven
help:   info

mven:  nsampen.o math.o 
	  $(LD) $(LFLAGS) -L/usr/local/cuda/lib64 -lcuda nsampen.o math.o -o mven $(LIBS) -lcudart

sampen.o: sampenv2.cu mathfunc.h
	#$(CC) -c $(CFLAGS) sampenv2.cu -o mven.o
	nvcc -std=c++11  -arch=sm_61 -c -dc sampenv2.cu -o sampen.o
nsampen.o: sampen.o math.o 
	nvcc -arch=sm_61 -dlink -o nsampen.o sampen.o math.o -lcudart

math.o: mathfunc.cpp mathfunc.h 
	$(CC) -c $(CFLAGS) mathfunc.cpp -o math.o

Compiling seems to be fine, but linking with g++ fails with "undefined reference to main" and 3 other errors. Any suggestions?
Thank you so much!


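The device-link step (nvcc -dlink) only produces an object containing the linked device code. The host code compiled from sampenv2.cu, which presumably includes main, is still in sampen.o, so add sampen.o to your final link step: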

$(LD) $(LFLAGS) -L/usr/local/cuda/lib64 -lcuda sampen.o nsampen.o math.o -o mven $(LIBS) -lcudart

You may also need to add -lcudadevrt to the final link step above, depending on your actual code.

I’m also not sure math.o belongs in your device-link step, but that may depend on your actual code.
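
Putting the suggestions above together, the relevant rules might look roughly like the sketch below. File names, -arch=sm_61 and the CUDA library path are taken from the makefile in the question. The CUDA_LIB variable is a hypothetical name introduced here, and leaving math.o out of the device link assumes mathfunc.cpp contains no device code. Treat it as a starting point, not a tested build.

```makefile
# Sketch only. CUDA_LIB is a hypothetical variable name; adjust the path
# to your CUDA installation.
CUDA_LIB = /usr/local/cuda/lib64

# 1. Compile the .cu file with relocatable device code (-dc implies -c).
#    The resulting object holds both host code and device code.
sampen.o: sampenv2.cu mathfunc.h
	nvcc -std=c++11 -arch=sm_61 -dc sampenv2.cu -o sampen.o

# 2. Device link: resolves device-side references only. The output object
#    does not contain the host code from sampen.o.
#    Assumption: only sampen.o carries device code, so math.o is left out.
nsampen.o: sampen.o
	nvcc -arch=sm_61 -dlink sampen.o -o nsampen.o -lcudadevrt

# 3. Host-only source, built with the host compiler.
math.o: mathfunc.cpp mathfunc.h
	g++ -O3 -fopenmp -c mathfunc.cpp -o math.o

# 4. Host link: needs the original object (sampen.o), the device-link
#    output (nsampen.o) and the host-only objects, followed by the libraries.
mven: sampen.o nsampen.o math.o
	g++ -fopenmp sampen.o nsampen.o math.o -o mven \
	    -L$(CUDA_LIB) -lcudadevrt -lcudart -lcuda -lz -lm -lpthread
```

Running make mven with these rules builds the objects in order and then performs the host link. If the code does not use the CUDA driver API, -lcuda can probably be dropped.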

Solved as suggested. I thought the link step by nvcc had already put everything together. Many thanks.

It doesn’t. The nvcc device-link step links only the device code. You could discover this by studying any of the CUDA sample codes that use dynamic parallelism, or by referring to the NVCC documentation, which gives example usage.


Refer to the example given for “If you want to invoke the device and host linker separately, you can do:” which covers this topic exactly.

Proving once more that there is truth in the old adage: “If all else fails, read the instructions” :-)