Trouble building with Dynamic Parallelism

I’m having great difficulty compiling a project in which I am trying to use dynamic parallelism.

I have a K20C card and have the project set up to build for compute capability 3.5.
In Nsight Eclipse, under the project settings, I have selected to link with -lcudadevrt.
I have also selected to compile with -fPIC.

Here is the build command and the output I initially got for the file causing trouble (the one where I am using dynamic parallelism):

Building file: …/src/PCG_Kernels.cu
Invoking: NVCC Compiler
nvcc -G -g -O0 -Xcompiler -fPIC -gencode arch=compute_35,code=sm_35 -odir “src” -M -o “src/PCG_Kernels.d” “…/src/PCG_Kernels.cu”
nvcc --compile -G -O0 -Xcompiler -fPIC -g -gencode arch=compute_35,code=compute_35 -gencode arch=compute_35,code=sm_35 -x cu -o “src/PCG_Kernels.o” “…/src/PCG_Kernels.cu”

ptxas fatal : Unresolved extern function ‘cudaGetParameterBuffer’
make: *** [src/PCG_Kernels.o] Error 255

To fix this I opened the file specific settings and modified nvcc command to be invoked like this: nvcc -dc -gencode arch=compute_35,code=sm_35

This changes the build command for this file to be:

Building file: …/src/PCG_Kernels.cu
Invoking: NVCC Compiler
nvcc -G -g -O0 -Xcompiler -fPIC -gencode arch=compute_35,code=sm_35 -odir “src” -M -o “src/PCG_Kernels.d” “…/src/PCG_Kernels.cu”
nvcc -dc -gencode arch=compute_35,code=compute_35 -G -g -O0 -Xcompiler -fPIC “src/PCG_Kernels.o” “…/src/PCG_Kernels.cu”

Finished building: …/src/PCG_Kernels.cu

But then the linker stage has some issues!

Building target: libIPM_Dynamic
Invoking: NVCC Linker
nvcc -shared -link -o “libIPM_Dynamic” ./src/Main.o ./src/Manager.o ./src/Matrix.o ./src/PCG.o ./src/PCG_Kernels.o ./src/Vector.o ./src/matlabInterface.o -lpthread -lcudadevrt -lhdf5 -lz -lmatio
g++: ./src/PCG_Kernels.o: No such file or directory

I’ve spent about 3 hours trying to build this – I must be missing something, any ideas?

The reason for this error:

g++: ./src/PCG_Kernels.o: No such file or directory

is that this command line is missing the -o switch, which is supposed to appear immediately before the output file specification:

nvcc -dc -gencode arch=compute_35,code=compute_35 -G -g -O0 -Xcompiler -fPIC “src/PCG_Kernels.o” “…/src/PCG_Kernels.cu”
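For comparison, here is a rough sketch of what that line would look like with -o restored. The flags are copied from the logged command; the SRC and OBJ paths are hypothetical stand-ins, since the real source path is elided in the log. The script only prints the command so it can be inspected before anything is run.

```shell
# Hypothetical paths: the real source path is elided in the build log.
SRC="src/PCG_Kernels.cu"
OBJ="src/PCG_Kernels.o"

# Same flags as the logged command, with -o placed directly before the output file.
CMD="nvcc -dc -gencode arch=compute_35,code=compute_35 -G -g -O0 -Xcompiler -fPIC -o $OBJ $SRC"

# Print the assembled command instead of running it.
echo "$CMD"
```

The line in the log has no -o at all, so nvcc is never told to write src/PCG_Kernels.o, and the later link step cannot find it.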

Since the -o switch is evidently missing from that command, my guess would be that it was dropped when you did this:

“I opened the file specific settings and modified nvcc command to be invoked like this: nvcc -dc -gencode arch=compute_35,code=sm_35”

That generally is not the right way to enable relocatable device code compilation. You are supposed to do this via project settings. Study one of the CUDA sample projects that uses relocatable device code to understand how to make this project setting, or study this:

https://stackoverflow.com/questions/38260577/generating-relocatable-device-code-using-nvidia-nsight

In fact, since your goal is to use CUDA dynamic parallelism, you may want to study one of the CUDA sample projects that uses that.
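Outside the IDE, a relocatable device code build for dynamic parallelism generally comes down to two nvcc steps. The sketch below prints them rather than running them. The file names kernels.cu, kernels.o and libexample.so are hypothetical, and the flags are an assumption about what a minimal build needs, not the exact options Nsight generates.

```shell
# Hypothetical file names for illustration; substitute the project's own sources.
ARCH="-gencode arch=compute_35,code=sm_35"

# Step 1: compile to an object containing relocatable device code.
# -dc is shorthand for -rdc=true -c; -fPIC is passed to the host compiler
# because the objects go into a shared library.
COMPILE="nvcc -dc $ARCH -Xcompiler -fPIC -o kernels.o kernels.cu"

# Step 2: link with nvcc so the device link step runs, and pull in the
# device runtime library that dynamic parallelism needs.
LINK="nvcc -shared $ARCH -rdc=true -o libexample.so kernels.o -lcudadevrt"

# Print both commands instead of running them.
echo "$COMPILE"
echo "$LINK"
```

Both steps carry the same architecture flags. The link step needs them for the device link, which is one reason to let the project-level setting generate both commands instead of editing a single file's command line.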