Using Dynamic Parallelism in multiple VS2019 projects

Hello,

I have a VS2019 solution with several projects in it, two of which include CUDA code. I am trying to use dynamic parallelism in both, but am getting a build error that I cannot seem to solve. For context, suppose I have a supporting library project (1), a main compute library project that uses objects from (1), and a Console project that calls the compute library and so links to both.

I’ve uploaded a contrived example here.

(Quick note: the above does not demonstrate good coding practices; it is merely the quickest way for me to get to the problem I’m facing.)

In the above you’ll see a VS solution with 4 projects:

  • CR2 - [Static Library] Project contains CUDA kernel that uses dynamic parallelism (creates a stream within the kernel).
  • CR1 - [Static Library] Project also contains CUDA kernel that leverages dynamic parallelism, and includes a header from CR2 and calls a function from CR2.
  • App - [Application] Project that does not contain CUDA code, but consumes both CR1.lib and CR2.lib.
  • CR3 - [Application] Project that contains CUDA code, and consumes both CR1.lib and CR2.lib.

When I build the App project, I receive the following build error:

>CR2.lib(CR2.device-link.obj) : error LNK2005: __cudaRegisterLinkedBinary_38_cuda_device_runtime_compute_86_cpp1_ii_8b1a5d37 already defined in CR1.lib(CR1.device-link.obj)
>D:\CUDAExample\x64\Release\App.exe : fatal error LNK1169: one or more multiply defined symbols found
>Done building project "App.vcxproj" -- FAILED.

I figure this is caused by cudadevrt.lib being included in both CR1.lib and CR2.lib to facilitate the dynamic parallelism, but cannot figure out how to resolve the problem.

I added CR3 to the solution as I suspect I’ll need to do the final CUDA compilation in the main application project, but do not know how to proceed with that.

Any pointers would be immensely helpful.

For context, here is the build output from Visual Studio for the App project:

Rebuild started...
1>------ Rebuild All started: Project: CR2, Configuration: Release x64 ------
1>Compiling CUDA source file kernel2.cu...
1>
1>D:\CUDAExample\CR2>"D:\CUDAExample\/CUDA/BuildToolkit\bin\nvcc.exe" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30037\bin\HostX86\x64" -x cu -rdc=true  -ID:\CUDAExample\/CUDA/BuildToolkit\include -ID:\CUDAExample\/CUDA/BuildToolkit\include     --keep-dir x64\Release  -maxrregcount=0  --machine 64 --compile -cudart static    -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /FdD:\CUDAExample\x64\Release\CR2.pdb /FS   /MD " -o x64\Release\kernel2.cu.obj "D:\CUDAExample\CR2\kernel2.cu"
1>kernel2.cu
1>
1>D:\CUDAExample\CR2>"D:\CUDAExample\/CUDA/BuildToolkit\bin\nvcc.exe" -dlink -o x64\Release\CR2.device-link.obj -Xcompiler "/EHsc /W3 /nologo /O2 /FdD:\CUDAExample\x64\Release\CR2.pdb   /MD "    -gencode=arch=compute_75,code=sm_75  --machine 64 x64\Release\kernel2.cu.obj
1>kernel2.cu.obj
1>CR2.vcxproj -> D:\CUDAExample\x64\Release\CR2.lib
2>------ Rebuild All started: Project: CR1, Configuration: Release x64 ------
2>Compiling CUDA source file kernel1.cu...
2>
2>D:\CUDAExample\CR1>"D:\CUDAExample\/CUDA/BuildToolkit\bin\nvcc.exe" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30037\bin\HostX86\x64" -x cu -rdc=true  -ID:\CUDAExample\/CUDA/BuildToolkit\include -ID:\CUDAExample\/CUDA/BuildToolkit\include     --keep-dir x64\Release  -maxrregcount=0  --machine 64 --compile -cudart none    -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /FdD:\CUDAExample\x64\Release\CR1.pdb /FS   /MD " -o x64\Release\kernel1.cu.obj "D:\CUDAExample\CR1\kernel1.cu"
2>kernel1.cu
2>
2>D:\CUDAExample\CR1>"D:\CUDAExample\/CUDA/BuildToolkit\bin\nvcc.exe" -dlink -o x64\Release\CR1.device-link.obj -Xcompiler "/EHsc /W3 /nologo /O2 /FdD:\CUDAExample\x64\Release\CR1.pdb   /MD "    -gencode=arch=compute_75,code=sm_75  --machine 64 x64\Release\kernel1.cu.obj
2>kernel1.cu.obj
2>CR1.vcxproj -> D:\CUDAExample\x64\Release\CR1.lib
3>------ Rebuild All started: Project: App, Configuration: Release x64 ------
3>App.cpp
3>Generating code
3>Previous IPDB not found, fall back to full compilation.
3>All 10 functions were compiled because no usable IPDB/IOBJ from previous compilation was found.
3>Finished generating code
3>CR2.lib(CR2.device-link.obj) : error LNK2005: __cudaRegisterLinkedBinary_38_cuda_device_runtime_compute_86_cpp1_ii_8b1a5d37 already defined in CR1.lib(CR1.device-link.obj)
3>D:\CUDAExample\x64\Release\App.exe : fatal error LNK1169: one or more multiply defined symbols found
3>Done building project "App.vcxproj" -- FAILED.
StopOnFirstBuildError: Build cancelled because project "App" failed to build.
Build has been canceled.

Hi,
Could you please file a ticket following the instructions in "Getting Help with CUDA NVCC Compiler - #3"? We will take a further look. Thanks.

Library CR1 calls a function from library CR2. That means their device code needs to be linked together (presumably in CR3). But the log shows -dlink calls in both CR1 and CR2, which means both are doing a device link and thus both are bringing in libcudadevrt. Those -dlink steps need to be removed; instead, do a single -dlink in CR3 or App.

I have attached a modified project here. Note that the CUDA folder is removed to reduce size. CUDA-DDP.7z (4.7 KB)


Thank you! This works. I believe I saw another forum post that similarly mentioned this solution but did not follow it.

To summarize, when using dynamic parallelism in multiple projects, enable Device Link (CUDA Linker > Perform Device Link = Yes) only in the “final” project, i.e. the main application or library project, and disable it in every other project. This may necessitate adding a dummy .cu file to the final project to trigger the CUDA compile and device-link steps. All other settings such as relocatable device code, includes, etc. remain the same (they are not relevant to the aforementioned problem).

I’ve updated the Git repo linked above to include Yuki_Ni’s changes.

Thank you for the solution. Since we faced the same problem yesterday (two static libraries leveraging Dynamic Parallelism) I want to share our solution with CMake.

We ended up adding a static library named cuda_resolve, containing only a dummy.cu, that links to those libraries. We then set the CMake target property CUDA_RESOLVE_DEVICE_SYMBOLS to ON for cuda_resolve and OFF for the others, so that the device link happens exactly once.
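
For anyone wanting a starting point, here is a minimal sketch of what such a CMake setup might look like. It is not our actual build file: the target names CR1, CR2 and App are borrowed from the Visual Studio example earlier in the thread, the source paths (including where dummy.cu lives) are hypothetical, and the architecture value 75 is taken from the -gencode flags in the build log above. CUDA_SEPARABLE_COMPILATION is the CMake counterpart of -rdc=true.

```cmake
cmake_minimum_required(VERSION 3.18)
project(CudaExample LANGUAGES CXX CUDA)

# Assumption: sm_75, as in the -gencode flags of the build log above.
set(CMAKE_CUDA_ARCHITECTURES 75)

# Static libraries whose kernels use dynamic parallelism.
# Source paths are hypothetical placeholders.
add_library(CR2 STATIC CR2/kernel2.cu)
add_library(CR1 STATIC CR1/kernel1.cu)
target_link_libraries(CR1 PUBLIC CR2)

# Relocatable device code is required, but these libraries must NOT
# perform their own device link. OFF is already the default for static
# libraries; it is spelled out here to make the intent explicit.
set_target_properties(CR1 CR2 PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON
    CUDA_RESOLVE_DEVICE_SYMBOLS OFF)

# The single place where the device link happens. dummy.cu can be an
# empty file; it only exists so the target has a CUDA source.
add_library(cuda_resolve STATIC dummy.cu)
target_link_libraries(cuda_resolve PUBLIC CR1 CR2)
set_target_properties(cuda_resolve PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON
    CUDA_RESOLVE_DEVICE_SYMBOLS ON)

# Host-only application that consumes the resolved library.
add_executable(App App/App.cpp)
target_link_libraries(App PRIVATE cuda_resolve)
```

The idea is the same as in the Visual Studio fix: CR1 and CR2 only carry relocatable device code, and cuda_resolve is the one target that runs the device link, so the device runtime registration symbol should end up defined only once. Depending on your CMake version and toolchain you may need further adjustments, so treat this as a sketch rather than a drop-in file.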