Using Dynamic Parallelism in multiple VS2019 projects

lucus.vanblaircum · August 6, 2021, 8:09pm

Hello,

I have a VS2019 solution with several projects in it, two of which include CUDA code. I am trying to use dynamic parallelism in both, but am getting a build error that I cannot seem to solve. For context, suppose I have a supporting library project (1), a main compute library project that uses objects from (1), and a Console project that calls the compute library and so links to both.

I’ve uploaded a contrived example here.

(Quick note: the above does not show any good coding practices, it is merely the quickest way for me to get to the problem I’m facing.)

In the above you’ll see a VS solution with 4 projects:

CR2 - [Static Library] Project contains CUDA kernel that uses dynamic parallelism (creates a stream within the kernel).
CR1 - [Static Library] Project also contains CUDA kernel that leverages dynamic parallelism, and includes a header from CR2 and calls a function from CR2.
App - [Application] Project that does not contain CUDA code, but consumes both CR1.lib and CR2.lib.
CR3 - [Application] Project that contains CUDA code, and consumes both CR1.lib and CR2.lib.

When I build the App project, I receive the following build error:

>CR2.lib(CR2.device-link.obj) : error LNK2005: __cudaRegisterLinkedBinary_38_cuda_device_runtime_compute_86_cpp1_ii_8b1a5d37 already defined in CR1.lib(CR1.device-link.obj)
>D:\CUDAExample\x64\Release\App.exe : fatal error LNK1169: one or more multiply defined symbols found
>Done building project "App.vcxproj" -- FAILED.

I figure this is caused by cudadevrt.lib being included in both CR1.lib and CR2.lib to facilitate the dynamic parallelism, but cannot figure out how to resolve the problem.

I added CR3 to the solution as I suspect I’ll need to do the final CUDA compilation in the main application project, but do not know how to proceed with that.

Any pointers would be immensely helpful.

For context, here is the build output from Visual Studio for the App project:

Rebuild started...
1>------ Rebuild All started: Project: CR2, Configuration: Release x64 ------
1>Compiling CUDA source file kernel2.cu...
1>
1>D:\CUDAExample\CR2>"D:\CUDAExample\/CUDA/BuildToolkit\bin\nvcc.exe" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30037\bin\HostX86\x64" -x cu -rdc=true  -ID:\CUDAExample\/CUDA/BuildToolkit\include -ID:\CUDAExample\/CUDA/BuildToolkit\include     --keep-dir x64\Release  -maxrregcount=0  --machine 64 --compile -cudart static    -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /FdD:\CUDAExample\x64\Release\CR2.pdb /FS   /MD " -o x64\Release\kernel2.cu.obj "D:\CUDAExample\CR2\kernel2.cu"
1>kernel2.cu
1>
1>D:\CUDAExample\CR2>"D:\CUDAExample\/CUDA/BuildToolkit\bin\nvcc.exe" -dlink -o x64\Release\CR2.device-link.obj -Xcompiler "/EHsc /W3 /nologo /O2 /FdD:\CUDAExample\x64\Release\CR2.pdb   /MD "    -gencode=arch=compute_75,code=sm_75  --machine 64 x64\Release\kernel2.cu.obj
1>kernel2.cu.obj
1>CR2.vcxproj -> D:\CUDAExample\x64\Release\CR2.lib
2>------ Rebuild All started: Project: CR1, Configuration: Release x64 ------
2>Compiling CUDA source file kernel1.cu...
2>
2>D:\CUDAExample\CR1>"D:\CUDAExample\/CUDA/BuildToolkit\bin\nvcc.exe" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30037\bin\HostX86\x64" -x cu -rdc=true  -ID:\CUDAExample\/CUDA/BuildToolkit\include -ID:\CUDAExample\/CUDA/BuildToolkit\include     --keep-dir x64\Release  -maxrregcount=0  --machine 64 --compile -cudart none    -DWIN32 -DWIN64 -DNDEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /O2 /FdD:\CUDAExample\x64\Release\CR1.pdb /FS   /MD " -o x64\Release\kernel1.cu.obj "D:\CUDAExample\CR1\kernel1.cu"
2>kernel1.cu
2>
2>D:\CUDAExample\CR1>"D:\CUDAExample\/CUDA/BuildToolkit\bin\nvcc.exe" -dlink -o x64\Release\CR1.device-link.obj -Xcompiler "/EHsc /W3 /nologo /O2 /FdD:\CUDAExample\x64\Release\CR1.pdb   /MD "    -gencode=arch=compute_75,code=sm_75  --machine 64 x64\Release\kernel1.cu.obj
2>kernel1.cu.obj
2>CR1.vcxproj -> D:\CUDAExample\x64\Release\CR1.lib
3>------ Rebuild All started: Project: App, Configuration: Release x64 ------
3>App.cpp
3>Generating code
3>Previous IPDB not found, fall back to full compilation.
3>All 10 functions were compiled because no usable IPDB/IOBJ from previous compilation was found.
3>Finished generating code
3>CR2.lib(CR2.device-link.obj) : error LNK2005: __cudaRegisterLinkedBinary_38_cuda_device_runtime_compute_86_cpp1_ii_8b1a5d37 already defined in CR1.lib(CR1.device-link.obj)
3>D:\CUDAExample\x64\Release\App.exe : fatal error LNK1169: one or more multiply defined symbols found
3>Done building project "App.vcxproj" -- FAILED.
StopOnFirstBuildError: Build cancelled because project "App" failed to build.
Build has been canceled.

Yuki_Ni · August 9, 2021, 7:12am

Hi, dear customer,
Could you please file a ticket with us following the instructions in Getting Help with CUDA NVCC Compiler - #3? We will take a further look. Thanks.

Yuki_Ni · August 11, 2021, 5:49am

Library CR1 calls a function from library CR2. That means their device code needs to be linked together (presumably in CR3). But the log shows -dlink calls in both CR1 and CR2, which means both are doing a device link and thus both are bringing in libcudadevrt. Those -dlink steps need to be removed; instead, do a single device link in CR3 or App.

Added a modified project here; note that the CUDA folder is removed to reduce size: CUDA-DDP.7z (4.7 KB)

lucus.vanblaircum · August 11, 2021, 2:56pm

Thank you! This works. I believe I saw another forum post that similarly mentioned this solution but did not follow it.

To summarize, when using dynamic parallelism in multiple projects you can only enable Device Link (CUDA Linker > Perform Device Link = yes) in the “final” project - the main application or library project. This may necessitate adding a dummy .cu file to the final project to trigger CUDA compile. All other settings such as relocatable device code, includes, etc. remain the same (they are not relevant to the aforementioned problem).

I’ve updated the above linked Git repo to have Yuki_Ni’s changes.

johoe · August 12, 2021, 7:27am

Thank you for the solution. Since we faced the same problem yesterday (two static libraries leveraging Dynamic Parallelism) I want to share our solution with CMake.

We ended up adding a static library named cuda_resolve, built from a dummy.cu and linking to those libraries, and setting the CMake target property CUDA_RESOLVE_DEVICE_SYMBOLS to ON for cuda_resolve and OFF for the others.
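
For readers who want to try the CMake approach described above, here is a minimal sketch of what such a setup might look like. It is not the poster's actual build file. The target names cr1 and cr2, the source file names, and the minimum CMake version are assumptions borrowed from the Visual Studio example earlier in the thread; only cuda_resolve, dummy.cu, and the CUDA_RESOLVE_DEVICE_SYMBOLS property come from the post. Enabling CUDA_SEPARABLE_COMPILATION is assumed here as the CMake counterpart of -rdc=true in the build log.

```cmake
# Sketch only: target and file names other than cuda_resolve and dummy.cu are hypothetical.
cmake_minimum_required(VERSION 3.18)
project(ddp_example LANGUAGES CXX CUDA)

# Two static libraries whose kernels use dynamic parallelism.
add_library(cr2 STATIC kernel2.cu)
add_library(cr1 STATIC kernel1.cu)
target_link_libraries(cr1 PUBLIC cr2)   # cr1 calls a function from cr2

# Compile relocatable device code, but do NOT device-link inside these libraries.
set_target_properties(cr1 cr2 PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON
    CUDA_RESOLVE_DEVICE_SYMBOLS OFF)

# One extra static library performs the single device link for everything.
# dummy.cu can be an empty CUDA source; it only triggers CUDA compilation.
add_library(cuda_resolve STATIC dummy.cu)
target_link_libraries(cuda_resolve PUBLIC cr1 cr2)
set_target_properties(cuda_resolve PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON
    CUDA_RESOLVE_DEVICE_SYMBOLS ON)
```

Consumers would then link against cuda_resolve rather than against cr1 and cr2 directly. Whether a consuming executable also attempts its own device link can depend on the CMake version and the target's properties, so it may be worth checking the generated link commands for a second -dlink step.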

system · October 11, 2021, 7:28am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
