Compile CUDA in MSVS with device code in different compilation units

I am trying to split my code into multiple compilation units. However, as soon as device code calls a device function defined in another unit, nvcc exits with code 255.

As a test, I started with a new CUDA project, which comes with sample code that adds two arrays element-wise.

I then moved the device code into a new compilation unit, ‘kernel2.cu’, and added a header file, ‘kernel2.cuh’, with the function declarations:

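```cuda
#include "kernel2.cuh"                 // declares addKernel and addKernelDevice
#include "device_launch_parameters.h"  // declares threadIdx, blockIdx, etc.

// Kernel launched from the host; forwards to the device function below.
__global__ void addKernel(int* c, const int* a, const int* b)
{
    addKernelDevice(c, a, b);
}

// Device function: c[i] = a[i] + b[i], one thread per element.
__device__ void addKernelDevice(int* c, const int* a, const int* b)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}
```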

This compiles and runs fine, because the kernel and the device function it calls are in the same compilation unit.

Now I would like to move the ‘addKernelDevice’ function into a separate unit.
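A minimal sketch of that split, assuming the device function moves into a new file called ‘device.cu’ (a hypothetical name) and that ‘kernel2.cuh’ holds only declarations (its actual contents are not shown in this post). Because ‘addKernel’ then calls a ‘__device__’ function defined in another unit, nvcc can only resolve the call if the units are built as relocatable device code (‘-rdc=true’, or ‘-dc’ when compiling) and a device-link step runs afterwards.

```cuda
// ---- kernel2.cuh (hypothetical contents: declarations only) ----
#pragma once

// Kernel launched from host code.
__global__ void addKernel(int* c, const int* a, const int* b);

// Defined in device.cu, called from addKernel in kernel2.cu.
// A cross-unit __device__ call like this needs relocatable device code.
__device__ void addKernelDevice(int* c, const int* a, const int* b);

// ---- kernel2.cu ----
#include "kernel2.cuh"

__global__ void addKernel(int* c, const int* a, const int* b)
{
    addKernelDevice(c, a, b);  // resolved only at device-link time
}

// ---- device.cu (hypothetical new unit) ----
#include "kernel2.cuh"

__device__ void addKernelDevice(int* c, const int* a, const int* b)
{
    int i = threadIdx.x;  // one thread per element
    c[i] = a[i] + b[i];
}
```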

This gives me the following error:
```
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 11.8.targets(785,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\nvcc.exe" -gencode=arch=compute_52,code="sm_52,compute_52" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.33.31629\bin\HostX64\x64" -x cu -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -G --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -g -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Fdx64\Debug\vc143.pdb /FS /Zi /RTC1 /MDd " -o C:\Development\C++\Cuda\CudaTestFileSplitDeviceV2\x64\Debug\kernel.cu.obj "C:\Development\C++\Cuda\CudaTestFileSplitDeviceV2\kernel.cu"" exited with code 255.
```

According to the NVIDIA blog post https://developer.nvidia.com/blog/separate-compilation-linking-cuda-device-code/, this should be possible.
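The workflow from that post, sketched on the command line under the assumption that the sources are ‘kernel.cu’, ‘kernel2.cu’ and a hypothetical ‘device.cu’ holding ‘addKernelDevice’, targeting sm_52 as in the build log. In the Visual Studio project, the corresponding setting should be CUDA C/C++ > Common > Generate Relocatable Device Code = Yes (‘-rdc=true’), which also adds the device-link step to the build; adding ‘-dc’ through the extra command-line options alone does not appear to do that.

```shell
# Sketch only: device.cu is a hypothetical file name.

# 1. Compile each .cu file to relocatable device code.
#    -dc is shorthand for --relocatable-device-code=true --compile.
#    Headers such as kernel2.cuh are #included, never compiled on their own.
nvcc -arch=sm_52 -dc kernel.cu kernel2.cu device.cu

# 2. Link with nvcc. For objects built with -dc it first runs the device-link
#    step, which provides the __cudaRegisterLinkedBinary_* symbols that the
#    host-side registration code references, then calls the host linker.
nvcc -arch=sm_52 kernel.obj kernel2.obj device.obj -o app.exe
```

If a different host linker performs the final link, the post describes running ‘nvcc -dlink’ over the same objects first and passing its output to that linker together with the other objects.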

So I added the -dc option to the compiler options, which instead results in linker errors:

```
C:\Development\C++\Cuda\CudaTestFileSplitDeviceV2>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\nvcc.exe" -gencode=arch=compute_52,code="sm_52,compute_52" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.33.31629\bin\HostX64\x64" -x cu -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -G --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -dc -g -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Fdx64\Debug\vc143.pdb /FS /Zi /RTC1 /MDd " -o C:\Development\C++\Cuda\CudaTestFileSplitDeviceV2\x64\Debug\kernel.cu.obj "C:\Development\C++\Cuda\CudaTestFileSplitDeviceV2\kernel.cu"
kernel2.cu
kernel2.cuh
kernel.cu
Creating library C:\Development\C++\Cuda\CudaTestFileSplitDeviceV2\x64\Debug\CudaTestFileSplitDeviceV2.lib and object C:\Development\C++\Cuda\CudaTestFileSplitDeviceV2\x64\Debug\CudaTestFileSplitDeviceV2.exp
LINK : warning LNK4098: defaultlib 'LIBCMT' conflicts with use of other libs; use /NODEFAULTLIB:library
kernel.cu.obj : error LNK2019: unresolved external symbol __cudaRegisterLinkedBinary_68a2949d_9_kernel_cu_f853efa9 referenced in function "void __cdecl __nv_cudaEntityRegisterCallback(void * *)" (?__nv_cudaEntityRegisterCallback@@YAXPEAPEAX@Z)
kernel2.cu.obj : error LNK2019: unresolved external symbol __cudaRegisterLinkedBinary_02ae6430_10_kernel2_cu_1d18744d referenced in function "void __cdecl __nv_cudaEntityRegisterCallback(void * *)" (?__nv_cudaEntityRegisterCallback@@YAXPEAPEAX@Z)
kernel2.cuh.obj : error LNK2019: unresolved external symbol __cudaRegisterLinkedBinary_b7b0982f_11_kernel2_cuh_c2d33c09_24776 referenced in function "void __cdecl __nv_cudaEntityRegisterCallback(void * *)" (?__nv_cudaEntityRegisterCallback@@YAXPEAPEAX@Z)
C:\Development\C++\Cuda\CudaTestFileSplitDeviceV2\x64\Debug\CudaTestFileSplitDeviceV2.exe : fatal error LNK1120: 3 unresolved externals
```

I also tried adding ‘/NODEFAULTLIB:library’ to the build options, as the LNK4098 warning suggests, but that only produces other errors.

Any idea which settings I need so that it compiles and links?

I'm moving this to the NVCC forum; they may have more insight into what is going on.