Separate compilation of CUDA code into a library, for use with an existing code base

I plan to write a new CUDA module for use in an existing code base. Because I don’t want to mess with my existing build system, I don’t want my existing code to have an explicit CUDA dependency. Therefore I plan to build the CUDA modules as a library with C linkage (as NVCC depends on an older version of GCC than my existing code base uses), as follows:

[main.cpp – representative of existing code]

extern "C" void runKernel();

int main(int argc, char **argv)
{
	runKernel();
}

[Test.cu – representative of CUDA modules]

__global__ void testKernel()
{
}

extern "C" void runKernel()
{
	testKernel<<<1,1>>>();
}

Following the tutorial Separate Compilation and Linking of CUDA C++ Device Code, I’ve added the -dlink flag (as I understand it, in order to generate device code) and the -lcudart flag in order to link using GCC (recall that I would like to link the CUDA library into my existing code without introducing CUDA dependencies, i.e. NVCC).

Build steps

nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30 -dlink -dc -o MWE.o -c MWE.cu
nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30 -dlink -o MWE.a MWE.o
g++ -m64 -o main.o -c main.cpp
g++ -m64 -lcudart MWE.a main.o -o test

Build errors

MWE.a: In function `__cudaUnregisterBinaryUtil':
link.stub:(.text+0xf): undefined reference to `__cudaUnregisterFatBinary'
MWE.a: In function `__cudaRegisterLinkedBinary_38_tmpxft_000019d4_00000000_7_MWE_cpp1_ii_d6d5cc43':
link.stub:(.text+0x5a): undefined reference to `__fatbinwrap_38_tmpxft_000019d4_00000000_7_MWE_cpp1_ii_d6d5cc43'
MWE.a: In function `__cudaRegisterLinkedBinary(__fatBinC_Wrapper_t const*, void (*)(void**), void*)':
link.stub:(.text+0x102): undefined reference to `__cudaRegisterFatBinary'
main.o: In function `main':
main.cpp:(.text+0x10): undefined reference to `runKernel'
collect2: error: ld returned 1 exit status

If I understood you correctly, you are trying to link together object files built by different, incompatible compilers. AFAIK, that’s impossible: an executable should contain only one run-time library. The alternative is to build a DLL with one compiler and then call it from an executable built by another compiler, since each DLL has its own run-time library. I’m a Windows user, though, so the situation may be different on Linux.
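On Linux, the shared-library alternative described above could be sketched with dlopen and dlsym. This is only an illustration, not something from the original build: the helper names (KernelLibrary, loadKernelLibrary, unloadKernelLibrary) and the library path are hypothetical, and the sketch assumes the CUDA side has been built into a shared object that exports runKernel with C linkage.

```cpp
#include <dlfcn.h>   // dlopen, dlsym, dlclose, dlerror (POSIX)
#include <cstdio>

// Signature of the exported entry point: extern "C" void runKernel();
using RunKernelFn = void (*)();

// Hypothetical holder for the loaded library and its entry point.
struct KernelLibrary {
    void *handle;
    RunKernelFn runKernel;
};

// Loads the shared object at `path` and looks up runKernel.
// Returns true on success; on failure both fields are left null.
bool loadKernelLibrary(const char *path, KernelLibrary *lib)
{
    lib->handle = nullptr;
    lib->runKernel = nullptr;

    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        const char *msg = dlerror();
        std::fprintf(stderr, "dlopen failed: %s\n", msg ? msg : "unknown error");
        return false;
    }

    // C linkage keeps the symbol name unmangled, so it can be looked up
    // by its plain name regardless of which compiler built the library.
    void *symbol = dlsym(handle, "runKernel");
    if (symbol == nullptr) {
        const char *msg = dlerror();
        std::fprintf(stderr, "dlsym failed: %s\n", msg ? msg : "unknown error");
        dlclose(handle);
        return false;
    }

    lib->handle = handle;
    lib->runKernel = reinterpret_cast<RunKernelFn>(symbol);
    return true;
}

// Releases the library if it was loaded.
void unloadKernelLibrary(KernelLibrary *lib)
{
    if (lib->handle != nullptr) {
        dlclose(lib->handle);
    }
    lib->handle = nullptr;
    lib->runKernel = nullptr;
}
```

A caller would invoke loadKernelLibrary with the path to the shared object (for example a hypothetical ./libMWE.so), call lib.runKernel() on success, and then unload. On older glibc versions the host program may also need -ldl at link time.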

Is the -ccbin g++ necessary? Usually nvcc picks the g++ version that it officially supports in the respective CUDA toolkit. By overriding this with -ccbin you’re making it use a possibly incompatible version (the system default compiler).

He said “NVCC depends on an older version of GCC than my existing code base uses”, so it seems that his code is incompatible with the nvcc-compatible version of gcc.

My recommendation would still be to try this without the -ccbin g++

You’ll mix object code generated by different compiler versions. But there is a chance it will work, as the interface between both modules only uses plain C syntax.
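To illustrate what a plain C interface between the two modules can look like, here is a minimal sketch. The function runKernelChecked, its parameters, and its return-code convention are hypothetical and not part of the original code. The body is a plain C++ stand-in so that the sketch compiles without the CUDA toolkit; in a real .cu file it would launch the kernel instead.

```cpp
// Declarations as they might appear in a header shared by both sides.
// The guard lets the same header be included from C and from C++.
#ifdef __cplusplus
extern "C" {
#endif

// Only C-compatible types cross the boundary: raw pointers and ints,
// no std::string, no templates, no exceptions.
// Hypothetical convention: returns 0 on success, non-zero on failure.
int runKernelChecked(const float *input, float *output, int count);

#ifdef __cplusplus
}
#endif

// Stand-in definition. In the CUDA module this body would copy the data
// to the device, launch the kernel, and map any CUDA error to the
// integer return code instead of throwing.
extern "C" int runKernelChecked(const float *input, float *output, int count)
{
    if (input == nullptr || output == nullptr || count < 0) {
        return 1;  // invalid arguments
    }
    for (int i = 0; i < count; ++i) {
        output[i] = 2.0f * input[i];  // placeholder for the device work
    }
    return 0;
}
```

Keeping the boundary to C types and integer status codes means neither side depends on the other compiler's C++ name mangling or standard library layout, which is what gives the mixed-compiler approach a chance of working.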

Thanks for your feedback. My understanding of the error messages is that they suggest undefined functions rather than incompatible ones.

Is the purpose of this step to create a static library out of one (or multiple) .o files?

nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30 -dlink -o MWE.a MWE.o

Have you tried to use the ar tool instead?

ar rcs MWE.a MWE.o

Regarding the ar options: r means to insert with replacement, c means to create a new archive, and s means to write an index.

It almost looks like you might need to link to the cuda runtimes, cudadevrt and cudart.

Leaving aside the question of mixing gnu toolchains, there are several changes that need to be made to the build sequence to get it to work:

#ordinary rdc compilation of CUDA source
nvcc -ccbin g++ -m64 -arch=sm_30 -dc -o MWE.o -c MWE.cu
#separate device link step, necessary for rdc flow
nvcc -ccbin g++ -m64 -arch=sm_30 -dlink -o MWE.dlink.o MWE.o
#creation of library - note we need ordinary linkable object and device-link object!
nvcc -ccbin g++ -m64 -arch=sm_30 -lib -o MWE.a MWE.o MWE.dlink.o
#host code compilation
g++ -m64 -o main.o -c main.cpp
#host (final) link phase - the order of entries on this line is important!!
g++ -m64 main.o MWE.a -o test -L/usr/local/cuda/lib64 -lcudart

Thanks txbob!