Multiple definition error on device function in header file

ashwin · March 15, 2011, 1:50am

I have a device function that is defined in a header file. The reason it is in a header file is because it is used by a global template kernel, which needs to be in a header file since it is a template kernel.

When this header file is included across 2 or more .cu files, I get a LNK2005 error during linking:

[indent] [/indent] I checked the generated .ptx file and found that fooKernel does not even call getCurThreadIdx, it has been inlined! If this is the case, why is there a multiple definition error? External Image

I can make it go away if I add a forceinline qualifier to the device function. But, my doubt about the normal inlining of device function here still remains.

Another reason I am concerned is that this means that the device functions used by a template kernel need to be always forceinline! This severely limits the amount of functionality that can be achieved inside a template kernel, especially on the Fermi architecture! External Image

Here is sample code to produce the above error on CUDA 3.2 and Visual Studio 2008:

FooDevice.h:

#ifndef FOO_DEVICE_H

#define FOO_DEVICE_H

__device__ int getCurThreadIdx()

{ return ( ( blockIdx.x * blockDim.x ) + threadIdx.x ); }

template< typename T >

__global__ void fooKernel( const T* inArr, int num, T* outArr )

{

	const int threadNum = ( gridDim.x * blockDim.x );

	for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum )

 	outArr[ idx ] = inArr[ idx ];

	return;

}

__global__ void fooKernel2( const int* inArr, int num, int* outArr );

#endif // FOO_DEVICE_H

FooDevice.cu:

#include "FooDevice.h"

// One other kernel that uses getCurThreadIdx()

__global__ void fooKernel2( const int* inArr, int num, int* outArr )

{

	const int threadNum = ( gridDim.x * blockDim.x );

	for ( int idx = getCurThreadIdx(); idx < num; idx += threadNum )

 	outArr[ idx ] = inArr[ idx ];

	return;

}

Main.cu:

#include "FooDevice.h"

int main()

{

	int num = 10;

	int* dInArr = NULL;

	int* dOutArr = NULL;

	const int arrSize = num * sizeof( *dInArr );

	cudaMalloc( &dInArr, arrSize );

	cudaMalloc( &dOutArr, arrSize );

	// Using template kernel

	fooKernel<<< 10, 10 >>>( dInArr, num, dOutArr );

	return 0;

}

avidday · March 15, 2011, 6:20am

Remember that all CUDA definitions must be compiled in the same logical compilation unit. You are not doing that and that is why there are duplicate symbols errors. Don’t compile FooDevice.cu. Ever. Import it into main.cu. That is the underlying source of the error.

As for why you see the error on device functions, it is because they are both inlined and compiled seperately for sm20. The seperate version is for providing function pointer support, which is why they get named as symbols in the resulting cubin elf output, just like textures, and constant and global memory declarations.

ashwin · March 15, 2011, 9:57am

Thanks for that reply! External Image

Do you mean that all CUDA code should end up being included in the same compilation unit in a program? Could you point me to some CUDA documentation that explains this? I could not find any information about this.

Also, won’t this make CUDA code management complicated and result in long compilation times since code cannot be broken up into compilation units?

Could you explain what is this function pointer support and why it is needed? Also, why is the compiler still keeping this separate version around even though (1) it can clearly see that the separate version is not called anywhere in a compilation unit (2) device functions cannot be called from another compilation unit anyway?

avidday · March 15, 2011, 10:28am

To answer the first part, yes it is something of a limitation, but without a linker for device code there is no alternative. The switch to an ELF fat binary format in CUDA 3 might be the first step towards a device code linker, but that brings a lot of other issues that are beyond the scope of your question. As for complexity, there are some extremely well designed and large CUDA template libraries (like thrust) which show how cleanly and elegantly templates can be used in CUDA if you know what you are doing.

To answer the second question, function pointers are a pretty fundamental feature of C and C++. The whole C++ class model is predicated on function pointers. To get some idea of how they can be used, look at the SDK example in recent toolkit releases. The standalone device functions are present in compiled code for C++ and function pointer support. The fact they are there should not matter to properly designed and compiled code.

ashwin · March 24, 2011, 2:06am

I have code where the kernel call and kernel definition are in separate compilation units and is working fine. Do you mean to say that these should be combined into the same compilation unit?

Here is an example of what is working fine for me:

//////////////////////////////////////////////

// Host.cu

#include "Kernel.h"

void doSomeCuda()

{

	doSomeCudaKernel<<< x, y >>>( inArray, outArray );

}

//////////////////////////////////////////////

// Kernel.h

__global__ void doSomeCudaKernel( const int*, int* );

//////////////////////////////////////////////

// Kernel.cu

__global__ void doSomeCudaKernel( const int* inArray, int* outArray)

{

	// Do some kernel work

}

//////////////////////////////////////////////

Topic		Replies	Views
Included file with reusable device functions creating link errors in debug for two odd cases CUDA Programming and Performance	1	1025	May 13, 2013
when .cu file is included multiple times CUDA Programming and Performance	2	2050	May 2, 2008
How to split CUDA code CUDA Programming and Performance	2	4914	February 17, 2011
VS2008, link errors, already defined CUDA Programming and Performance	7	2291	January 19, 2011
Errors introduced by header file "device_functions.h" CUDA Programming and Performance cuda	8	1225	July 28, 2023
multiple cu files VS2008 CUDA Programming and Performance	5	959	November 11, 2010
Device Function Library How to make a lib of device functions CUDA Programming and Performance	6	5006	June 24, 2009
something wrong with my kernel_func CUDA Programming and Performance	1	1763	May 9, 2008
Linking errors with CUDA 4.0 but works fine with CUDA 2.2 Problem related to dynamic runtime librari CUDA Programming and Performance	7	3408	July 21, 2011
Calling device function returns "calling a host function..." error CUDA Programming and Performance	6	6592	August 23, 2010

Multiple definition error on device function in header file

Related topics