__device__

mdowns · December 11, 2011, 5:09pm

I’m using CUDA 4.0 and I’m having trouble using __device__ functions the way that I need to use them. I was hired to speed up the processing of part of a large system written in C. There are hundreds of files, but the functions that I need are probably in 12-15 of them. The files come in .h and .c pairs (of course, this is standard in any large system). I was thinking that if I add __device__ in front of the prototype in the .h and in front of the function implementation in the .c (and rename it to .cu), then I should be able to call them from my __global__ kernel function. I was having a lot of trouble getting this to link, so I wrote an extremely simple vector add application on Windows 7 with CUDA 4.0. Where you would normally just write c[i] = a[i] + b[i];, I implemented a simple add function with a prototype in cudaUtils.h and the implementation in cudaUtils.cu, with __device__, etc. So now my kernel function looks like c[i] = cudaUtils_plus (a[i], b[i]); just to test out the file structure style.

The compiler/linker tells me that:
error : External calls are not supported (found non-inlined call to _Z14cudaUtils_plusii)

Now, if I #include “cudaUtils.cu” instead of “cudaUtils.h”, it links and runs correctly. It also works if I move the implementation from the .cu to the .h, even without the ‘inline’ directive.

I don’t think that this is practical when taking selected functions from a dozen files where each .cu is going to include one or more other .cu files. I’m sure that I would end up with multiply-defined symbol complaints.

Is this truly a limitation of the CUDA compilation system, that you can’t split up an implementation of a kernel program across multiple files? I looked through the examples in the SDK, and I couldn’t find one that uses the .h and .cu method that I need. Some have a .cuh file, but typically they just have a typedef struct that will be used in the kernel, and no __device__ function prototypes.

Any help here?

Thanks,
Mike

tera · December 11, 2011, 7:10pm

Yes. CUDA does not have a linker on the device side. You need to include all code into a single compilation unit.
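
As a rough sketch of what "a single compilation unit" can look like for the vector add test above: the function body goes in the header behind an include guard, so every .cu file that includes it gets its own copy. The signature cudaUtils_plus(int, int) is taken from the mangled name in the error message. The CUDA_CALLABLE macro, the vectorAdd kernel and the vectorAddHost wrapper are made-up names for illustration, and the plain C++ fallback is only there so the same file also builds without nvcc.

```cpp
// Sketch only. In a real project the guarded section below would live in
// cudaUtils.h; it is inlined here so the example is a single file.
#include <cassert>
#include <cstddef>

#ifndef CUDAUTILS_H
#define CUDAUTILS_H

// CUDA_CALLABLE is a hypothetical macro name, not part of CUDA.
// nvcc defines __CUDACC__; a plain C++ compiler sees an empty macro.
#ifdef __CUDACC__
#define CUDA_CALLABLE __host__ __device__
#else
#define CUDA_CALLABLE
#endif

// The body is in the header, so each file that includes it compiles its
// own copy. `static inline` gives the function internal linkage, so several
// files can include the header without multiply-defined symbols.
static inline CUDA_CALLABLE int cudaUtils_plus(int a, int b)
{
    return a + b;
}

#endif // CUDAUTILS_H

#ifdef __CUDACC__
// The kernel sees the definition of cudaUtils_plus in its own compilation
// unit, so there is no external call left to resolve.
__global__ void vectorAdd(const int *a, const int *b, int *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = cudaUtils_plus(a[i], b[i]);
}
#endif

// Hypothetical host wrapper: launches the kernel when built with nvcc and
// falls back to a CPU loop otherwise. CUDA error checking is omitted.
void vectorAddHost(const int *a, const int *b, int *c, int n)
{
    assert(n > 0);
#ifdef __CUDACC__
    size_t bytes = n * sizeof(int);
    int *da = NULL, *db = NULL, *dc = NULL;
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(da, db, dc, n);
    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
#else
    for (int i = 0; i < n; ++i)
        c[i] = cudaUtils_plus(a[i], b[i]);
#endif
}
```

Saved as a .cu file and built with nvcc, this should take the kernel path; built with an ordinary C++ compiler it takes the CPU loop. In both cases calling vectorAddHost(a, b, c, n) on host arrays fills c with the element-wise sums.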

mdowns · December 11, 2011, 11:38pm

I don’t think that a dynamic linker on the device is the issue. I’m not trying to produce .o’s or .a’s or .lib’s for the device to take care of. It looks to me like the only thing that works is if all CUDA device code is textually within 1 source file either by writing it that way or by including the source functions in an included .h or included .cu file.

Is this what you mean? That all CUDA device code has to be basically in a single source file?

tera · December 12, 2011, 2:01am

Yes. I’m not particularly referring to a dynamic linker - CUDA doesn’t have a static linker for device code either.

mdowns · December 12, 2011, 2:30pm

So your entire CUDA kernel has to reside in a single .o?

tera · December 12, 2011, 3:42pm

Yes.

jamesqf · December 12, 2011, 5:27pm

Err… Then how can NVidia supply all those libraries of CUDA functions?

And wouldn’t it be really nice if the NVidia developers got to work and implemented one of the most elementary parts of a software management package?

tera · December 12, 2011, 11:17pm

I’ve not seen Nvidia provide any library of device functions without source code.

The second question I cannot really answer. So far it doesn’t seem to be an important omission to me (although the question pops up regularly on the forums).
CUDA kernels don’t usually get so large that you’d need to split them into different compilation units. And in the present state of the CUDA ecosystem, it seems difficult to me to provide nontrivial “black box” device functions that one could just link into one’s kernels and get close to optimal performance. But with the improvements in each generation of devices bringing the GPU programming model and performance characteristics closer to CPUs, Nvidia seems to be heading there. And at some time in the future we’ll probably get a device-side linker as well.
