How to re-structure code for CUDA (.cu, .cuh, .c)?

mgibbons · April 30, 2009, 9:37am

I’ve ported and tested a chunk of code from C++ to C as a precursor to running it on CUDA, but I’m now having problems getting it to compile.

An example would best illustrate the problem:
I have my kernel defined in the file cudaKernel.cu.
This calls a function foo() which is declared in foo.h and defined in foo.c.
It compiles and runs fine in emulation mode but I get ‘External calls are not supported’ errors when I compile in ‘device’ mode.

BTW - this test function has almost nothing in it, for test purposes, i.e. int x = 1 * 2.

I’m expecting the compiler to inline foo() with no problems but it’s beginning to look like I need to have all the functions which the kernel function depends on defined in the same file as the kernel function. Is that right?

Maybe I’m missing something but this seems nuts to me and is a big obstacle to being able to run code on CUDA or CPU using a single code line. Ditto the need to add __device__ prefixes. Why is this necessary? Can’t the compiler figure this out itself?

TIA
Mark

gatoatigrado · May 1, 2009, 9:35am

You’re probably not going to get any benefit trying to run CPU code on CUDA. You need to parallelize your algorithm. __device__ is pretty minimal and clean imho. You can use __device__ and __host__ together on the same function.

mgibbons · May 1, 2009, 9:46am

Thanks for the reply but it didn’t really answer my question.

To your points though:

- parallelising your algorithm is not the only way to get a performance speedup: you can also run many instances of your algorithm in parallel, e.g. if I have 10000 options to price I can price 240 simultaneously without any parallelisation of the pricing algorithm itself;
- for the moment I’ve hacked all the code into one file and currently see a speedup of around 16x over the CPU. That’s without any tuning and using an AoS (array of structures) data layout, which will result in horrible memory accesses.
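For anyone following along, here is a minimal sketch of the "one option per thread" idea with an SoA (structure of arrays) layout in place of AoS. All names (OptionAoS, OptionsSoA, priceOne, splitFields, priceAllKernel, priceAllHost) are made up for illustration, and the payoff is a placeholder, not a real pricing model. The __CUDACC__ guards are only there so the same file also builds with a plain C++ compiler.

```cpp
#include <vector>

// Expands to the CUDA qualifiers under nvcc and to nothing elsewhere.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical AoS record: neighbouring threads read fields that are
// sizeof(OptionAoS) bytes apart in memory.
struct OptionAoS { float spot; float strike; };

// Hypothetical SoA view: one array per field, so thread i and thread i+1
// read adjacent floats.
struct OptionsSoA {
    const float* spot;
    const float* strike;
    int n;
};

// Placeholder "pricer" (assumption): max(spot - strike, 0).
// The serial algorithm itself is not parallelised.
HD inline float priceOne(float spot, float strike) {
    float v = spot - strike;
    return v > 0.0f ? v : 0.0f;
}

// Host-side helper that splits AoS records into per-field arrays.
inline void splitFields(const std::vector<OptionAoS>& in,
                        std::vector<float>& spot,
                        std::vector<float>& strike) {
    spot.clear();
    strike.clear();
    for (const OptionAoS& o : in) {
        spot.push_back(o.spot);
        strike.push_back(o.strike);
    }
}

#ifdef __CUDACC__
// One thread prices one option. The pointers inside opts must refer to
// device memory when this kernel is launched.
__global__ void priceAllKernel(OptionsSoA opts, float* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < opts.n) out[i] = priceOne(opts.spot[i], opts.strike[i]);
}
#endif

// CPU reference: the same per-option function in a plain loop.
inline void priceAllHost(const OptionsSoA& opts, float* out) {
    for (int i = 0; i < opts.n; ++i)
        out[i] = priceOne(opts.spot[i], opts.strike[i]);
}
```

The per-option function is untouched serial code; only the loop over options is replaced by the thread index.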

best regards

Mark

Simon_Green · May 1, 2009, 10:04am

The CUDA compiler doesn’t currently support linking device code, so all the device functions used by each kernel must be in a single file. You can get around this to a certain extent by putting common functions into header files and including them in your .cu file.
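A rough sketch of the header approach, using the foo() and cudaKernel.cu names from the original question. The FOO_HD macro, the body of foo(), and the cpuLoop helper are assumptions for illustration. Both "files" are shown in one listing here; in a real project the first part would live in foo.h and be pulled in with #include.

```cpp
// ---- foo.h (would normally be its own header) ----
// Expands to the CUDA qualifiers under nvcc and to nothing for a plain
// C/C++ compiler, so the same source serves both CPU and GPU builds.
#ifdef __CUDACC__
#define FOO_HD __host__ __device__
#else
#define FOO_HD
#endif

// Defined (not just declared) in the header, so every .cu file that
// includes it sees the full body and no external call is needed.
FOO_HD inline int foo(int a) {
    return a * 2;  // placeholder body (assumption)
}

// ---- cudaKernel.cu ----
// #include "foo.h"
#ifdef __CUDACC__
__global__ void cudaKernel(const int* in, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = foo(in[i]);
}
#endif

// ---- CPU path: the same foo() called from ordinary host code ----
inline void cpuLoop(const int* in, int* out, int n) {
    for (int i = 0; i < n; ++i) out[i] = foo(in[i]);
}
```

The point is that foo.c goes away: the definition moves into the header, and the macro keeps the __device__ annotation out of sight for the CPU build.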

lebsack · August 19, 2009, 5:49pm

Are there plans to provide linking of device code in the future?
