device functions

marmot · November 9, 2010, 4:11am

Question on using device functions. What is the standard practice when using a device function? I had been writing my functions in files DeviceFun1.cu, DeviceFun2.cu and then ran into the issue at compile time that nvcc wanted all my device functions inlined.

So what should I do? Either:

[list=1]

[*]Rename everything __global__ and call it from the host

[*]Dump all my files above my kernel and deal with it

[/list]

I don’t like either. Why can’t the compiler just link everything and figure out the inlining during the linking stage?

Specifically I receive the compiler error: “Error: External calls are not supported”…

Any help or pointers on how to develop maintainable CUDA kernels that are > 500 lines?

avidday · November 9, 2010, 5:24am

Because there is no linking stage for device code. nvcc emits PTX (an intermediate SSA representation of the compiled code), which is intended to be either JIT-compiled by the driver or assembled for the target device by a standalone assembler. If you have device function definitions in files external to the compilation, use the preprocessor to include them at compile time.

JaredHoberock · November 9, 2010, 6:14am

Make your __device__ and __host__ __device__ functions inline:

[codebox]
// function.h
#pragma once

// Declaration; the definition is pulled in from function.inl below.
inline __device__ void my_function(void);

#include "function.inl"
[/codebox]

[codebox]
// function.inl
inline __device__ void my_function(void)
{
    // my_function implementation here
}
[/codebox]

[codebox]
// my_kernel.cu
#include "function.h"

__global__ void my_kernel(...)
{
    my_function();
}
[/codebox]
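For the maintainability concern, one option that fits the same header pattern is to mark small helpers __host__ __device__, so the identical inline definition can also be exercised from ordinary host code. The following is only a minimal sketch under stated assumptions: the names `saxpy_element`, `saxpy_kernel`, and the include guard `SAXPY_ELEMENT_H` are hypothetical and not from this thread, and the fallback macros for non-nvcc compilers are an assumption that lets a plain C++ compiler read the same header.

```cpp
// saxpy_element.h (hypothetical file name)
#ifndef SAXPY_ELEMENT_H
#define SAXPY_ELEMENT_H

// Assumption: when a plain C++ compiler (not nvcc) reads this header, the CUDA
// qualifiers are undefined, so make them expand to nothing.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Defined inline in the header: every file that includes it sees the full
// definition, so no device-side linking is required.
inline __host__ __device__ float saxpy_element(float a, float x, float y)
{
    return a * x + y;
}

#ifdef __CUDACC__
// Kernel that calls the helper; only compiled when nvcc processes the file.
__global__ void saxpy_kernel(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = saxpy_element(a, x[i], y[i]);
}
#endif

#endif // SAXPY_ELEMENT_H
```

With this layout the helper can be checked on the CPU (for example, by calling `saxpy_element(2.0f, 3.0f, 1.0f)` from a host test) while the kernel still gets the definition inlined at compile time.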

marmot · November 10, 2010, 3:40am

Thanks, I was going to do this, however it's kind of a pain given the current constraints I have on the includes for each file. I was just hoping there would be a quick and dirty way. I guess I can try to modify my files this way; it should work… How does NVIDIA expect me to create neat, MAINTAINABLE code? I don’t want to sift through 1000-1500 lines of code every time I need to fix a bug…

Question: why name your source file “function.inl”? It doesn’t matter if it's .cu, right? I’m going to try .cu and if it doesn’t work I'll come back and rage…

I’ll get back to you after I get this working…

marmot · November 10, 2010, 5:52am

OK, so I did this and it worked fine (with the .cu extension). Thanks…
