__device__ functions

Question on using __device__ functions: what is the standard practice? I had been writing my functions in separate files (DeviceFun1.cu, DeviceFun2.cu) and then ran into the issue at compile time that nvcc wants all my device functions inlined.

So what should I do? Either:

1. Declare everything __global__ and call it from the host
2. Dump all my files above my kernel and deal with it

I don’t like either. Why can’t the compiler just link everything and figure out the inlining during the linking stage?

Specifically I receive the compiler error: “Error: External calls are not supported”…

Any help or pointers on how to develop maintainable CUDA kernels that are > 500 lines?

Because there is no linking stage for device code. nvcc emits PTX, which is only an intermediate SSA representation of the compiled code. The PTX is then either JIT-compiled by the driver or assembled for the target device by a standalone assembler. If you have device function definitions in files external to the compilation, use the preprocessor to #include them at compile time.

Make your __device__ and __host__ __device__ functions inline:

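[codebox]
// function.h
#pragma once

// Declaration of the device function
inline __device__ void my_function(void);

// Pull in the definition so every file that includes this header sees it
#include "function.inl"
[/codebox]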

[codebox]
// function.inl
// Definition lives here; it is only ever compiled via #include from function.h
inline __device__ void my_function(void)
{
    // my_function implementation here
}
[/codebox]

[codebox]
// my_kernel.cu
#include "function.h"

__global__ void my_kernel(…)
{
    // The call is resolved within this translation unit, so no external call is emitted
    my_function();
}
[/codebox]
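For __host__ __device__ helpers, the same header pattern has a side benefit: the functions can be exercised on the CPU. The following is a minimal sketch, not code from this thread. The file name and the names clamp_value, lerp_clamped, blend_kernel, and host_self_check are hypothetical, and the macro fallback assumes that only nvcc defines __CUDACC__, so a plain host compiler sees the qualifiers as empty.

```cpp
// blend_util.h (hypothetical file name, for illustration only)
#ifndef BLEND_UTIL_H
#define BLEND_UTIL_H

#include <cassert>

// A plain host compiler does not know the CUDA qualifiers, so define them
// away. nvcc defines __CUDACC__ and supplies the real qualifiers itself.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: clamp x into [lo, hi]. Callable from host and device.
inline __host__ __device__ float clamp_value(float x, float lo, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

// Hypothetical helper: linear interpolation with t clamped to [0, 1].
inline __host__ __device__ float lerp_clamped(float a, float b, float t)
{
    t = clamp_value(t, 0.0f, 1.0f);
    return a + t * (b - a);
}

#ifdef __CUDACC__
// Kernel using the helpers; only compiled when nvcc processes this header.
__global__ void blend_kernel(const float *a, const float *b, float *out,
                             float t, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = lerp_clamped(a[i], b[i], t);
}
#endif

// Host-side sanity check: runs the same helpers on the CPU, no GPU needed.
inline void host_self_check()
{
    assert(clamp_value(2.0f, 0.0f, 1.0f) == 1.0f);
    assert(lerp_clamped(0.0f, 10.0f, 0.5f) == 5.0f);
}

#endif // BLEND_UTIL_H
```

Because every definition is inline and lives in the header, each .cu file that includes it gets its own copy, so device code never has to call across translation units.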

Thanks, I was going to do this, but it’s kind of a pain given the constraints I currently have on the includes for each file. I was just hoping there would be a quick and dirty way. I guess I can try to modify my files this way; it should work… How does NVIDIA expect me to create neat, MAINTAINABLE code? I don’t want to sift through 1000-1500 lines of code every time I need to fix a bug…

Question: why name your source file “function.inl”? It doesn’t matter if it’s .cu, right? I’m going to try .cu, and if it doesn’t work I’ll come back and rage…

I’ll get back to you after I get this working…

OK, so I did this and it worked fine (with the .cu extension). Thanks…