Substitute for passing functions by address

I’m trying to convert the lsoda ODE solver to CUDA, and I’ve run into a problem. The call to lsoda expects, as one of its parameters, the address of a function, fex(…), which does the calculations specific to the particular ODE you’re solving. If you want to use lsoda for a different ODE, you pass it a different function (along with the appropriate other stuff). Now, device functions can’t be referenced by address (and I can’t run host functions on the device, I think), so that leaves me having to define fex(…) explicitly inside the lsoda.cu file. This will work, I think, for our program, but it doesn’t let us use different ODEs with the solver in the same code. I’d have to change the function fex(…) and recompile every time I want to use a different ODE. It’s also just not portable. Is there a way around this?

Thanks,

Paul

The way I solve this common problem is with templates. The operation is defined as a functor class, and the kernel is templatized based on that functor. When you change that core evaluation function, you automatically get a re-optimized kernel for that new behavior.

CUDA has no problems using functor objects even inside kernels. Used as a stateless functor, the templates just work, and I even have nested functors for versatility (one functor is a template itself with its own subfunction).

If I remember correctly, Mark Harris uses this trick in a lot of his code. I think the radix sort has an optional floating-point preprocess step defined by a functor… to sort integers the functor does nothing, to sort floats, the value is transformed into a sortable pattern first, and restored later.
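A minimal sketch of the pattern being described (all names here are illustrative, not from anyone's actual code):

```cuda
#include <cstdio>

// A stateless functor defining the per-element operation.
struct Square {
    __device__ float operator()(float x) const { return x * x; }
};

// The kernel is templatized on the functor type. The compiler
// generates (and inlines) a specialized kernel for each functor,
// so swapping the operation re-optimizes the whole kernel.
template <typename Op>
__global__ void apply(float* data, int n, Op op) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = op(data[i]);
}

// Launch with: apply<<<blocks, threads>>>(d_data, n, Square());
```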

It looks like functors and templates are C++ only, not C. Is this correct? By extension, does that imply that kernels can contain some C++ code?

Templates are the only officially supported C++ feature available from kernels, although more than just templates are known to work.

Is operator overloading still considered unsupported?

I think so, but I also don’t know of any circumstances where it doesn’t work.

Ok, so I built this code below, and it seems to be working properly. Is this what you meant by using templates and functors? Also, is this the best way to do it, or is there a cleaner, more preferred way? I tried doing it by having a base struct that the two structs inherited from, and then passing that base struct to my kernel, but it didn’t like that very much. Nor did it like me throwing virtual tags at it (so I guess pg 16 of the programming guide wasn’t lying on that account).

Thanks,

Paul

[codebox]/*
 *  functor2.cu
 *  functor2
 */

#include <stdio.h>

struct myFex {
    __device__ void operator()(int *bob, float *joe, int art, float big) {
        *bob += art;
        *joe += big;
    }
};

struct myOtherFex {
    __device__ void operator()(int *bob, float *joe, int art, float big) {
        *bob -= art;
        *joe -= big;
    }
};

template <typename Fex>
__global__ void Bill(int* A, float* B, int C, float D, Fex fex) {
    fex(A, B, C, D);
}

int main(int argc, char* argv[]) {
    printf("Hello World\n");

    int *A, C = 0;
    float *B, D = 0.0f;
    cudaMalloc((void**)&A, sizeof(int));
    cudaMalloc((void**)&B, sizeof(float));
    cudaMemset(A, 0, sizeof(int));
    cudaMemset(B, 0, sizeof(float));

    int* Ah;
    float* Bh;
    cudaMallocHost((void**)&Ah, sizeof(int));
    cudaMallocHost((void**)&Bh, sizeof(float));

    cudaMemcpy(Ah, A, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(Bh, B, sizeof(float), cudaMemcpyDeviceToHost);
    printf("%i\t%f\t%i\t%f\n", *Ah, *Bh, C, D);

    C = 5;
    D = 7.2f;

    myOtherFex bogoFex;
    for (int itr = 0; itr < 20; itr++) {
        Bill<<<1,1>>>(A, B, C, D, bogoFex);
        cudaMemcpy(Ah, A, sizeof(int), cudaMemcpyDeviceToHost);
        cudaMemcpy(Bh, B, sizeof(float), cudaMemcpyDeviceToHost);
        printf("%i\t%f\t%i\t%f\n", *Ah, *Bh, C, D);
    }

    myFex bogoFex2;
    for (int itr = 0; itr < 20; itr++) {
        Bill<<<1,1>>>(A, B, C, D, bogoFex2);
        cudaMemcpy(Ah, A, sizeof(int), cudaMemcpyDeviceToHost);
        cudaMemcpy(Bh, B, sizeof(float), cudaMemcpyDeviceToHost);
        printf("%i\t%f\n", *Ah, *Bh);
    }

    cudaFree(A);
    cudaFree(B);
    cudaFreeHost(Ah);
    cudaFreeHost(Bh);
    return 0;
}[/codebox]

Yep, that’s the general spirit of how to do it.

You can use inheritance in CUDA, but not virtual inheritance. Think of it this way: if a C++ feature relies on pointers, dynamic allocation, or exceptions, it’s not going to work. The simpler “C with Classes” subset works, including operator overloading. Templates also work.
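For example, plain (non-virtual) inheritance compiles fine in device code; here is an illustrative sketch with made-up names:

```cuda
// A plain base class with a non-virtual member function:
// fine in device code.
struct Offset {
    int base;
    __device__ int shifted(int x) const { return x + base; }
};

// The derived class inherits without any virtual dispatch.
struct ScaledOffset : Offset {
    int scale;
    __device__ int eval(int x) const { return scale * shifted(x); }
};

__global__ void useIt(int* out, ScaledOffset f) {
    *out = f.eval(10);   // resolved at compile time, no vtable needed
}
```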

As for your usage of functors, you can clean things up a lot by not passing the (information-less) empty functor argument at all.

There’s no need, since there’s no state, and it makes the code simpler. The one thing you do get is an extra pair of annoying parentheses, but you get used to it.

template <typename Fex>
__global__ void Bill(int* A, float* B, int C, float D) {  // note: no need to pass any dummy Fex argument
    Fex()(A, B, C, D);
}

The compiler of course inlines your desired function, and no dummy struct is ever actually created.

I think I see your point. I’d then have to call Bill as ‘Bill<myFex><<<blocksInGrid,threadsPerBlock>>>(A,B,C,D)’ from my main, is that correct?

Yep, exactly right.

Functors can be used on different levels, too, including on device subfunctions and such. Just templatize them too and pass in the parent template types.
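For instance, a templated __device__ helper can forward the same functor type it received from the kernel (an illustrative sketch, reusing the Bill/Fex names from above):

```cuda
// A __device__ subfunction templated on the same functor type.
template <typename Fex>
__device__ void helper(int* A, float* B, int C, float D) {
    Fex()(A, B, C, D);        // the functor is instantiated here
}

// The kernel passes its own template parameter down.
template <typename Fex>
__global__ void Bill(int* A, float* B, int C, float D) {
    helper<Fex>(A, B, C, D);  // forward the parent template type
}
```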

The operator() is sometimes nice for a single function, but it’s actually optional. You can instead generalize the functors and just give them static member functions, allowing you to make multiple functions in a related bundle…

So you’d have

myKernel<unsortedFloatListProcessor><<<grid, block>>>(A, B, C, D);

then define a type like:

struct unsortedFloatListProcessor {
    static __device__ void setup(…) { … }
    static __device__ int swap(…) { … }
};

and inside you’d call them with evaluations like

dataType::setup(…);
dataType::swap(…);
dataType::cleanup(…);

and such.

Since they’re static member functions, you don’t need the extra set of parentheses that creates a temporary.
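Putting the pieces together, here is a compilable sketch of the static-member-function bundle (the policy names and functions are illustrative):

```cuda
// Two "bundles" of related device functions, selected at compile
// time via a template parameter instead of a function pointer.
struct AddPolicy {
    static __device__ void setup(int* x) { *x = 0; }
    static __device__ void step(int* x, int v) { *x += v; }
};

struct SubPolicy {
    static __device__ void setup(int* x) { *x = 100; }
    static __device__ void step(int* x, int v) { *x -= v; }
};

template <typename dataType>
__global__ void myKernel(int* out, int v) {
    dataType::setup(out);    // no temporary object is created:
    dataType::step(out, v);  // static members are called on the type
}

// Launch with: myKernel<AddPolicy><<<1,1>>>(d_out, 5);
```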

Ok, so I did the functor thing, and it all works great under emulation mode, but nvcc complains about accessing host functions from the device when I’m in release mode. I tried doing the obvious and playing with pointers, but it was messy and didn’t work. Here’s a sample of how I’ve been using the functors and templates:

[codebox]
#include <stdio.h>

struct MyBob
{
    void operator()(int j)
    {
        ;  // do nothing
    }
};

template <typename Fex>
__global__ void bob(Fex f)
{
    f(0);
}

int main(int argc, char* argv[]) {
    printf("Hello\n");
    MyBob bubba;
    bob<<<1,1>>>(bubba);
    return 0;
}[/codebox]

So how do I fix this so that it will compile and run without emu mode?

Thanks much,

Paul

Templates are cool, but is there any real substitute for function pointers when the actual function being called is not known at compile time? Something like virtual functions.
I know we can make a big switch statement, but my functions are rather small and I expect the conditional branching to be overkill (there may be a few dozen functions).

Could you recommend something? I will also need some kind of expression tree in the future, and having a switch everywhere is obviously a bad idea.

Thank you!

Paul,

declaring the operator with __host__ __device__ should do the trick
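Applied to the example above, that fix would look like this (sketch):

```cuda
#include <stdio.h>

struct MyBob {
    // __host__ __device__ lets the same operator be compiled for
    // both the host and the device, so the kernel can call it.
    __host__ __device__ void operator()(int j) {
        ;  // do nothing
    }
};

template <typename Fex>
__global__ void bob(Fex f) {
    f(0);
}

int main(int argc, char* argv[]) {
    printf("Hello\n");
    MyBob bubba;
    bob<<<1,1>>>(bubba);
    cudaDeviceSynchronize();  // wait for the kernel before exiting
    return 0;
}
```

If the functor is only ever called from device code, plain __device__ on the operator is enough.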