Substitute for passing functions by address

I’m trying to convert the lsoda ODE solver to CUDA, and I’ve run into a problem. The call to lsoda expects, as one of its parameters, the address of a function, fex(…), which does the calculations specific to the particular ODE you’re solving. If you want to use lsoda for a different ODE, you pass it a different function (along with the appropriate other stuff). Now, device functions can’t be referenced by address (and I can’t run host functions on the device, I think), so that leaves me having to define fex(…) explicitly inside the lsoda.cu file. This will work, I think, for our program, but it doesn’t let us use different ODEs with the solver in the same code. I’d have to change the function fex(…) and recompile every time I want to use a different ODE. It’s also just not portable. Is there a way around this?

Thanks,

Paul

The way I solve this common problem is with templates. The operation is defined as a functor class, and the kernel is templatized based on that functor. When you change that core evaluation function, you automatically get a re-optimized kernel for that new behavior.

CUDA has no problems using functor objects even inside kernels. Used as a stateless functor, the templates just work, and I even have nested functors for versatility (one functor is a template itself with its own subfunction).

If I remember correctly, Mark Harris uses this trick in a lot of his code. I think the radix sort has an optional floating-point preprocess step defined by a functor… to sort integers the functor does nothing, to sort floats, the value is transformed into a sortable pattern first, and restored later.
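A minimal sketch of the pattern being described (all names here are illustrative, not from anyone's actual code):

```cuda
#include <cstdio>

// A stateless functor defining the per-element operation.
struct Square {
    __device__ float operator()(float x) const { return x * x; }
};

// The kernel is templatized on the functor type. The compiler
// generates (and inlines) a specialized kernel for each functor,
// so swapping the operation re-optimizes the whole kernel.
template <typename Op>
__global__ void apply(float* data, int n, Op op) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = op(data[i]);
}

// Launch with: apply<<<blocks, threads>>>(d_data, n, Square());
```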

It looks like functors and templates are C++ only, not C. Is this correct? By extension, does that imply that kernels can contain some C++ code?

Templates are the only officially supported C++ feature available from kernels, although more than just templates are known to work.

Is operator overloading still considered unsupported?

I think so, but I also don’t know of any circumstances where it doesn’t work.

Ok, so I built this code below, and it seems to be working properly. Is this what you meant by using templates and functors? Also, is this the best way to do it, or is there a cleaner, more preferred way? I tried doing it by having a base struct that the two structs inherited from, and then passing that base struct to my kernel, but it didn’t like that very much. Nor did it like me throwing virtual tags at it (so I guess pg 16 of the programming guide wasn’t lying on that account).

Thanks,

Paul

[codebox]/*
 *  functor2.cu
 *  functor2
 */

#include <stdio.h>

struct myFex {
    __device__ void operator()(int *bob, float *joe, int art, float big) {
        *bob += art;
        *joe += big;
    }
};

struct myOtherFex {
    __device__ void operator()(int *bob, float *joe, int art, float big) {
        *bob -= art;
        *joe -= big;
    }
};

template <typename Fex>
__global__ void Bill(int* A, float* B, int C, float D, Fex fex) {
    fex(A, B, C, D);
}

int main(int argc, char* argv[]) {
    printf("Hello World\n");

    int *A, C = 0;
    float *B, D = 0.0f;
    cudaMalloc((void**)&A, sizeof(int));
    cudaMalloc((void**)&B, sizeof(float));
    cudaMemset(A, 0, sizeof(int));
    cudaMemset(B, 0, sizeof(float));

    int* Ah;
    float* Bh;
    cudaMallocHost((void**)&Ah, sizeof(int));
    cudaMallocHost((void**)&Bh, sizeof(float));

    cudaMemcpy(Ah, A, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(Bh, B, sizeof(float), cudaMemcpyDeviceToHost);
    printf("%i\t%f\t%i\t%f\n", *Ah, *Bh, C, D);

    C = 5;
    D = 7.2f;

    myOtherFex bogoFex;
    for (int itr = 0; itr < 20; itr++) {
        Bill<<<1,1>>>(A, B, C, D, bogoFex);
        cudaMemcpy(Ah, A, sizeof(int), cudaMemcpyDeviceToHost);
        cudaMemcpy(Bh, B, sizeof(float), cudaMemcpyDeviceToHost);
        printf("%i\t%f\t%i\t%f\n", *Ah, *Bh, C, D);
    }

    myFex bogoFex2;
    for (int itr = 0; itr < 20; itr++) {
        Bill<<<1,1>>>(A, B, C, D, bogoFex2);
        cudaMemcpy(Ah, A, sizeof(int), cudaMemcpyDeviceToHost);
        cudaMemcpy(Bh, B, sizeof(float), cudaMemcpyDeviceToHost);
        printf("%i\t%f\n", *Ah, *Bh);
    }

    cudaFree(A);
    cudaFree(B);
    cudaFreeHost(Ah);
    cudaFreeHost(Bh);
    return 0;
}[/codebox]

Yep, that’s the general spirit of how to do it.

You can use inheritance in CUDA, but not virtual inheritance. Think of it this way: if a C++ feature relies on pointers, dynamic allocation, or exceptions, it’s not going to work. The simpler “C with Classes” subset works, including operator overloading. Templates also work.
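For example, plain (non-virtual) inheritance compiles fine in device code; here is an illustrative sketch with made-up names:

```cuda
// A plain base class with a non-virtual member function:
// fine in device code.
struct Offset {
    int base;
    __device__ int shifted(int x) const { return x + base; }
};

// The derived class inherits without any virtual dispatch.
struct ScaledOffset : Offset {
    int scale;
    __device__ int eval(int x) const { return scale * shifted(x); }
};

__global__ void useIt(int* out, ScaledOffset f) {
    *out = f.eval(10);   // resolved at compile time, no vtable needed
}
```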

As for your usage of functors, you can clean things up a lot by not passing the (information-less) empty functor argument at all.

There’s no need, since there’s no state, and it makes the code simpler. The one thing you do get is an extra pair of annoying parentheses, but you get used to it.

template <typename Fex>
__global__ void Bill(int* A, float* B, int C, float D) {  // note: no need to pass any dummy Fex argument
    Fex()(A, B, C, D);
}

The compiler of course inlines your desired function, and no dummy struct is ever actually created.

I think I see your point. I’d then have to call Bill as ‘Bill<myFex><<<blocksInGrid,threadsPerBlock>>>(A,B,C,D)’ from my main, is that correct?

Yep, exactly right.

Functors can be used on different levels, too, including on device subfunctions and such. Just templatize them too and pass in the parent template types.
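For instance, a templated __device__ helper can forward the same functor type it received from the kernel (an illustrative sketch, reusing the Bill/Fex names from above):

```cuda
// A __device__ subfunction templated on the same functor type.
template <typename Fex>
__device__ void helper(int* A, float* B, int C, float D) {
    Fex()(A, B, C, D);        // the functor is instantiated here
}

// The kernel passes its own template parameter down.
template <typename Fex>
__global__ void Bill(int* A, float* B, int C, float D) {
    helper<Fex>(A, B, C, D);  // forward the parent template type
}
```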

The operator() is sometimes nice for a single function, but it’s actually optional. You can instead generalize the functors and just give them static member functions, allowing you to make multiple functions in a related bundle…

So you’d have

myKernel<unsortedFloatListProcessor><<<grid, block>>>(A, B, C, D);

then define a type like:

struct unsortedFloatListProcessor {
    static __device__ void setup(…) { … }
    static __device__ int swap(…) { … }
};

and inside you’d call them with evaluations like

dataType::setup(…);
dataType::swap(…);
dataType::cleanup(…);

and such.

Since they’re static member functions, you don’t need the extra set of parentheses that creates a temporary.
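Putting the pieces together, here is a compilable sketch of the static-member-function bundle (the policy names and functions are illustrative):

```cuda
// Two "bundles" of related device functions, selected at compile
// time via a template parameter instead of a function pointer.
struct AddPolicy {
    static __device__ void setup(int* x) { *x = 0; }
    static __device__ void step(int* x, int v) { *x += v; }
};

struct SubPolicy {
    static __device__ void setup(int* x) { *x = 100; }
    static __device__ void step(int* x, int v) { *x -= v; }
};

template <typename dataType>
__global__ void myKernel(int* out, int v) {
    dataType::setup(out);    // no temporary object is created:
    dataType::step(out, v);  // static members are called on the type
}

// Launch with: myKernel<AddPolicy><<<1,1>>>(d_out, 5);
```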

Ok, so I did the functor thing, and it all works great under emulation mode, but nvcc complains about accessing host functions from the device when I’m in release mode. I tried doing the obvious and playing with pointers, but it was messy and didn’t work. Here’s a sample of how I’ve been using the functors and templates:

[codebox]
#include <stdio.h>

struct MyBob
{
    void operator()(int j)
    {
        ;  // do nothing
    }
};

template <typename Fex>
__global__ void bob(Fex f)
{
    f(0);
}

int main(int argc, char* argv[]) {
    printf("Hello\n");
    MyBob bubba;
    bob<<<1,1>>>(bubba);
    return 0;
}[/codebox]

So how do I fix this so that it will compile and run without emu mode?

Thanks much,

Paul

Templates are cool, but is there any real substitute for function pointers when the actual function being called is not known at compile time? Something like virtual functions.
I know we can make a big switch statement, but my functions are rather small and I expect the conditional branching to be overkill (there may be a few dozen functions).

Could you recommend something? I will also need some kind of expression tree in the future, and having a switch everywhere is obviously a bad idea.

Thank you!

Paul,

declaring the operator with __host__ __device__ should do the trick
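Applied to the example above, that fix would look like this (sketch):

```cuda
#include <stdio.h>

struct MyBob {
    // __host__ __device__ lets the same operator be compiled for
    // both the host and the device, so the kernel can call it.
    __host__ __device__ void operator()(int j) {
        ;  // do nothing
    }
};

template <typename Fex>
__global__ void bob(Fex f) {
    f(0);
}

int main(int argc, char* argv[]) {
    printf("Hello\n");
    MyBob bubba;
    bob<<<1,1>>>(bubba);
    cudaDeviceSynchronize();  // wait for the kernel before exiting
    return 0;
}
```

If the functor is only ever called from device code, plain __device__ on the operator is enough.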