Function pointers crashing kernel calls

Hi all,

I’m retooling an existing code base to work with CUDA, and one of my problems has been getting function-pointer behavior into a kernel call. Basically there’s a structure `Container` that holds function pointers, which I want to initialize on the host but then call on the GPU. When I try to call the function pointer, the kernel crashes; commenting out that line lets the program proceed smoothly (except without updated boundary conditions). Here’s the relevant code (I can access the other elements of `ContainerS` in device code, so I don’t think the problem is a bad memory copy):



typedef void (*VCFun_t)(DataS *Data_dev, int i, int j, int k);

typedef struct Container_s {
  VCFun_t boundariesix1;
  DataS Data_dev;
} ContainerS;


ContainerS *Container_dev;
ContainerS Container_host;

/* checkCuda is my error-checking wrapper (name elided in this excerpt) */
checkCuda(cudaMalloc((void **)&Container_dev, sizeof(ContainerS)),
      "cudaMalloc - Container_dev");

initialize(Container_dev, &Container_host);


__device__ static void ix1boundary(DataS *Data_dev, int i, int j, int k);

void initialize(ContainerS *Container_dev, ContainerS *Container_host){
  Container_host->boundariesix1 = ix1boundary;

  checkCuda(cudaMemcpy(Container_dev, Container_host, sizeof(ContainerS),
      cudaMemcpyHostToDevice), "cudaMemcpy - ContainerS");
}


__global__ void boundaries_kernel(ContainerS *Container_dev){
  /* Data_dev, i, j, k setup elided in this excerpt */
  (*(Container_dev->boundariesix1))(Data_dev, i, j, k);
}


void boundaries(ContainerS *Container_dev){
  Data_dev = Container_dev->Data_dev;

  if ( cudaSuccess != cudaGetLastError() )
    printf( "Error - NOT boundary kernel\n" );

  /* kernel launch elided in this excerpt:
     boundaries_kernel<<<grid, block>>>(Container_dev); */

  if ( cudaSuccess != cudaGetLastError() )
    printf( "Error - boundary kernel\n" );
}


__device__ static void ix1boundary(DataS *Data_dev, int i, int j, int k)
{
  return; // minimal body for testing purposes
}


Again, if I comment out `(*(Container_dev->boundariesix1))(Data_dev, i, j, k);` the kernel runs smoothly, but if I leave it in I get ‘Error - boundary kernel’ as output. Is there some subtlety to device function pointers that I’m missing here? The same scheme already works fine in regular C code. I’m unfamiliar with function pointers, however, so I can’t tell whether the issue is cudaMemcpy bringing a host function-pointer value to the GPU, or something else entirely.

Any help would be greatly appreciated.



EDIT: I’m compiling with `-arch=sm_20` and running on a system with a GTX 460 (compute capability 2.1).

You can’t take the address of a `__device__` function in host code — the value you get is a host-side address, and calling it on the GPU crashes the kernel, which is exactly what you’re seeing.
Instead, statically initialize a `__device__` variable with the function’s address and copy that value back to the host (e.g. with `cudaMemcpyFromSymbol`), or launch a small kernel that performs the assignment on the device, where taking the address is legal.
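To make that concrete, here’s a minimal, self-contained sketch of both approaches. The `DataS`/`ContainerS` types are cut-down stand-ins for the poster’s structs, and `set_pointers` is a hypothetical helper kernel, not something from the original code. Device-side function pointers require compute capability 2.0 or higher, which the GTX 460 satisfies.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-ins for the poster's types (assumed, for illustration only).
typedef struct { int dummy; } DataS;
typedef void (*VCFun_t)(DataS *d, int i, int j, int k);
typedef struct { VCFun_t boundariesix1; DataS Data_dev; } ContainerS;

__device__ void ix1boundary(DataS *d, int i, int j, int k)
{
  d->dummy = i + j + k;  // minimal body for testing
}

// Option 1: a __device__ variable statically initialized with the
// device-side address; the host retrieves it via cudaMemcpyFromSymbol.
__device__ VCFun_t d_ix1boundary_ptr = ix1boundary;

// Option 2 (hypothetical helper): do the assignment on the device,
// where taking the address of a __device__ function is legal.
__global__ void set_pointers(ContainerS *c)
{
  c->boundariesix1 = ix1boundary;
}

__global__ void boundaries_kernel(ContainerS *c)
{
  (*(c->boundariesix1))(&c->Data_dev, 0, 0, 0);
}

int main()
{
  ContainerS *c_dev;
  cudaMalloc((void **)&c_dev, sizeof(ContainerS));

  // Option 1 in action: fetch the device-side address into the host
  // struct, then copy the whole struct to the device as before.
  ContainerS c_host;
  cudaMemcpyFromSymbol(&c_host.boundariesix1, d_ix1boundary_ptr,
                       sizeof(VCFun_t));
  cudaMemcpy(c_dev, &c_host, sizeof(ContainerS), cudaMemcpyHostToDevice);

  // Option 2 would instead be:  set_pointers<<<1, 1>>>(c_dev);

  boundaries_kernel<<<1, 1>>>(c_dev);
  cudaDeviceSynchronize();
  printf("kernel status: %s\n", cudaGetErrorString(cudaGetLastError()));

  cudaFree(c_dev);
  return 0;
}
```

Either way, the pointer stored in `ContainerS` ends up being a device-side address, so the indirect call inside `boundaries_kernel` no longer faults.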