Array of function pointers assignment

Hi,

I’m trying to build a framework for simulations and would like to keep it sufficiently generic for ‘end users’ to add functions. To that end, I think a global array of function pointers would be ideal. So, I’m trying to do the following (pseudo-coded):

typedef X (*func_ptr_t)(Y, Z);

__device__ func_ptr_t functions[2];

__device__ X f(Y y, Z z);

int main(){
    functions[0] = f;   // __device__ variable touched from host code
}

This clearly doesn’t work because __device__ variables can’t be accessed from host code. So I added a kernel to ‘assign’ the functions.

typedef X (*func_ptr_t)(Y, Z);

__device__ func_ptr_t functions[2];

__device__ X f(Y y, Z z);

__global__ void assign_funcs(int index, func_ptr_t fptr){
    functions[index] = fptr;
}

int main(){
    assign_funcs<<<1, 1>>>(0, f);
}

This now compiles but, when I try to call the function, I get an invalid program counter error.

I’m not sure what I’m doing wrong or if this is the right way of doing things. I’ve read some of the posts on similar problems but I can’t grasp what my mistake is. Would anyone care to lend a hand?

It’s not possible like that. See Programming Guide :: CUDA Toolkit Documentation
It is not allowed to take the address of a __device__ function in host code.

But it’s possible to pass __device__ lambdas to kernels. You may get something working if you use nvstd::function instead of raw function pointers.
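For example, something along these lines (a minimal sketch, assuming nvcc is invoked with --extended-lambda; the kernel and variable names here are mine, not from your code):

#include <cstdio>

// templated kernel that accepts any callable, including a __device__ lambda
template <typename F>
__global__ void run(F func, float y, float z) {
    printf("result = %f\n", func(y, z));
}

int main() {
    // __device__ lambda defined in host code and passed to the kernel by value
    auto f = [] __device__ (float y, float z) { return y * z; };
    run<<<1, 1>>>(f, 2.0f, 3.0f);
    cudaDeviceSynchronize();
}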

So I could write a lambda wrapper around the device function?

You can get what you have working by capturing the device pointer in device code, copying that pointer to host code, and then dispatching with that pointer.

See here

You can also use lambdas. However, every lambda has a unique type, so it’s difficult or impossible to create an array of lambdas in a useful fashion.
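To illustrate the unique-type problem (plain C++ for brevity, but the same applies to __device__ lambdas; the names are just for the example):

#include <type_traits>

int main() {
    // two lambdas with identical signatures...
    auto a = [] (float y, float z) { return y * z; };
    auto b = [] (float y, float z) { return y + z; };

    // ...still have distinct, unnamed types, so there is no single element
    // type that an array could use to hold both of them
    static_assert(!std::is_same<decltype(a), decltype(b)>::value,
                  "every lambda has its own type");
}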

So it would be something like this?

typedef X (*func_ptr_t)(Y, Z);

__device__ func_ptr_t functions[2];

__device__ X f(Y y, Z z);

__global__ void assign_funcs(int index, func_ptr_t fptr){
    functions[index] = fptr;
}

int main(){
  func_ptr_t f_h = nullptr;

  cudaMemcpyFromSymbol(&f_h, &f, sizeof(void *));

  assign_funcs<<<1, 1>>>(0, f_h);
}

Not quite. You’re still taking the address of a device function in host code here:

cudaMemcpyFromSymbol(&f_h, &f, sizeof(void *));
                           ^^

and additionally, cudaMemcpyFromSymbol is designed to copy data from a device variable, not from a device function address.

So: capture the device function address in device code, into a device variable. Then copy that variable to host code. The link I gave you already has such an example; here it is again.

You, Sir, are a genius! Thanks for your help. Just for completeness’ sake, here’s the full pseudocode.

// define a function type
typedef X (*func_ptr_t)(Y, Z);

// device-side storage for functions of this type
__device__ func_ptr_t functions[2];

// an instance of this function type
__device__ X my_func(Y y, Z z);

// CUDA magic: the function's address is taken in device code,
// straight into a __device__ pointer variable
__device__ func_ptr_t capture = my_func;


// store the (host-held) pointer into the device-side array
__global__ void assign_funcs(int index, func_ptr_t fptr){
    functions[index] = fptr;
}

int main(){
    // temporary host storage for the function address
    func_ptr_t address_stored_on_host;

    // copy the device function address into that temporary storage
    cudaMemcpyFromSymbol(&address_stored_on_host,
                         capture, sizeof(func_ptr_t));

    // set the address in the function-pointer array
    assign_funcs<<<1, 1>>>(0, address_stored_on_host);
}
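And in case it helps anyone else, here is a concrete, self-contained sketch of the same pattern (float standing in for X, Y and Z; the dispatch kernel is my own addition, just to show a call through the table):

#include <cstdio>

typedef float (*func_ptr_t)(float, float);

__device__ func_ptr_t functions[2];

__device__ float my_func(float y, float z) { return y * z; }

// take the device function's address in device code
__device__ func_ptr_t capture = my_func;

__global__ void assign_funcs(int index, func_ptr_t fptr) {
    functions[index] = fptr;
}

__global__ void dispatch(int index, float y, float z) {
    // call through the device-resident function pointer table
    printf("result = %f\n", functions[index](y, z));
}

int main() {
    func_ptr_t address_stored_on_host = nullptr;

    // copy the captured device function address back to the host
    cudaMemcpyFromSymbol(&address_stored_on_host, capture, sizeof(func_ptr_t));

    assign_funcs<<<1, 1>>>(0, address_stored_on_host);
    dispatch<<<1, 1>>>(0, 2.0f, 3.0f);
    cudaDeviceSynchronize();
    return 0;
}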
