I’m trying to build a framework for simulations and would like to keep it generic enough for ‘end users’ to add their own functions. To that end, I think a global array of function pointers would be ideal. So I’m trying to do the following (pseudo-code):
typedef X (*func_ptr_t)(Y, Z);
__device__ func_ptr_t functions[2];
__device__ X f(Y y, Z z);
int main(){
    functions[0] = f;
}
This clearly doesn’t work because __device__ variables cannot be accessed directly from host code. So I added a kernel to ‘assign’ the functions.
typedef X (*func_ptr_t)(Y, Z);
__device__ func_ptr_t functions[2];
__device__ X f(Y y, Z z);
__global__ void assign_funcs(int index, func_ptr_t fptr){
    functions[index] = fptr;
}
int main(){
    assign_funcs<<<1, 1>>>(0, f);
}
This now compiles but, when I try to call the function, I get an invalid program counter error.
I’m not sure what I’m doing wrong or if this is the right way of doing things. I’ve read some of the posts on similar problems but I can’t grasp what my mistake is. Would anyone care to lend a hand?
You can get what you have working by capturing the device function pointer in device code, copying that pointer to host code, and then dispatching with that pointer.
Additionally, cudaMemcpyFromSymbol is designed to copy data from a device variable, not from a device function address.
So capture the device function address in device code, in a device variable. Then copy that variable to host code. The link I gave you already has such an example. Here it is again.
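For readers without the link, here is a minimal sketch of that pattern. It is not the linked example: the concrete signature `float (*)(float, int)` and the names `scale`, `d_scale_ptr`, and `apply` are assumptions made up for illustration. The point it tries to show is that the address of a `__device__` function is only usable when it is taken in device code, so it is stored in a `__device__` variable first and the pointer value is then copied to the host.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative function-pointer type (an assumption, standing in for X(*)(Y, Z)).
typedef float (*func_ptr_t)(float, int);

// A device function we want to call indirectly (hypothetical example).
__device__ float scale(float y, int z) { return y * z; }

// The initializer is evaluated for the device, so this holds a valid
// device address of scale.
__device__ func_ptr_t d_scale_ptr = scale;

// Kernel that dispatches through whatever device function pointer it receives.
__global__ void apply(func_ptr_t fn, float y, int z, float *out) {
    *out = fn(y, z);
}

int main() {
    // Copy the pointer VALUE out of the device variable. This reads a
    // variable, which is what cudaMemcpyFromSymbol is meant for.
    func_ptr_t h_fn = nullptr;
    cudaError_t err = cudaMemcpyFromSymbol(&h_fn, d_scale_ptr, sizeof(func_ptr_t));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpyFromSymbol: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(float));

    // The host only carries the pointer around; it is dereferenced on the device.
    apply<<<1, 1>>>(h_fn, 1.5f, 4, d_out);

    float h_out = 0.0f;
    err = cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel or copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("result = %f\n", h_out);

    cudaFree(d_out);
    return 0;
}
```

Passing `scale` directly as a kernel argument from `main` would hand the kernel an address taken in host code, which is the likely source of the invalid program counter error in the original attempt.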
You, Sir, are a genius! Thanks for your help. Just for completeness’ sake, here’s the full pseudocode.
// define a function pointer type
typedef X (*func_ptr_t)(Y, Z);
// device-side storage for functions of this type
__device__ func_ptr_t functions[2];
// an instance of this function type
__device__ X my_func(Y y, Z z);
// CUDA magic: taking the address in device code
// gives a valid device function pointer
__device__ func_ptr_t capture = my_func;
// kernel that stores a device function address
// (passed in from the host) in the array
__global__ void assign_funcs(int index, func_ptr_t fptr){
    functions[index] = fptr;
}
int main(){
    // temporary host-side storage for the function address
    func_ptr_t address_stored_on_host;
    // copy the device function address to temporary storage
    cudaMemcpyFromSymbol(&address_stored_on_host,
                         capture, sizeof(func_ptr_t));
    // set the address in the function address array
    assign_funcs<<<1, 1>>>(0, address_stored_on_host);
}
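In case a compilable version is useful to anyone landing here, below is a sketch of the same table-based idea with the placeholders filled in. Everything concrete in it is an assumption for illustration: `float`/`int` stand in for X, Y and Z, and `add_fn`, `mul_fn`, `call_funcs` and the `CUDA_CHECK` macro are hypothetical names, not part of the framework discussed above. It registers two functions and then calls both through the table.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the error and leave main() on any CUDA failure.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_));  \
            return 1;                                                      \
        }                                                                  \
    } while (0)

// Concrete stand-ins for X, Y, Z (assumptions for this sketch).
typedef float (*func_ptr_t)(float, int);

// Device-side table of function pointers.
__device__ func_ptr_t functions[2];

// Two example 'user' functions.
__device__ float add_fn(float y, int z) { return y + z; }
__device__ float mul_fn(float y, int z) { return y * z; }

// Capture each address in device code, one variable per function.
__device__ func_ptr_t capture_add = add_fn;
__device__ func_ptr_t capture_mul = mul_fn;

// Store a device function address in the table.
__global__ void assign_funcs(int index, func_ptr_t fptr) {
    functions[index] = fptr;
}

// Each thread calls the table entry matching its index.
__global__ void call_funcs(float y, int z, float *out) {
    int i = threadIdx.x;
    out[i] = functions[i](y, z);
}

int main() {
    func_ptr_t h_add = nullptr, h_mul = nullptr;
    CUDA_CHECK(cudaMemcpyFromSymbol(&h_add, capture_add, sizeof(func_ptr_t)));
    CUDA_CHECK(cudaMemcpyFromSymbol(&h_mul, capture_mul, sizeof(func_ptr_t)));

    // Kernels on the default stream run in launch order, so the table
    // is filled before call_funcs reads it.
    assign_funcs<<<1, 1>>>(0, h_add);
    assign_funcs<<<1, 1>>>(1, h_mul);

    float *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_out, 2 * sizeof(float)));
    call_funcs<<<1, 2>>>(3.0f, 2, d_out);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    float h_out[2] = {0.0f, 0.0f};
    CUDA_CHECK(cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost));
    printf("add: %f, mul: %f\n", h_out[0], h_out[1]);

    CUDA_CHECK(cudaFree(d_out));
    return 0;
}
```

The one-capture-variable-per-function layout is just the simplest thing that works here; a framework with many user functions would probably want a less manual registration step.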