Dear All,

I recently tried to use function pointers to dynamically define several processing stages in my application, which runs on an sm_30 device.

It would be difficult to post the full code here, as many different files and functions are involved, but basically I started from the function pointer sample included in the CUDA Toolkit 5.0.

I allocate a device buffer of function pointers and copy a device function pointer into it. The pointer is defined just as in the sample, and the copy is done with cudaMemcpyFromSymbolAsync using the cudaMemcpyDeviceToDevice copy kind.

My device function pointer type is declared like this in a .cu.h file:

[code]
//device function pointer type
typedef void (*func)(structGpuArgument*);

//declaring a device function that matches this type
__device__ void gpuFunc1(structGpuArgument* arg1);
[/code]

Elsewhere, I have a .cu file that includes the previous declarations and contains the following code:

[code]
//get the actual function pointer
__device__ func gpuFuncPtr = gpuFunc1;

//Buffer to store a list of function pointer
func* pFuncDevBuffer;
cudaMalloc(&pFuncDevBuffer,NB_FUNC*sizeof(func));

//copy the actual function pointer (symbol) to the list buffer 
cudaMemcpyFromSymbolAsync(pFuncDevBuffer + i, gpuFuncPtr, sizeof(func), 0, cudaMemcpyDeviceToDevice, stream);

//Launch the kernel that will use the functions
kernel_test<<<1,10,0,stream>>>(pFuncDevBuffer);
...

//defining the kernel that uses pointer buffer
__global__ void kernel_test(func* pFuncDevBuffer)
{
   printf("func address : %p\n",pFuncDevBuffer[0]);
   pFuncDevBuffer[0](NULL);
}

//defining the function pointed by the function pointer
__device__ void gpuFunc1(structGpuArgument* arg1)
{
     do_something;
}

[/code]
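
To make the setup easier to reproduce, here is a condensed single-file sketch of the case that works (kernel, function, and pointer all in the same file). The contents of structGpuArgument, the value of NB_FUNC, and the body of gpuFunc1 are placeholders chosen for illustration, not the real application code.

[code]
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder: the real structGpuArgument is not shown in this post.
struct structGpuArgument { int value; };

// Device function pointer type
typedef void (*func)(structGpuArgument*);

// Placeholder buffer size
#define NB_FUNC 1

// Function that the pointer refers to (stand-in for do_something)
__device__ void gpuFunc1(structGpuArgument* arg1)
{
    if (arg1 != NULL) arg1->value += 1;
}

// Device-side variable holding the address of gpuFunc1
__device__ func gpuFuncPtr = gpuFunc1;

// Kernel that calls through the function pointer buffer
__global__ void kernel_test(func* pFuncDevBuffer)
{
    printf("func address : %p\n", (void*)pFuncDevBuffer[0]);
    pFuncDevBuffer[0](NULL);
}

int main()
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Buffer to store a list of function pointers
    func* pFuncDevBuffer = NULL;
    cudaMalloc(&pFuncDevBuffer, NB_FUNC * sizeof(func));

    // Copy the device function pointer (symbol) into slot 0 of the buffer
    cudaMemcpyFromSymbolAsync(pFuncDevBuffer, gpuFuncPtr, sizeof(func), 0,
                              cudaMemcpyDeviceToDevice, stream);

    kernel_test<<<1, 10, 0, stream>>>(pFuncDevBuffer);

    // Wait for the kernel and report any execution error
    cudaError_t err = cudaStreamSynchronize(stream);
    printf("status: %s\n", cudaGetErrorString(err));

    cudaFree(pFuncDevBuffer);
    cudaStreamDestroy(stream);
    return err == cudaSuccess ? 0 : 1;
}
[/code]

I build this kind of file with nvcc for the sm_30 architecture (for example with -arch=sm_30), since device function pointers need compute capability 2.0 or higher.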

Everything works fine as long as the __global__ kernel that takes the __device__ function pointer buffer as an argument is defined in the same file as the function and its pointer.
The kernel can then print the address of the function (0x4) and execute its code without any problem.
I don't use separate compilation.

However, when a second kernel, defined in another file, takes the very same function pointer buffer as an argument within the same run of the program, it prints the very same address for the function pointer (0x4) but fails as soon as it tries to call it, with an "unspecified launch failure" error. After that, every other CUDA API call hangs and I have to reboot my computer (reset through nvidia-smi isn't supported on my GPU).
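
For reference, this is roughly how the error can be caught after a launch. It is a minimal sketch: CHECK_CUDA and second_kernel are hypothetical names, and the empty kernel only stands in for the real one defined in the other file.

[code]
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the file, line and error string if a call fails
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e));                           \
        }                                                             \
    } while (0)

// Stand-in for the second kernel (the real one calls through the buffer)
__global__ void second_kernel() {}

int main()
{
    second_kernel<<<1, 10>>>();
    CHECK_CUDA(cudaGetLastError());       // errors detected at launch time
    CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel runs
    return 0;
}
[/code]

In my case it is the synchronization after the second kernel that reports the failure.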

I would like to know whether there is a known issue with using function pointers this way, i.e. calling through a function pointer buffer that was filled in another file, while sharing the same function pointer type definition.

Also, if there is a workaround for resetting a device after such a crash without rebooting the whole system, it would save me a lot of time while debugging my application.

Thank you for your help