I recently tried to use function pointers to dynamically define several processing stages in my application, running on an sm_30 device.
It would be difficult to post the full code here, as there are many different files and functions involved, but basically I started from the sample included in the CUDA Toolkit 5.0.
I allocate a device buffer of function pointers, into which I copy a device function pointer (defined just as in the sample) using cudaMemcpyFromSymbolAsync with the cudaMemcpyDeviceToDevice copy kind.
My device pointer is defined like this:
typedef void (*func)(struct GpuArgument *);
__device__ func gpuFuncPtr = gpuFunc1;

and the buffer is filled like this:

func *pFuncDevBuffer;
cudaMalloc(&pFuncDevBuffer, NB_FUNC * sizeof(func));
cudaMemcpyFromSymbolAsync(pFuncDevBuffer + i, gpuFuncPtr, sizeof(func), 0,
                          cudaMemcpyDeviceToDevice, stream);
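To give the full picture, here is a condensed, self-contained sketch of the pattern in the first file. The struct contents, the body of gpuFunc1, and NB_FUNC are simplified placeholders, not my real code:

```cuda
#include <cstdio>

// Placeholder for my real argument struct
struct GpuArgument { int value; };

typedef void (*func)(struct GpuArgument *);

// A device function and a __device__ pointer to it,
// set up as in the Toolkit sample
__device__ void gpuFunc1(struct GpuArgument *arg) { arg->value += 1; }
__device__ func gpuFuncPtr = gpuFunc1;

#define NB_FUNC 4

int main()
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Device buffer holding NB_FUNC function pointers
    func *pFuncDevBuffer;
    cudaMalloc(&pFuncDevBuffer, NB_FUNC * sizeof(func));

    // Copy the device-side pointer value into slot i of the buffer;
    // the copy stays on the device (DeviceToDevice)
    int i = 0;
    cudaMemcpyFromSymbolAsync(pFuncDevBuffer + i, gpuFuncPtr, sizeof(func),
                              0, cudaMemcpyDeviceToDevice, stream);
    cudaStreamSynchronize(stream);

    // ...kernels taking pFuncDevBuffer are launched here...

    cudaFree(pFuncDevBuffer);
    cudaStreamDestroy(stream);
    return 0;
}
```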
In fact, everything works fine as long as the global kernel that takes the device function buffer as an argument is defined in the same file as the function and its pointer.
The kernel can then print out the address of the function (0x4) and execute its code without any problem.
I don’t use separate compilation.
When, in the same run of the program, a second kernel defined in another file takes the very same function pointer buffer as an argument, it prints out the very same address for the function pointer (0x4), but if it tries to execute it, it fails with an unspecified launch failure. Every subsequent CUDA API call then freezes, and I need to reboot my computer (resetting the device through nvidia-smi isn’t supported on my GPU).
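The second kernel, compiled in a separate file, does essentially this (again with simplified placeholder names); the address it prints matches the one printed in the first file, but the indirect call is what triggers the error:

```cuda
// otherfile.cu -- a separate translation unit sharing only the typedef
#include <cstdio>

struct GpuArgument { int value; };
typedef void (*func)(struct GpuArgument *);

__global__ void secondKernel(func *pFuncDevBuffer, struct GpuArgument *arg)
{
    // Prints the same address (0x4) as the kernel in the first file...
    printf("func ptr = %p\n", (void *)pFuncDevBuffer[0]);

    // ...but this indirect call produces the unspecified launch failure
    pFuncDevBuffer[0](arg);
}
```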
I would like to know if there is a known issue with using function pointers this way, i.e. passing a function pointer buffer to a kernel defined in another file that only shares the same function pointer typedef.
Also, if there is a workaround for resetting a device after a segfault without rebooting the whole system, it would help me save time while debugging my application.
Thank you for your help
function_pointer.txt (2.73 KB)