I am quite new to CUDA, and there is something I am having difficulty figuring out.
I have a kernel that loads fine, all the parameter-setting calls seem to succeed, and cuLaunchGrid returns CUDA_SUCCESS; however, the next driver call returns CUDA_ERROR_LAUNCH_FAILED :(
I made sure I am also calling cuFuncSetBlockShape, so that is not the problem.
I am a bit surprised that the failure is only reported after cuLaunchGrid itself has returned, since I am not making an asynchronous call (or at least I would not expect it to be one).
The call I make after cuLaunchGrid is cuMemcpyDtoH, but the same error seems to come back whatever I call next (cuCtxSynchronize, for example).
As far as I can tell, the kernel compiles fine and is not even a big one (18 registers and 52 bytes of shared memory). My block size is between 128 and 256 threads, and the grid is roughly 16x400 blocks.
Is there a way to get more information about the exact reason for the failure?
Any help would be appreciated.
Here is some information about my configuration, in case it helps:
Windows XP 64-bit, CUDA 2.0 beta, a GeForce 8800 GTX and a GeForce 8600 both driving displays, with CUDA running on the 8800 GTX.
I have also stripped most of the code out of the kernel (it now just calculates an address and updates the output buffer), and it still fails in the same way.
I have also removed all texture code from both the .cpp and the .cu files.
I really have no idea what is going on.