Hi guys,
I am quite new to CUDA, and there is something I am having difficulty figuring out.
I have a kernel that loads fine, all the parameter-setting calls seem to be OK, and the LaunchGrid call returns CUDA_SUCCESS; however, the next function I call returns CUDA_ERROR_LAUNCH_FAILED :(
I made sure I was calling cuFuncSetBlockShape too, so that is not the cause.
I am a bit surprised that the failure is reported after the LaunchGrid call rather than by the call itself, since I am not making an asynchronous call (or at least I would not expect it to be one).
The call I make after LaunchGrid is a cuMemcpyDtoH, but the same thing seems to happen whatever I call next (cuCtxSynchronize, for example).
As far as I am aware, the kernel compiled fine and is not even a big one (it uses 18 registers and 52 bytes of shared memory); my block size is between 128 and 256 threads and the grid is something like 16x400.
Is there a way to get more information about the exact reason for the failure?
Any help would be appreciated.
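To make the launch figures quoted above concrete, here is a minimal host-side sketch in plain C++ (no CUDA calls) that checks one block against per-multiprocessor limits and computes the total thread count. The struct and function names are hypothetical. The limits are the compute capability 1.0 figures (512 threads per block, 8192 registers and 16 KB of shared memory per multiprocessor, 65535 per grid dimension) and should be treated as assumptions for any other GPU. The check also ignores register allocation granularity.

```cpp
#include <cassert>
#include <cstdint>

// Assumed limits for compute capability 1.0 hardware (e.g. a GeForce 8800 GTX).
struct SmLimits {
    int maxThreadsPerBlock = 512;
    int registersPerSm = 8192;
    int sharedBytesPerSm = 16 * 1024;
    int maxGridDim = 65535;
};

// Hypothetical description of one launch: grid shape plus per-block resources.
struct LaunchConfig {
    int gridX;
    int gridY;
    int threadsPerBlock;
    int regsPerThread;
    int sharedBytesPerBlock;
};

// True if a single block fits within the assumed limits.
// Simplified: does not model register allocation granularity.
bool blockFits(const LaunchConfig& c, const SmLimits& lim = SmLimits{}) {
    assert(c.threadsPerBlock > 0);
    return c.threadsPerBlock <= lim.maxThreadsPerBlock
        && c.gridX <= lim.maxGridDim
        && c.gridY <= lim.maxGridDim
        && c.threadsPerBlock * c.regsPerThread <= lim.registersPerSm
        && c.sharedBytesPerBlock <= lim.sharedBytesPerSm;
}

// Total number of threads the launch creates.
std::int64_t totalThreads(const LaunchConfig& c) {
    return std::int64_t(c.gridX) * c.gridY * c.threadsPerBlock;
}
```

With the figures from the post, `blockFits(LaunchConfig{16, 400, 256, 18, 52})` is true and `totalThreads` gives 1,638,400, so under these assumptions the block-level resource limits alone would not explain the failed launch.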
Updated:
Here is some information about my configuration, in case it helps.
Windows XP 64-bit, CUDA 2.0 beta, GeForce 8800 GTX + GeForce 8600 (both used as displays), CUDA running on the 8800 GTX.
8 GB RAM.
I have also removed most of the kernel code (it now just calculates an address and updates the output buffer) and it still fails in the same way.
I have also removed all texture code from both the .cpp and the .cu files.
I really have no idea what is going on.
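Since the stripped-down kernel only calculates an address and writes to the output buffer, one thing that can be checked on the host is whether every thread's address lands inside that buffer. The sketch below is plain C++ and only emulates the indexing. The index formula (a 2D grid of 1D blocks flattened to a linear index), the function names, and the buffer size are all assumptions, because the actual kernel is not shown in the post.

```cpp
#include <cassert>
#include <cstddef>

// Assumed address calculation, mirroring
// (blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x
std::size_t linearIndex(std::size_t blockX, std::size_t blockY, std::size_t threadX,
                        std::size_t gridDimX, std::size_t blockDimX) {
    assert(gridDimX > 0 && blockDimX > 0);
    return (blockY * gridDimX + blockX) * blockDimX + threadX;
}

// Counts the threads whose index would fall outside a buffer of bufferElems elements.
std::size_t countOutOfRange(std::size_t gridDimX, std::size_t gridDimY,
                            std::size_t blockDimX, std::size_t bufferElems) {
    std::size_t bad = 0;
    for (std::size_t by = 0; by < gridDimY; ++by) {
        for (std::size_t bx = 0; bx < gridDimX; ++bx) {
            for (std::size_t tx = 0; tx < blockDimX; ++tx) {
                if (linearIndex(bx, by, tx, gridDimX, blockDimX) >= bufferElems) {
                    ++bad;
                }
            }
        }
    }
    return bad;
}
```

For a 16x400 grid of 256-thread blocks, `countOutOfRange(16, 400, 256, n)` is zero only when `n` is at least 1,638,400 elements; any nonzero count would point at threads writing past the end of the output buffer.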
Thanks.
Laurent.