LaunchGrid issue: failure after a successful LaunchGrid.

Hi guys,

I am quite new to CUDA, and there is something I am having difficulty figuring out.
I have a kernel that loads fine, all the parameter-setting calls seem to be OK, and cuLaunchGrid returns CUDA_SUCCESS; however, the next function returns CUDA_ERROR_LAUNCH_FAILED :(

I made sure that I was calling cuFuncSetBlockShape too, so it is not that.

I am a bit surprised that the failure is reported after the LaunchGrid call itself, since I am not doing an asynchronous call… (well, at least I would not expect it to be).

The call I make after the LaunchGrid is a cuMemcpyDtoH, but in fact the same thing seems to happen whatever I call next (cuCtxSynchronize, for example).

As far as I am aware the kernel compiled fine, and it is not even a big one (it uses 18 registers and 52 bytes of shared memory). My block size is between 128 and 256 threads and the grid is something like 16x400.

Is there a way to get more information about the exact reason for the failure?
Any help would be appreciated.

Here is some information about my config, just in case it helps.
XP64, CUDA 2.0 beta, GF8800GTX + GF8600 both used as displays, CUDA running on the GF8800GTX.
8GB ram.

I have also removed most of the code from the kernel (it now just calculates an address and updates the output buffer) and it still fails in the same way.
I have also removed all texture code from both the cpp and the cu files.
I really have no idea what is going on.

It is normal for cuLaunchGrid() to return success and for cuCtxSynchronize() to return the actual error code. All kernel launches are asynchronous!

As for the cause of your failure, it’s hard to tell. It might be a timeout issue (~5 sec), or, most probably, you’re reading or writing unallocated memory (writing past the end of an array, for example).
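To illustrate the out-of-bounds case, here is a minimal host-side sketch in plain C of the index arithmetic a kernel typically performs, plus the guard that prevents a write past the end of the buffer. The indexing scheme and the names linear_index and in_bounds are assumptions made for illustration; the original kernel was not posted, so it may compute its address differently.

```c
#include <assert.h>
#include <stddef.h>

/* Linear element index for one thread in a 2D grid of 1D blocks.
 * Mirrors the common CUDA expression
 *   (blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x
 * (assumed layout, not taken from the original kernel). */
size_t linear_index(unsigned block_x, unsigned block_y, unsigned thread_x,
                    unsigned grid_dim_x, unsigned block_dim_x)
{
    /* Coordinates must lie inside the grid and block dimensions. */
    assert(block_x < grid_dim_x && thread_x < block_dim_x);
    return ((size_t)block_y * grid_dim_x + block_x) * block_dim_x + thread_x;
}

/* Nonzero when element `index` exists in a buffer of n_elements items.
 * A kernel would skip its write whenever this is zero. */
int in_bounds(size_t index, size_t n_elements)
{
    return index < n_elements;
}
```

With a 16x400 grid and 256 threads per block, the last thread gets index 16 * 400 * 256 - 1, so the output buffer needs at least 1,638,400 elements or the kernel needs a guard like in_bounds before it writes.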

It would be better if you could post source code here (both kernel and host code responsible for calling).

OK, it is definitely not a timeout issue, since it returns well inside the 5 s.

I will check the writes (the only thing left in my kernel).
I will try to trim the code down a bit so that I can paste it.


Getting closer.
I have no idea why, but it seems that CUDA doesn't like my last parameter, which is a float.
If I remove it then it works.

One thing I am a bit worried about is the pointer size.
My first parameter is a pointer, but it seems to be passed as a 32-bit value (which seems reasonable for a GPU).
Is that correct, or should it be a 64-bit pointer too?
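For what it is worth, the pointer width changes where every later argument lands in the parameter block, so it is worth working the offsets out by hand. Below is a small plain-C sketch of that arithmetic, assuming each argument is placed at its natural alignment. The helpers align_up, param_reserve and example_param_size are hypothetical names, and the (pointer, int, float) signature is made up for illustration. The offsets are what one would hand to cuParamSetv / cuParamSeti / cuParamSetf, and the final cursor value is what would go to cuParamSetSize. Which pointer width the compiled module actually expects cannot be determined from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` up to the next multiple of `align` (a power of two). */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical helper: reserve space for one kernel argument of `size`
 * bytes with alignment `align`. Returns the argument's offset and
 * advances *cursor past it. */
size_t param_reserve(size_t *cursor, size_t size, size_t align)
{
    size_t offset = align_up(*cursor, align);
    *cursor = offset + size;
    return offset;
}

/* Total parameter bytes for an illustrative kernel taking
 * (pointer, int, float), for a given pointer width in bytes. */
size_t example_param_size(size_t pointer_bytes)
{
    size_t cursor = 0;
    param_reserve(&cursor, pointer_bytes, pointer_bytes);  /* output buffer */
    param_reserve(&cursor, sizeof(int), sizeof(int));      /* an int argument */
    param_reserve(&cursor, sizeof(float), sizeof(float));  /* trailing float */
    return cursor;
}
```

With 4-byte pointers the trailing float sits at offset 8 and the block is 12 bytes; with 8-byte pointers it sits at offset 12 and the block is 16 bytes. If the host code and the kernel disagree on this, the last arguments are read from the wrong place, which would be consistent with a failure that disappears when trailing parameters are removed.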

Got it working now.
I had to reduce the number of parameters passed to my function.
I had 6 or 7; after reducing that to 5, it worked.

Is there a limit?
I must have missed it in the documentation.

Anyway now next issue…


No, even if there is some limit on the number of kernel parameters, it is much higher than 6 or 7…