Catching errors with kernel execution: how to detect failure cases

With the standard syntax for launching a CUDA kernel, there seems to be no way to detect failures.

MyKernel<<< ... >>> (...);

This translates into a significant number of CUDA runtime API calls, any of which can fail. I’ve looked at the C code generated by nvcc -cuda, and any failure results in the kernel not being executed. But the application code is not made aware of the failure.
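Roughly, the generated launch code can be pictured like this. This is a simplified sketch, not the actual nvcc output; the point is that each runtime call returns a cudaError_t that the generated stub silently discards (exact signatures vary between CUDA versions):

```
// Hypothetical sketch of what MyKernel<<<grid, block>>>(arg) expands to.
// Each call below returns a cudaError_t, but the generated code ignores it.
cudaConfigureCall(grid, block, 0, 0);      // set launch configuration - may fail
cudaSetupArgument(&arg, sizeof(arg), 0);   // push kernel argument - may fail
cudaLaunch((const char*)MyKernel);         // launch the kernel - may fail
```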

For serious applications, any errors need to be detected and handled appropriately. How do other people recommend doing this?

The only way I can think of at present is to take the C code generated by nvcc and edit it to handle the return codes appropriately. But that option looks messy, time-consuming and error-prone. Is there another way?

Many thanks.

You should check the return value of cudaThreadSynchronize() after the kernel call.
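A minimal sketch of this check (MyKernel, grid, block and d_data are placeholder names, not from the original post):

```
// Launch the kernel, then block until it finishes and check the status.
MyKernel<<<grid, block>>>(d_data);
cudaError_t err = cudaThreadSynchronize();
if (err != cudaSuccess) {
    fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
}
```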

This doesn’t seem like quite the same thing. It looks like it would catch failures in executing kernels, but it doesn’t look like it would pick up failure return codes that would be returned by cudaConfigureCall, cudaSetupArgument or cudaLaunch.

I need a solution that will handle all failure cases.

In debug builds, I’ve always called cudaThreadSynchronize() and then cudaGetLastError() to check for errors. I’ve never looked at the source code that nvcc generates, but wouldn’t cudaGetLastError() pick up any error set by cudaConfigureCall, etc.?

Edit: I should add that this catches errors such as “too many resources requested for launch” when asking for too many threads per block. I presume this error comes from cudaConfigureCall or one of the other setup functions.
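The debug-build pattern described above can be wrapped in a helper macro. This is a sketch, and CUDA_CHECK_LAST is a name invented here: it synchronizes, then reads the error state, which catches both asynchronous execution failures and launch-setup errors such as “too many resources requested for launch”:

```
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: synchronize, then check the last CUDA error.
// Use after each kernel launch in debug builds.
#define CUDA_CHECK_LAST()                                           \
    do {                                                            \
        cudaThreadSynchronize();                                    \
        cudaError_t e = cudaGetLastError();                         \
        if (e != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",            \
                    __FILE__, __LINE__, cudaGetErrorString(e));     \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Usage:
//   MyKernel<<<grid, block>>>(d_data);
//   CUDA_CHECK_LAST();
```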