Custom error reporting from device code
I would like to generate an error from a kernel function

freebooter · June 18, 2010, 7:29am

For debugging purposes I would like to generate an error from kernel code.

In the manual it states that:

“The runtime maintains an error variable for each host thread that is initialized to cudaSuccess and is overwritten by the error code every time an error occurs.”
And indeed a thread can exit with an error and processing stops and the error is reported. So the code is there.

Is there an undocumented possibility (or a hint in the documentation) to exit a thread with an error, other than provoking one through, for example, an invalid memory access?
Something like

if ( myVar < 0.0f ) {
_threadExit(-1);
}

Regards Rolf

avidday · June 18, 2010, 8:16am

You need to differentiate between host threads and GPU threads. The error mechanism quoted from the documentation is a per-host-thread facility inside the driver on the host. It doesn’t have anything to do with threads on the GPU. There is no mechanism inside the GPU to abort a kernel.

About the best you can do is keep a flag in global memory. Each block reads it atomically once at the beginning of the kernel, and all threads in the block return if it is set. A thread that detects an error condition sets the flag with an atomic memory operation, which then makes all subsequent blocks exit.
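The flag scheme described above might be sketched as follows. This is a minimal illustration, not code from the thread: the kernel name `checkedKernel`, the `KernelError` codes, the helper `runChecked`, and the square-root workload are all hypothetical, and it assumes a GPU that supports integer atomics on global memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical error codes, chosen only for this illustration.
enum KernelError { KERNEL_OK = 0, KERNEL_NEGATIVE_INPUT = 1 };

__global__ void checkedKernel(const float* in, float* out, int n, int* errorFlag)
{
    // One thread per block reads the global flag atomically and shares the result.
    __shared__ int abortBlock;
    if (threadIdx.x == 0) {
        abortBlock = atomicAdd(errorFlag, 0);  // atomic read of the flag
    }
    __syncthreads();
    if (abortBlock != KERNEL_OK) return;       // whole block exits together

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (in[i] < 0.0f) {
        // Flag the error; only the first error code written is kept.
        atomicCAS(errorFlag, KERNEL_OK, KERNEL_NEGATIVE_INPUT);
        return;
    }
    out[i] = sqrtf(in[i]);
}

// Hypothetical host helper: runs the kernel and returns the flag value.
int runChecked(const float* hostIn, int n)
{
    float* dIn = nullptr;
    float* dOut = nullptr;
    int* dFlag = nullptr;
    int hFlag = KERNEL_OK;

    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMalloc(&dFlag, sizeof(int));
    cudaMemcpy(dIn, hostIn, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dFlag, 0, sizeof(int));         // start with KERNEL_OK

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    checkedKernel<<<blocks, threads>>>(dIn, dOut, n, dFlag);

    // This blocking copy waits for the kernel to finish before reading the flag.
    cudaMemcpy(&hFlag, dFlag, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dFlag);
    return hFlag;
}

int main()
{
    float data[4] = {1.0f, 4.0f, -9.0f, 16.0f};
    int err = runChecked(data, 4);
    printf("kernel error flag: %d\n", err);
    return 0;
}
```

Note that blocks which have already passed the check keep running; the flag only stops blocks that start after it has been set, and the host has to read it back once the kernel has finished.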

jack · June 18, 2010, 8:22am

If you need to return information / error codes / etc. from a kernel, you can try cuPrintf(); the source code is available if you’ve signed up for the Registered Developers program.

freebooter · June 18, 2010, 8:42am

Thanks for the tip.

However, when I force an invalid memory access (something like *invalidAddress = 0) in the kernel code, all threads and blocks are instantly aborted and an error is reported to the host (I tried this). This is what I referred to when I said “the code is there”. There must be some error flag already, except it’s not accessible. So instead of forcing some error, I think it would be useful to explicitly produce an error that does the same (by triggering something that is equivalent to what happens when you access invalid memory).

regards Rolf

avidday · June 18, 2010, 9:47am

There are certainly hardware-level memory protection mechanisms and some type of limited programmable interrupts and counters on the GPU (I am presuming that is how profiling and events are implemented, for example), and the results of those are monitored by the driver. But as I said, none of that is exposed in the CUDA language. Furthermore, the sort of “hackish events” you can trigger inside a kernel, like a deliberate access of an invalid address, often result in loss or corruption of the host GPU context/state, which makes their scope rather limited and difficult to rely on in real-world code.

There is a trap instruction defined in the PTX documentation, but I don’t know how to generate it with the compiler; perhaps inline assembly might work. Might be something to consider.
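The inline-assembly idea might look like the sketch below. It assumes a toolkit whose compiler accepts inline PTX in device code, and it uses `cudaDeviceSynchronize` and `cudaDeviceReset` from the current runtime API, which postdate this thread. The names `trapOnNegative` and `launchAndCheck` are hypothetical. Which error code the host receives is not asserted here; the sketch only checks whether the result differs from `cudaSuccess`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: any thread that sees a negative value executes PTX trap.
__global__ void trapOnNegative(const float* values, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && values[i] < 0.0f) {
        asm volatile("trap;");  // inline PTX: abort execution of the kernel
    }
}

// Hypothetical host helper: returns the runtime status seen after the launch.
cudaError_t launchAndCheck(const float* hostValues, int n)
{
    float* dValues = nullptr;
    cudaMalloc(&dValues, n * sizeof(float));
    cudaMemcpy(dValues, hostValues, n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    trapOnNegative<<<blocks, threads>>>(dValues, n);

    // Wait for the kernel; a trapped kernel surfaces here as an error.
    cudaError_t err = cudaDeviceSynchronize();

    if (err == cudaSuccess) {
        cudaFree(dValues);
    } else {
        // Assumption: the context is unusable after the trap, so reset the
        // device rather than trying to free individual allocations.
        cudaDeviceReset();
    }
    return err;
}

int main()
{
    float data[4] = {1.0f, 2.0f, -3.0f, 4.0f};
    cudaError_t err = launchAndCheck(data, 4);
    printf("status after launch: %s\n", cudaGetErrorString(err));
    return 0;
}
```

In line with the warning above about losing the host context, the sketch resets the device after a failed launch, which discards every allocation made in that context. That makes this approach more suitable for debugging than for recoverable error handling.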
