Current active kernel on device...

narayan_g · February 13, 2013, 8:42pm

My application has several different kernels which are called from the main program in a loop. A typical simulation runs for several days.

However, the program becomes unresponsive after a few hours. I can see that the process is still active in the system, but it produces no output. The GPU is likely still occupied by the process, but I can’t figure out how to check which of the kernels is keeping the device busy.

Is there a tool that lets us see the currently active kernel on the device? My card is a GeForce 580, and the system is CentOS. It seems that nvidia-smi does not support diagnostics for this card.

I also tried restarting from a checkpoint taken right before the unresponsive phase, and the code does not hang at the same point twice. The behavior is highly unpredictable, so a tool that lets me identify which kernel is causing the problem would be helpful.

Thanks,

allanmac · February 14, 2013, 3:45am

You might consider attaching the debugger to the unresponsive CUDA application: (docs here).

If you simply want a log then one approach might be to inject cudaStreamAddCallback()/cuStreamAddCallback() callbacks into your kernel launching stream(s). The callback could, at the least, log that the previous kernel was completed and/or the next kernel is about to be launched.

Just be aware that the callback blocks everything downstream in that stream until it completes, so it’s up to you to determine how lightweight the callback should be – i.e. an atomic increment on the host vs. a slow thread-safe printf().
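For illustration, here is a minimal sketch of the lightweight variant, assuming a single stream and two placeholder kernels. The names `stepA`, `stepB`, `markDone`, `g_lastDone` and `runSteps` are hypothetical and not from the original application; only `cudaStreamAddCallback()` is the API mentioned above. Each launch is followed by a callback that stores an integer ID on the host, so if the program hangs, the kernel launched after the last recorded ID is the likely suspect.

```cuda
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernels standing in for the application's real ones.
__global__ void stepA(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}
__global__ void stepB(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

// Host-side marker: ID of the last launch known to have finished (-1 = none yet).
// Even IDs correspond to stepA, odd IDs to stepB (an arbitrary encoding for this sketch).
static std::atomic<int> g_lastDone(-1);

// Stream callback: invoked on a host thread once all work queued in the
// stream before it has completed. It must not call CUDA API functions.
static void CUDART_CB markDone(cudaStream_t, cudaError_t status, void *userData) {
    if (status == cudaSuccess)
        g_lastDone.store(static_cast<int>(reinterpret_cast<intptr_t>(userData)));
}

// Launches stepA and stepB `iterations` times and returns the last recorded ID,
// or -2 if device setup fails.
int runSteps(int iterations) {
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    float *d = nullptr;
    cudaStream_t s;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -2;
    if (cudaStreamCreate(&s) != cudaSuccess) { cudaFree(d); return -2; }
    cudaMemsetAsync(d, 0, n * sizeof(float), s);

    for (int it = 0; it < iterations; ++it) {
        stepA<<<blocks, threads, 0, s>>>(d, n);
        cudaStreamAddCallback(s, markDone, reinterpret_cast<void *>(intptr_t(2 * it)), 0);
        stepB<<<blocks, threads, 0, s>>>(d, n);
        cudaStreamAddCallback(s, markDone, reinterpret_cast<void *>(intptr_t(2 * it + 1)), 0);
    }

    cudaStreamSynchronize(s);  // a hang would show up here; g_lastDone stays readable
    cudaStreamDestroy(s);
    cudaFree(d);
    return g_lastDone.load();
}

int main() {
    int last = runSteps(3);
    std::printf("last completed launch id: %d\n", last);
    return last < 0 ? 1 : 0;
}
```

Because the marker is a plain host variable, it can be read from a separate watchdog thread or inspected after attaching a debugger to the hung process. The callback does nothing but an atomic store, which keeps the stall it adds between kernels small.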
