Basic Math Functions Causing Driver to Crash

Stefano5 · February 5, 2012, 2:35pm

Hi Guys,

I seem to be experiencing some instability when running my CUDA application. I’ve hunted the bug down to a set of trig functions (sinf(), cosf(), etc.) that are being executed in a kernel. When running, the screen suddenly goes blank and I’m informed that the driver has stopped working. If I replace the functions with their intrinsic equivalents, the system runs correctly. The software executes the kernel a number of times as part of an iterative process. During the first few iterations the system runs correctly, but it crashes soon after.

Is this a common problem? If so, is there any way I can stop this from happening?

Regards,

Stefan

seibert · February 5, 2012, 4:51pm

This sounds like a problem with the watchdog timer more than a problem with the math functions themselves. If you are running a CUDA program on your display device, then the display driver watchdog will terminate any kernel that runs for more than a few seconds in order to allow the screen to update. (GPUs cannot do preemptive multitasking yet.) The difference between the standard trig functions and the intrinsic trig functions is how long they take. So, it is possible that the slower and more accurate functions are pushing your kernel runtime over the limit.
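To make the accuracy side of that trade-off concrete, here is a minimal sketch that evaluates sinf() and its intrinsic counterpart __sinf() on the same inputs and reports the largest difference. The names compareSine and maxSineDifference are made up for illustration and are not from the original poster's code. It assumes a CUDA version with managed memory and is compiled without -use_fast_math (with that flag, nvcc maps sinf() to the intrinsic, so the two columns would match).

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Evaluates both versions of sine for each input so they can be compared.
__global__ void compareSine(const float* x, float* accurate, float* fast, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        accurate[i] = sinf(x[i]);    // standard version: slower, more accurate
        fast[i]     = __sinf(x[i]);  // intrinsic: faster, reduced accuracy
    }
}

// Returns the largest |sinf(x) - __sinf(x)| over n samples spread across
// [-pi, pi], or -1 if a CUDA call fails. Requires n > 1.
float maxSineDifference(int n)
{
    float *x = nullptr, *accurate = nullptr, *fast = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMallocManaged(&x, bytes) != cudaSuccess ||
        cudaMallocManaged(&accurate, bytes) != cudaSuccess ||
        cudaMallocManaged(&fast, bytes) != cudaSuccess) {
        cudaFree(x); cudaFree(accurate); cudaFree(fast);
        return -1.0f;
    }

    const float pi = 3.14159265f;
    for (int i = 0; i < n; ++i)
        x[i] = -pi + 2.0f * pi * i / (n - 1);

    compareSine<<<(n + 255) / 256, 256>>>(x, accurate, fast, n);

    float maxDiff = -1.0f;
    // Wait for the kernel before touching managed memory on the host.
    if (cudaDeviceSynchronize() == cudaSuccess) {
        maxDiff = 0.0f;
        for (int i = 0; i < n; ++i)
            maxDiff = fmaxf(maxDiff, fabsf(accurate[i] - fast[i]));
    }

    cudaFree(x); cudaFree(accurate); cudaFree(fast);
    return maxDiff;
}

int main()
{
    printf("max |sinf - __sinf| = %g\n", maxSineDifference(1 << 16));
    return 0;
}
```

Timing the two variants (for example with cudaEvent timers) would show the speed side; the numbers depend on the GPU, so none are quoted here.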

If the watchdog is killing your kernel, your two best options are either to split your kernel workload into multiple calls, or to run your program on a second, non-display CUDA device.
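For the second option, a rough sketch of how a program might check which devices are subject to the run-time limit and pick one that is not. It uses the runtime attribute cudaDevAttrKernelExecTimeout; the helper name findDeviceWithoutWatchdog is hypothetical, and whether such a device exists depends on the machine.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns the index of the first CUDA device that has no kernel run-time
// limit, or -1 if every device has one (or a CUDA call fails).
int findDeviceWithoutWatchdog()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;

    for (int dev = 0; dev < count; ++dev) {
        int timeoutEnabled = 0;
        if (cudaDeviceGetAttribute(&timeoutEnabled,
                                   cudaDevAttrKernelExecTimeout,
                                   dev) != cudaSuccess)
            continue;
        printf("Device %d: run-time limit %s\n",
               dev, timeoutEnabled ? "enabled" : "disabled");
        if (!timeoutEnabled) return dev;
    }
    return -1;
}

int main()
{
    int dev = findDeviceWithoutWatchdog();
    if (dev >= 0) {
        // Subsequent allocations and launches from this thread go to this device.
        cudaSetDevice(dev);
        printf("Using device %d\n", dev);
    } else {
        printf("Every device has a run-time limit; keep each launch short.\n");
    }
    return 0;
}
```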

Stefano5 · February 5, 2012, 8:08pm

Hi seibert,

Thank you for your reply; what you’re saying does make sense. A single execution of the kernel in question takes very little time, but when it is run 50 times (the number of iterations it’s set to run for) the driver tends to crash. Would the watchdog timer kill my kernel under these circumstances?

I’ve also found that if I halt the host thread using the Sleep() function for 100 milliseconds or so, this seems to also stop the system crashing. Obviously this ruins the objective of using CUDA to speed up the execution.

Thanks,

Stefan

seibert · February 5, 2012, 9:09pm

This seems a little weird, as I would normally expect that the driver can update the screen between kernel calls, even if you have queued up many of them. I’m assuming you are running on Windows, in which case I don’t have as much experience with the typical watchdog behavior. Does anyone know if the batching of kernel calls on Windows can lead to this?

One idea: Instead of adding a Sleep() function, can you call cudaDeviceSynchronize() between kernel calls? This will also slow down your code, but not as much as Sleep().
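Something along these lines, as a sketch only. The kernel iterate is a hypothetical stand-in for the real one, and runIterations, the array size and the launch configuration are likewise made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: one step of an iterative update.
__global__ void iterate(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 0.5f * (data[i] + cosf(data[i]));
}

// Launches the kernel `iterations` times, waiting for each launch to finish
// before queuing the next one.
cudaError_t runIterations(float* d_data, int n, int iterations)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    for (int it = 0; it < iterations; ++it) {
        iterate<<<blocks, threads>>>(d_data, n);

        // Block the host until this launch has completed, so launches
        // cannot pile up behind one another.
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}

int main()
{
    const int n = 1 << 20;
    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaError_t err = runIterations(d_data, n, 50);
    printf("%s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

Checking the return value of cudaDeviceSynchronize() also means a kernel killed by the watchdog shows up as an error code rather than as silently wrong results.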

Stefano5 · February 5, 2012, 10:11pm

Hey seibert

cudaDeviceSynchronize() seemed to work! Interestingly, the console updates a lot more smoothly now. I guess the display was struggling before?

Many thanks!

Stefan

tera · February 6, 2012, 3:15am

Under Windows, kernel launches are batched. The screen only updates between batches, not between individual kernels. Accordingly, the watchdog timeout also applies to whole batches. Inserting cudaDeviceSynchronize(), however, sends off all currently scheduled kernels and starts a new batch, thus allowing screen updates in between.
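If synchronizing after every single launch costs too much, one possible variation is to synchronize only every few launches, which puts an upper bound on how many kernels can end up in one batch. This is a sketch under assumptions: the kernel step, the helper runInBatches and the parameter launchesPerSync are hypothetical, and a suitable value has to be found by experiment so that each group of launches stays well under the watchdog limit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel.
__global__ void step(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = sinf(data[i]) + 1.0f;
}

// Launches the kernel `iterations` times and synchronizes after every
// `launchesPerSync` launches (and after the final one).
cudaError_t runInBatches(float* d_data, int n, int iterations, int launchesPerSync)
{
    if (launchesPerSync < 1) launchesPerSync = 1;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    for (int it = 0; it < iterations; ++it) {
        step<<<blocks, threads>>>(d_data, n);

        bool lastLaunch = (it + 1 == iterations);
        if ((it + 1) % launchesPerSync == 0 || lastLaunch) {
            // Wait for everything queued so far before queuing more work.
            cudaError_t err = cudaDeviceSynchronize();
            if (err != cudaSuccess) return err;
        }
    }
    return cudaSuccess;
}

int main()
{
    const int n = 1 << 20;
    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    // 50 iterations, synchronizing after every 5 launches.
    cudaError_t err = runInBatches(d_data, n, 50, 5);
    printf("%s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```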

Stefano5 · February 6, 2012, 12:24pm

Thanks Tera,

I’m assuming that executing a kernel multiple times in a for-loop would count as a ‘batch’. Is that right?

Thanks,

Stefan

tera · February 6, 2012, 1:26pm

Yes, unless there are calls in between like cudaDeviceSynchronize() that require synchronization.
