Avoiding driver timeouts

How do I avoid driver timeouts?

It appears that if a kernel keeps the device busy for more than a couple of seconds, Windows or the Nvidia driver resets the device and trashes everything. Sigh.

Obviously, I have to split my process into multiple smaller units that will execute in sequence. Because of the enormous complexity of the algorithm, preserving state between kernel launches is going to be a major PITA. So, I have a few questions before I bite the bullet:

  1. Is there any way I can disable this timeout, or set it to a larger value?

  2. What, exactly, do I need to do between kernel launches to reset the timeout counter? I tried an experiment in which I just repeatedly launched a smaller kernel. After several dozen consecutive back-to-back launches, each running for about a quarter of a second, the driver crashed on a timeout. So obviously just letting the kernel finish and then re-launching it is not the answer. I cannot do a cudaThreadExit because that would clear global memory and force me to transmit a boatload of data between kernel launches. What do I need to do between launches?
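
Stripped to its essentials, that experiment looked something like the sketch below (the kernel body, sizes, and launch configuration are placeholders, not my real code):

    #include <stdio.h>
    #include <cuda_runtime.h>

    // Stand-in for the real quarter-second kernel.
    __global__ void smallKernel(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1.0f;               // placeholder work
    }

    int main(void)
    {
        int n = 1 << 20;
        float *d_data = 0;
        cudaMalloc((void **)&d_data, n * sizeof(float));
        cudaMemset(d_data, 0, n * sizeof(float));

        // Several dozen consecutive back-to-back launches, each one allowed
        // to finish before the next is issued.
        for (int launch = 0; launch < 100; ++launch)
        {
            smallKernel<<<(n + 255) / 256, 256>>>(d_data, n);
            cudaThreadSynchronize();              // wait for this launch to complete
            cudaError_t err = cudaGetLastError();
            if (err != cudaSuccess) {
                printf("launch %d failed: %s\n", launch, cudaGetErrorString(err));
                break;
            }
        }

        cudaFree(d_data);
        return 0;
    }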

Tim

The timeout is supposed to be per-kernel launch (it certainly worked that way on any other system I’ve used), so if launching short kernels is still triggering driver crashes, you have a different problem.
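
One quick sanity check: the runtime reports whether the driver enforces a watchdog on a given device at all, via the kernelExecTimeoutEnabled field of cudaDeviceProp. A minimal sketch, assuming device 0:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);        // device 0 assumed
        printf("%s: kernelExecTimeoutEnabled = %d\n",
               prop.name, prop.kernelExecTimeoutEnabled);
        return 0;
    }

If that prints 0, the driver is not imposing a run-time limit on that device, and the crash is probably coming from somewhere else.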

Seibert - I think you are right. That test was for a large, complex kernel. I just wrote a small custom kernel and could not reproduce my prior results. So it appears to be per-kernel. Thanks! But this still leaves me with the horrendous problem of rewriting my gigantic, complex kernel in such a way that it can execute in sequential launch segments. That’s so silly! Here I am in an intensely parallel environment, and I need to serialize my code!
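
If I understand it right, the restructuring boils down to something like the following sketch (every name in it is a placeholder): the intermediate state lives in device global memory, which persists across launches within the same context, so only the control flow has to be split into segments, not the data.

    #include <cuda_runtime.h>

    // Placeholder for one segment of the real algorithm: advance the
    // persistent state by stepsPerLaunch steps of the outer loop.
    __global__ void processChunk(float *state, int n, int firstStep, int stepsPerLaunch)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        for (int s = firstStep; s < firstStep + stepsPerLaunch; ++s)
            state[i] += 1.0f;                     // stand-in for the real per-step update
    }

    void runInSegments(int totalSteps, int stepsPerLaunch, int n)
    {
        float *d_state = 0;
        cudaMalloc((void **)&d_state, n * sizeof(float));   // allocated once, reused by every launch
        cudaMemset(d_state, 0, n * sizeof(float));

        // One launch per segment; each launch must finish well under the
        // watchdog limit, but the state never leaves the card.
        for (int first = 0; first < totalSteps; first += stepsPerLaunch)
        {
            processChunk<<<(n + 255) / 256, 256>>>(d_state, n, first, stepsPerLaunch);
            cudaThreadSynchronize();              // the timeout is counted per launch
        }

        // ... cudaMemcpy the final result back to the host here ...
        cudaFree(d_state);
    }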

While the kernel is running, if I move the mouse the cursor moves on the screen, so Windows is obviously communicating with the video card just fine during the computation. So what is its problem??? Why does it insist on shutting me down? I hope I can find a way to disable that timeout, or at least raise the limit.

Tim

Maybe this will help. It’s for an IBM Linux-based OS, but the example indicates it’s possible. I’m thinking it’s very OS- and platform-dependent, so a specific solution is going to take you deep into the OS driver layer.

http://publib.boulder.ibm.com/infocenter/l…pmiwatchdog.htm

Al

Sounds like your problem is that you have a monitor attached to the card… if I recall correctly, you can only run for 5 seconds at a time on a card with a monitor attached (as an overheating precaution, I believe). The best solution is either to set it up as a headless node or to get a second card for your monitor.

  1. Use a card with no monitor attached.
  2. Run your program in Unix/Linux with no X Windows launched.