Need a solution to the "kernel launch timeout" from NVIDIA

The 5-second limit on kernel run time is making CUDA development very inconvenient. :wacko:
Though some kinds of programs can be divided into small pieces, many programs can't be divided in an easy way!

Making CUDA development more difficult will simply cause fewer developers to use CUDA.

I just can't understand why a kernel launch MUST be synchronous. Can't there be an API that starts the kernel and returns instantly, and another API to poll whether the kernel has finished? A user-mode poll/wait would be much better than the kernel-mode poll/wait used in current CUDA.

By the way, most other parts of CUDA are very nice.

Thanks a lot

This is a limitation of Windows XP when the CUDA device is shared between computation and graphics. When your kernel is running, the graphics driver cannot update the GUI, and after a few seconds, the operating system decides something is wrong and aborts your kernel.

The solution to this is to have a CUDA device which is not running your main display. (Or to use Linux, where you can decide not to run a GUI at all.)

I don’t understand this part. From the perspective of the user code, this is exactly what happens. The kernel starts asynchronously, and the CPU continues executing your program. You can check on the status, or deliberately run a function to wait on the results.
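
For what it's worth, here is a minimal sketch of that pattern; the kernel, the data size, and the launch configuration are placeholders, not anything from this thread. The launch returns to the CPU immediately, and cudaEventQuery lets you poll for completion from user mode.

```cpp
// Minimal sketch of the asynchronous launch + user-mode polling pattern.
// longKernel, the data size and the launch configuration are placeholders.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void longKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;                    // stand-in for real work
}

int main()
{
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaEvent_t done;
    cudaEventCreate(&done);

    // The launch itself returns immediately; the kernel runs in the background.
    longKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(done, 0);

    // Poll from user mode; the CPU is free to do other work inside this loop.
    while (cudaEventQuery(done) == cudaErrorNotReady) {
        // ... do useful host-side work here ...
    }

    printf("kernel finished\n");
    cudaEventDestroy(done);
    cudaFree(d_data);
    return 0;
}
```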

Getting rid of the watchdog timer on the primary display would require a way to swap a running kernel’s register file and shared memory out to global memory mid-execution. Then the graphics driver could be given a time slice periodically to avoid the watchdog. Not impossible, though I have no idea if the current hardware is capable of this.

The explanation makes things clear. So this limitation comes from Microsoft instead of NVIDIA. :blink:

But is there any way to disable this time limit on Windows? Windows does this kind of check to prevent a bad graphics call from hanging the entire system, but a well-behaved CUDA kernel can still need a long time to run. Breaking the task into small pieces can require a redesign of the CUDA kernel, and really does make CUDA development more difficult.

Use a dedicated compute card. There’s no way to turn it off on XP, nor should there be.

I just want to make sure I understand the situation…

As a practical matter, there is no way to program in CUDA if your nVIDIA card is the only graphics card in the system, right?

Is it time to dust off an old PCI graphics card to use to drive the desktop? This is kind of a pain as I only have one input on my monitor (Dell 30").

Thanks,

–Mark

No, there’s no way to run kernels longer than 5s on WinXP if you’re using that card for display. If you want to run kernels longer than 5s, your options are:

  • use Vista and turn off the TDR timeout, which is probably a bad idea

  • run Linux from a console (this is the right answer)

  • buy a dedicated compute card (this is also the right answer)

Note that this is not “program execution is longer than 5s” or “total time spent on the GPU longer than 5s,” it’s a single kernel invocation longer than 5s.

You can use CUDA even if there's only one NVIDIA card in the system, you just need to take extra care. Partition your workload into smaller chunks so that a single kernel invocation stays well within the limit, say under 2 s.
And if you want the Windows UI to stay responsive while performing computations on the card, you should partition your task into even smaller chunks, so that each runs for 50 ms or so; a rough sketch of that chunked approach follows below.
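
Here is a hypothetical sketch of what the chunking can look like; the kernel, names, and chunk size are made up for illustration, and each launch should stay far below the watchdog limit (tens of milliseconds keeps the UI smooth).

```cpp
// Hypothetical sketch of chunking one big job into many short kernel launches.
#include <cuda_runtime.h>

__global__ void processChunk(float *data, int offset, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        data[offset + i] += 1.0f;           // stand-in for real work
}

void runInChunks(float *d_data, int total, int chunk)
{
    for (int offset = 0; offset < total; offset += chunk) {
        int count = (total - offset < chunk) ? (total - offset) : chunk;
        processChunk<<<(count + 255) / 256, 256>>>(d_data, offset, count);
        cudaDeviceSynchronize();            // give the driver a chance to service the display
    }
}
```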

Of course you can use CUDA with only one card in the system. It wouldn't be much use otherwise… tmurray gave you all the details you need, so I will just add my 2c of anecdotal information:
I've been developing CUDA applications for nearly 2 years now on single-GPU machines. Nearly all kernels I've ever written complete in milliseconds. In fact, I have never EVER seen the 5 s launch timeout in 2 years of development unless I explicitly tried to trigger it (or, umm, accidentally wrote an infinite loop).

I can probably break up my problem into sub-5-second chunks. What is the typical overhead of a kernel call? I couldn't find anything in a quick search of these forums.

Thanks,

–Mark

About 10 µs, I believe. When you get close to the 5 s limit, that overhead is truly negligible.
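
If you want to measure it on your own setup, one crude approach is to time a large number of empty-kernel launches and divide. This is only a sketch (the launch count and names are arbitrary), and the number will vary with driver, OS, and GPU:

```cpp
// Rough micro-benchmark sketch: average the cost of many empty-kernel launches.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void emptyKernel() {}

int main()
{
    const int launches = 10000;

    emptyKernel<<<1, 1>>>();            // warm up / create the context
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < launches; ++i)
        emptyKernel<<<1, 1>>>();
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("average launch overhead: %.1f us\n", ms * 1000.0f / launches);
    return 0;
}
```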

What about the data stored in device memory? Does it stay in memory after 5 sec?

Sure, why wouldn’t it? Am I somehow not explicit enough when I say that the only limitation is that a single kernel invocation cannot last more than 5s?
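
In other words, device allocations are tied to the lifetime of your CUDA context (your application), not to any single kernel launch. A minimal illustration, with a made-up buffer, kernel, and sizes:

```cpp
// Sketch: one cudaMalloc'd buffer reused across many short kernel launches.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void addOne(float *buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] += 1.0f;
}

int main()
{
    const int n = 1024;
    float h_out[1024];
    float *d_buf;
    cudaMalloc(&d_buf, n * sizeof(float));      // allocated once
    cudaMemset(d_buf, 0, n * sizeof(float));

    for (int pass = 0; pass < 100; ++pass) {    // many short launches, same buffer
        addOne<<<(n + 255) / 256, 256>>>(d_buf, n);
        cudaDeviceSynchronize();
    }

    cudaMemcpy(h_out, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h_out[0] = %.0f (expect 100)\n", h_out[0]);
    cudaFree(d_buf);                            // freed only when the program decides
    return 0;
}
```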