question about "launch timed out"

shingoxlf · April 24, 2009, 3:52am

Hi guys, I am using CUDA to run a Kinetic Monte Carlo simulation, but when I run my code I get this error:
“the launch timed out and was terminated.”

I have checked the cubin file, and the register count and shared memory usage are both within limits.

I have made one observation about this error: it appears when I increase both the input size and the number of loop iterations in the kernel at the same time.
It seems there is a trade-off between the input size and the loop count.

Here are my test results (the largest loop count that runs without the error, for each input size):
input size    loop count
400*400       100
600*600       10
1000*100      1

If I go over these loop counts, the error is reported.

Can anybody help?

Smokey · April 24, 2009, 4:12am

In most operating systems, driver calls (which are generally kernel-mode function calls) can only run for so long; the OS kernel terminates any call that exceeds a certain time. This is commonly referred to as the ‘watchdog timer’: it’s basically the OS kernel killing deadlocked or hanging kernel-mode calls.

From what I understand, NVIDIA’s driver has a function/thread associated with every launched kernel, which stays active until the kernel completes execution. So if your kernel takes longer than the OS watchdog timer allows, the OS kernel terminates the associated driver call that is running your kernel, your kernel dies, and ‘hopefully’ (though on Windows you generally get a BSOD) you get a timed-out error.
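A minimal sketch of how that timed-out error reaches the host program, assuming a current CUDA runtime (`cudaDeviceSynchronize` replaced the 2009-era `cudaThreadSynchronize`). The kernel `long_kernel` and its spin count are hypothetical stand-ins, not the original poster's Monte Carlo code. The point is that the launch itself returns immediately; the timeout is reported as `cudaErrorLaunchTimeout` by the next synchronizing call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical long-running kernel standing in for a Monte Carlo step.
// The spin count passed from main() is an arbitrary placeholder.
__global__ void long_kernel(float *data, int n, long long spins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = data[i];
    for (long long s = 0; s < spins; ++s)
        x = x * 0.999999f + 1e-6f;   // busy work to burn time
    data[i] = x;
}

// True if the runtime reported that the watchdog killed the kernel.
bool is_watchdog_timeout(cudaError_t err)
{
    return err == cudaErrorLaunchTimeout;
}

int main()
{
    const int n = 1 << 16;
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    long_kernel<<<(n + 255) / 256, 256>>>(d_data, n, 1LL << 20);

    // Configuration errors are visible right after the (asynchronous) launch...
    cudaError_t err = cudaGetLastError();
    // ...but a timeout only shows up once the host waits for the kernel.
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();

    if (is_watchdog_timeout(err))
        std::printf("watchdog killed the kernel: %s\n", cudaGetErrorString(err));
    else if (err != cudaSuccess)
        std::printf("other CUDA error: %s\n", cudaGetErrorString(err));
    else
        std::printf("kernel finished in time\n");

    cudaFree(d_data);
    return 0;
}
```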

If I’m not mistaken, anything based on the NT kernel has a watchdog timer of roughly 5 seconds (it varies for unknown reasons, but it is ‘around about’ 5 seconds).
Anything based on Linux has a user-settable watchdog timer (I’m not sure what the default is, if there is one; I’m guessing it depends on the kernel config options chosen when compiling).

Side note: realistically, on most consumer PCs it’s a softdog (a watchdog timer implemented in software, i.e. by the kernel), since hardware watchdog timers are usually separate add-on cards that don’t rely on the OS to operate and have direct control over certain BIOS features (e.g. they can reset the system if the timer is triggered).

Edit: I should probably note there’s a lot more to it than that; not all kernel-mode/driver calls have this limitation. I’m not a device driver programmer by any means, but this only applies to certain types of drivers (I’m not sure whether that is dictated by the OS or by the driver author).
Much more specifically, in terms of CUDA - this watchdog timer allegedly only applies to CUDA code running on cards which are also display devices.
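To see which of your cards the watchdog applies to, the CUDA runtime exposes a `kernelExecTimeoutEnabled` field in `cudaDeviceProp`. The sketch below prints it for every device and selects the first one without a limit. `first_without_timeout` is a hypothetical helper written for this example, not part of the CUDA API.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: index of the first device whose timeout flag is 0, or -1.
int first_without_timeout(const int *timeout_flags, int count)
{
    for (int i = 0; i < count; ++i)
        if (timeout_flags[i] == 0)
            return i;
    return -1;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("no CUDA devices found\n");
        return 1;
    }
    std::vector<int> flags(count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // Nonzero when a run-time limit applies to kernels on this device,
        // typically because the card is also driving a display.
        flags[dev] = prop.kernelExecTimeoutEnabled;
        std::printf("device %d (%s): watchdog %s\n", dev, prop.name,
                    flags[dev] ? "enabled" : "disabled");
    }
    int dev = first_without_timeout(flags.data(), count);
    if (dev >= 0) {
        cudaSetDevice(dev);   // later launches from this host thread use it
        std::printf("using device %d\n", dev);
    } else {
        std::printf("every device has a watchdog; keep launches short\n");
    }
    return 0;
}
```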

So you can avoid this problem in two ways:

Run your CUDA code on a separate CUDA card which isn’t used as a display device.
Ensure each CUDA kernel call finishes in less than 5 seconds (e.g. by breaking it up into several smaller kernel launches).
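Splitting one long kernel into several shorter launches can be sketched as follows. It assumes the loop iterations only communicate through global memory, so running `per_launch` steps per launch gives the same result as running them all in a single launch; in a real Monte Carlo code, any per-thread state such as random-number generator state would also have to be written back to global memory at the end of each launch. `kmc_steps`, its update rule, and the chunk size are hypothetical placeholders; the input size and total loop count are taken from the first row of the table above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical per-site update standing in for one KMC step.
// State is read from and written back to global memory, so chunks compose.
__global__ void kmc_steps(float *state, int n, int steps)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float s = state[i];
    for (int k = 0; k < steps; ++k)
        s = s * 0.5f + 1.0f;          // placeholder update rule
    state[i] = s;
}

// Launches needed to cover total_steps in chunks of at most per_launch.
int launches_needed(int total_steps, int per_launch)
{
    return (total_steps + per_launch - 1) / per_launch;
}

int main()
{
    const int n = 400 * 400;      // input size from the table
    const int total_steps = 100;  // loop count from the same row
    const int per_launch = 10;    // assumed chunk size; tune per GPU

    float *d_state = nullptr;
    cudaMalloc(&d_state, n * sizeof(float));
    cudaMemset(d_state, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    const int launches = launches_needed(total_steps, per_launch);
    int done = 0;
    for (int l = 0; l < launches; ++l) {
        int remaining = total_steps - done;
        int steps = remaining < per_launch ? remaining : per_launch;
        kmc_steps<<<blocks, threads>>>(d_state, n, steps);
        // Optional: report any error (including a timeout) at the chunk that caused it.
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            std::printf("launch %d failed: %s\n", l, cudaGetErrorString(err));
            cudaFree(d_state);
            return 1;
        }
        done += steps;
    }
    std::printf("completed %d steps in %d launches\n", done, launches);
    cudaFree(d_state);
    return 0;
}
```

The per-chunk `cudaDeviceSynchronize` is not needed for ordering, since launches in the same stream already run one after another; it is there only so that a failure is reported at the chunk that caused it. The chunk size should be chosen so a single launch stays well under the limit for the largest input you expect.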

Letharion · April 24, 2009, 7:56am

In my experience, the statement that this “only applies to CUDA code running on cards which are also display devices” is true.

There are two computers at work on which CUDA programming is done.

One is equipped with a CUDA-capable card plus an old 7000-something card that runs the X server.

The other doesn’t have a monitor connected to it, and all kernels are executed from the command line. If I start the X server on that box, I get timeouts, but I don’t need it there, so it’s a non-issue.
