I’m very close to having a working version of the program I’m trying to build, but for some reason I’m getting
Cuda error: Kernel execution failed in file 'testbed.cu' in line 367 : the launch timed out and was terminated.
when running the kernel (in debug mode), and the weird thing is that it only happens sometimes. The program should invoke the kernel, exit if it needs more data to analyze, get more data, then repeat the process. Sometimes it will run the kernel once, sometimes more than once, without any problems.
What could cause this to time out? If you need me to paste in any code, I can do that.
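For anyone hitting this: a launch error is only reported by the next CUDA call, so it helps to check explicitly both right after the launch and after synchronizing. Below is a minimal sketch; dummyKernel and its launch configuration are placeholders, not the poster's actual code:

```
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real one in testbed.cu.
__global__ void dummyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 20;
    float *d_data = 0;
    cudaMalloc((void **)&d_data, n * sizeof(float));

    dummyKernel<<<(n + 255) / 256, 256>>>(d_data, n);

    // Catches configuration errors (too many registers, too much shared mem, ...).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("launch error: %s\n", cudaGetErrorString(err));

    // Blocks until the kernel finishes; a watchdog kill shows up here as
    // "the launch timed out and was terminated".
    err = cudaThreadSynchronize(); // cudaDeviceSynchronize() on newer toolkits
    if (err != cudaSuccess)
        printf("execution error: %s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    return 0;
}
```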
I have a similar problem. I get the same error when I reach a certain number of threads. Above this number I always get the error; below it, I have yet to see any problems; at this exact number, I get the error intermittently. I’ve also noticed that if I reduce the number of computations per thread (sometimes just by decreasing the size of a loop), I can increase the number of threads I can run without getting the error.
You two may actually be having different issues. Most common causes of problems similar to yours:
tcullison:
too many registers per thread. You will not be able to launch if (registers per thread) × (threads per block) > 8192. Compile with the -keep option and check the .cubin file to see how many registers are being used. You can try to reduce that with the -maxrregcount flag to nvcc (check the nvcc documentation for details). Judging by your description, this is most likely your issue.
too much shared memory. A thread block can use no more than 16K of shared memory, otherwise it fails to launch. Again, check the .cubin file, and add whatever you’re allocating in shared memory dynamically. (The small device-query sketch after this list shows how to read these limits off the card.)
nkohlmei:
your kernel runtime exceeds the time allowed by the watchdog mechanism. I believe that’s 5s in WinXP, not sure about the number in Linux.
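As a quick way to check those limits on your own card, here is a small sketch (not from either poster's code) that queries the per-block register and shared-memory limits via cudaGetDeviceProperties, which you can compare against the numbers in your .cubin file:

```
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0); // device 0

    printf("registers per block : %d\n", prop.regsPerBlock);
    printf("shared mem per block: %lu bytes\n", (unsigned long)prop.sharedMemPerBlock);
    printf("max threads/block   : %d\n", prop.maxThreadsPerBlock);

    // A launch fails if (registers per thread) x (threads per block)
    // exceeds regsPerBlock, or if static + dynamic shared memory
    // exceeds sharedMemPerBlock.
    return 0;
}
```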
I have since checked that (registers per thread) × (threads per block) is indeed less than 8192.
After checking the .cubin file, I also believe the amount of shared memory I’m using is OK (56 * 512 threads per block).
However, if I modify the code a little (so that I am not accessing as much global memory), it works fine. I compiled both the modified and unmodified code with the -maxrregcount flag and the same register restriction; yet the modified code executes while the unmodified code does not. For clarity: the modified code accesses global memory less often, and its loops iterate as much as or more than the loops in the unmodified code.
I’m not allocating any dynamic shared memory when calling the kernel. Unfortunately, I cannot post my code, though sometime this week I might put together a small piece of code that reproduces the problem and post that.
You never said whether you solved your problem. I was having exactly the same issue: kernels that should execute in under 10 ms would randomly give the launch-timeout error, but only about once in every ~10,000 launches. In my case it turned out to be an incorrectly installed driver, which was causing other issues too. Run nvidia-bug-report.sh and check for any API mismatch errors to see whether you have the same root cause.
That seems odd. Are you running in Linux console mode, so that you don’t have the 5-second limitation to begin with?
And to clarify an old post of mine above: my particular problem was not solved by correctly installing the driver. It still persists even in the CUDA 1.1 beta, and NVIDIA is working on a solution (fingers crossed that it makes it into the 1.1 release). The calling card of this problem is a kernel that normally executes in a very short time (milliseconds), but if you call it 100,000 times in a row, even with the SAME DATA, it only gets through 10,000 or 20,000 calls before a launch takes 5 s and then gives either “launch timeout” or “unspecified launch failure”. Running on the Linux console (no 5-second limitation) just causes the machine to hard-lock when it reaches this point.
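For anyone trying to reproduce this, a stress loop along these lines should surface the intermittent failure; dummyKernel is a placeholder standing in for the real kernel:

```
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummyKernel(float *d, int n) // placeholder kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d[i] += 1.0f;
}

int main()
{
    const int n = 1 << 20;
    float *d_data = 0;
    cudaMalloc((void **)&d_data, n * sizeof(float));

    // Same kernel, same data, 100,000 times; report the first failure.
    for (int i = 0; i < 100000; ++i) {
        dummyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
        cudaError_t err = cudaThreadSynchronize();
        if (err != cudaSuccess) {
            printf("launch %d failed: %s\n", i, cudaGetErrorString(err));
            break;
        }
    }
    cudaFree(d_data);
    return 0;
}
```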
I reduced my block count by half, and the kernel now just barely runs, taking a little under 6.7 seconds. Performance is lower, but at least it runs. I’ve heard about the 5-second limitation, but I don’t know why my computer/GPU can run up to a bit more than 7 seconds per kernel. I am using Windows XP, running CUDA 1.1 beta.
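One common workaround, assuming your blocks are independent of one another, is to split one long launch into several shorter ones so that no single launch approaches the watchdog limit. A sketch below; processChunk and the sizes are hypothetical, not the poster's code:

```
#include <algorithm>
#include <cuda_runtime.h>

// Hypothetical kernel: each block handles one tile, identified by its
// global block index (blockIdx.x plus the offset of this slice).
__global__ void processChunk(float *d, int firstBlock)
{
    int globalBlock = firstBlock + blockIdx.x;
    int i = globalBlock * blockDim.x + threadIdx.x;
    d[i] += 1.0f;
}

// d_data must hold totalBlocks * 256 floats.
void runInSlices(float *d_data)
{
    const int totalBlocks = 8192;     // full problem size
    const int blocksPerLaunch = 1024; // tuned so one slice stays well under 5 s

    for (int first = 0; first < totalBlocks; first += blocksPerLaunch) {
        int count = std::min(blocksPerLaunch, totalBlocks - first);
        processChunk<<<count, 256>>>(d_data, first);
        cudaThreadSynchronize(); // finish this slice before launching the next
    }
}
```

The trade-off is some launch overhead per slice, but each individual launch finishes quickly enough that the display driver never kills it.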
My computer is a Dell Precision 670 workstation with dual 3.0 GHz Xeon CPUs; it’s about 2.5 years old. The GPU is a GeForce 8800 Ultra that I got just a few weeks ago.
To byung:
I do have two GPU cards, but I have only one PCI-e slot in my computer, so for the time being buying a new computer is not an affordable solution for me. But thank you very much for the information.
Yes, I’m currently running into the same issue as well. I’m bumping this to see if anyone has more insight into it.
A colleague of mine mentioned that if the card isn’t set to COMPUTE ONLY mode, it will want to do some display-related update after a while. I don’t know if that’s related in any way.
So, what you are saying is that if I have just one card, which also does the rendering, I’m not able to write a kernel that needs more than 7 seconds to execute?
Do you know how to set the device to COMPUTE ONLY mode? I’ve never heard of anything like that before…
Yeah, it’s easy to solve: 1) install Linux; 2) disable X Windows from starting (i.e., set the inittab default runlevel to 3, or remove xdm from the startup scripts); 3) run your application without any launch timeouts!