cudaErrorLaunchTimeout and CUDA2.0

Romant · July 2, 2008, 10:36am

While testing my kernel on extremely large data sets, I ran into the following error: cudaErrorLaunchTimeout. It occurs about 14 seconds after the launch. If I reduce the data set so that the kernel finishes in about 12-13 seconds, everything works fine.

My video card IS NOT connected to a monitor, so no watchdog or anything similar should be triggered.
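You do not have to guess from the monitor cabling whether a run-time limit applies. The driver reports it per device. Below is a minimal sketch that reads the `kernelExecTimeoutEnabled` field of `cudaDeviceProp` for every device. That field is part of the current CUDA runtime API, but it is an assumption that the CUDA 2.0 headers already exposed it. `count_devices_with_watchdog` is a hypothetical helper name.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: for each CUDA device, reports whether the driver
// applies a run-time limit (watchdog) to kernels. Returns the number of
// devices with the limit enabled, or -1 if devices cannot be enumerated.
int count_devices_with_watchdog(bool verbose)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return -1;

    int limited = 0;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        // Nonzero when kernels on this device are subject to a time limit.
        if (prop.kernelExecTimeoutEnabled)
            ++limited;
        if (verbose)
            printf("device %d (%s): run-time limit %s\n", dev, prop.name,
                   prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
    }
    return limited;
}
```

Call `count_devices_with_watchdog(true)` from host code before the first long launch. It shows which card, if any, the driver considers time-limited.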

I’ve also noticed that this error happens with CUDA 2.0 but not with CUDA 1.1. CUDA 1.1 had a different unpleasant problem: a kernel stuck in an infinite loop could hang the whole system. It looks like CUDA 2.0 adds a monitor-independent watchdog that terminates any kernel after about 15 seconds. That is good for faulty kernels, but a disaster for kernels that legitimately run for a long time.

The Programming Guide does not mention this new feature, nor does it say how to switch this extra watchdog off.

Does anybody know how to prevent this forced kernel termination? Nvidia guys, this question is mainly for you …

Thanks in advance!
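Whatever the underlying cause, you can at least detect the failure explicitly. Errors raised while a kernel is running, including cudaErrorLaunchTimeout, are not returned by the launch itself. They only show up at the next synchronizing call.

The sketch below is minimal and uses current runtime names. In CUDA 2.0 the synchronizing call was `cudaThreadSynchronize()`. `busy_kernel`, `is_launch_timeout`, and `run_and_check` are hypothetical names. In current CUDA versions this error is sticky: according to the runtime documentation, the process must be restarted before CUDA can be used again.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a long-running kernel: each thread loops
// `iters` times, so total run time grows with iters.
__global__ void busy_kernel(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n)
        return;
    float x = data[i];
    for (int k = 0; k < iters; ++k)
        x = x * 0.999f + 1.0f;
    data[i] = x;
}

// True if the error code means the kernel was killed by the run-time limit.
bool is_launch_timeout(cudaError_t err)
{
    return err == cudaErrorLaunchTimeout;
}

// Launches busy_kernel and returns the first error observed.
// Launch-configuration errors come from cudaGetLastError() right after the
// launch; errors raised during execution surface at the synchronize call.
cudaError_t run_and_check(float *d_data, int n, int iters)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    busy_kernel<<<blocks, threads>>>(d_data, n, iters);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();

    if (is_launch_timeout(err))
        fprintf(stderr, "kernel terminated by run-time limit: %s\n",
                cudaGetErrorString(err));
    return err;
}
```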

E.D_Riedijk · July 2, 2008, 11:35am

Well, I have noticed no such thing: I have to kill my machine when a kernel enters an infinite loop, and I have let kernels run for 24 hours before killing the machine.

It might be that after 13 seconds you reach a point where you write past the end of an array.

MisterAnderson42 · July 2, 2008, 11:36am

I have commonly seen these errors occur with kernels that write past the end of allocated memory.
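The usual defense against out-of-bounds writes like these is to round the grid up to whole blocks and have every thread check its index before touching memory. Below is a minimal sketch. `blocks_for`, `scale_kernel`, and `scale` are hypothetical names.

```cuda
#include <cuda_runtime.h>

// Number of blocks needed to give each of n elements its own thread.
// Rounding up means the last block may contain threads past the end.
int blocks_for(int n, int threads_per_block)
{
    return (n + threads_per_block - 1) / threads_per_block;
}

// Without the `i < n` guard, the surplus threads in the last block
// would write past the end of `out`.
__global__ void scale_kernel(const float *in, float *out, int n, float s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = s * in[i];
}

// Host wrapper: one thread per element, rounded up to whole blocks.
cudaError_t scale(const float *d_in, float *d_out, int n, float s)
{
    const int threads = 256;
    if (n <= 0)
        return cudaSuccess;
    scale_kernel<<<blocks_for(n, threads), threads>>>(d_in, d_out, n, s);
    return cudaGetLastError();
}
```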

I have also seen this behavior with a relatively simple kernel (normally executing in about 1 millisecond) that had no memory access errors. In that case, the kernel would execute normally ~50,000 times and then get into an infinite loop on the next call. Because the problem was so hard to reproduce, I have only a vague idea of what caused it. It seemed to be related to complicated warp divergence (for loops with a different length in each thread), or perhaps to too many __syncthreads() calls (in a different kernel). NVIDIA confirmed that they could reproduce the issue but has not resolved the bug yet. I worked around it by rewriting the kernels with small changes.

Romant · July 2, 2008, 12:24pm

I see …

However, I have no __syncthreads() and absolutely no divergence; there is just too much data to compute in 10-12 seconds.
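When a kernel simply has too much work, a common workaround is to split the data across several launches. Each launch should be short enough to finish well below the limit, since the run-time limit applies to each launch separately rather than to the whole computation.

Below is a minimal sketch. It assumes the per-element work is independent, so chunks can be processed one after another. You would tune `chunk` so that a single launch takes at most a few seconds. `process_chunk`, `num_chunks`, and `process_in_chunks` are hypothetical names.

```cuda
#include <cuda_runtime.h>

// Hypothetical per-element work; each thread handles one element of the
// chunk that starts at `offset`.
__global__ void process_chunk(float *data, long long offset, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        data[offset + i] = data[offset + i] * 2.0f + 1.0f;
}

// Number of launches needed to cover `total` elements, `chunk` at a time.
long long num_chunks(long long total, int chunk)
{
    return (total + chunk - 1) / chunk;
}

// Processes `total` elements with several short launches instead of one
// long one.
cudaError_t process_in_chunks(float *d_data, long long total, int chunk)
{
    if (chunk <= 0)
        return cudaErrorInvalidValue;

    const int threads = 256;
    const long long launches = num_chunks(total, chunk);
    for (long long c = 0; c < launches; ++c) {
        long long offset = c * chunk;
        long long left = total - offset;
        int count = left < chunk ? (int)left : chunk;
        int blocks = (count + threads - 1) / threads;

        process_chunk<<<blocks, threads>>>(d_data, offset, count);

        // Launches in one stream already run in order; synchronizing here
        // just ties any error to the chunk that caused it.
        cudaError_t err = cudaGetLastError();
        if (err == cudaSuccess)
            err = cudaDeviceSynchronize();
        if (err != cudaSuccess)
            return err;
    }
    return cudaSuccess;
}
```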

MisterAnderson42 · July 2, 2008, 1:10pm

Is it possible that you are writing past the end of allocated memory? Sometimes it can be hard to know for certain. One way to check is to compile in emulation mode and run the program under valgrind (Linux only) or a similar memory bounds checking tool.
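Device emulation mode was removed in later CUDA releases, where tools such as cuda-memcheck and its successor compute-sanitizer do this job. A cruder check that needs no tool is to surround the buffer with guard bands filled with a sentinel value, then verify afterwards that the kernel left them untouched.

Below is a minimal sketch. `GUARD`, `SENTINEL`, `guards_intact`, `unguarded_fill`, and `runs_cleanly` are hypothetical names, and `unguarded_fill` is deliberately buggy so that there is something to catch. The check only detects overruns that land inside the guard bands and write a value other than the sentinel.

```cuda
#include <cuda_runtime.h>
#include <vector>

static const int GUARD = 1024;            // guard elements on each side
static const float SENTINEL = -12345.0f;  // value the kernel should never write

// `host` holds the whole padded buffer: GUARD + n + GUARD elements.
// Returns true if every guard element still holds the sentinel.
bool guards_intact(const std::vector<float> &host, int n)
{
    for (int i = 0; i < GUARD; ++i)
        if (host[i] != SENTINEL || host[GUARD + n + i] != SENTINEL)
            return false;
    return true;
}

// Deliberately buggy kernel with no bounds guard: when the grid is rounded
// up, the surplus threads write past the end of `data`.
__global__ void unguarded_fill(float *data, float v)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] = v;
}

// Runs the kernel on the middle of a padded device buffer. Returns true
// only if it finished without error and left both guard bands untouched.
bool runs_cleanly(int n)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    std::vector<float> host(GUARD + n + GUARD, SENTINEL);
    const size_t bytes = host.size() * sizeof(float);

    float *d_buf = nullptr;
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess)
        return false;
    cudaMemcpy(d_buf, host.data(), bytes, cudaMemcpyHostToDevice);

    unguarded_fill<<<blocks, threads>>>(d_buf + GUARD, 1.0f);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(host.data(), d_buf, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_buf);
    return err == cudaSuccess && guards_intact(host, n);
}
```

With this buggy kernel, `runs_cleanly(1000)` would be expected to return false: the last block's 24 surplus threads write into the trailing guard band. A multiple of 256, such as `runs_cleanly(1024)`, would pass.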
