Error in launching a kernel "the launch timed out and was terminated"

athlonshi · April 13, 2011, 8:03pm

Update: Problem solved. The cause was an infinite loop in the kernel :(

Hi,

My kernel is launched inside a loop. The code runs fine the first time the kernel is called, but on the second call cudaGetLastError() and cudaGetErrorString() report “the launch timed out and was terminated”.

I searched the forum, and this seems to be related to the watchdog timer, which fires after about 5 seconds on Linux (I am using Ubuntu). Following the suggestions there, I stopped the X server with “sudo /etc/init.d/gdm stop”. The error message went away, but the kernel then appeared to run forever on the second call.

Next, I tried to time the kernel to see how long it took on the first call, when it executed correctly. I created CUDA events for the timing (following the Programming Guide), but the elapsed time came back as zero seconds. I also tried the C clock() function, and it returned zero seconds as well. Finally, I used gettimeofday() in C to record the wall-clock time, which gave 14 seconds for the kernel. I have printf statements before and after the kernel call for the input and output, and the time between seeing the input and the output was close to 14 seconds, as expected. On some runs I pressed Ctrl+C about 3 seconds into the kernel, and the outputs were the same as when I let the kernel complete normally in 14 seconds. This suggests that the kernel had actually finished its work much earlier than 14 seconds.
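The timing code is not shown in the post, so the cause of the zero readings cannot be confirmed. One common cause is that a kernel launch is asynchronous: a host timer such as clock() read right after the launch measures only the launch overhead, and an event's elapsed time is not valid until the stop event has completed. The following is a minimal sketch of event-based timing under that assumption. `busyKernel`, its arguments, and the iteration count are hypothetical stand-ins, not the original kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel being timed.
__global__ void busyKernel(float *out, int iters)
{
    float x = 0.0f;
    for (int i = 0; i < iters; ++i)
        x += 1.0f;
    out[threadIdx.x] = x;
}

int main()
{
    float *d_out = NULL;
    cudaMalloc((void **)&d_out, 32 * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Both events are recorded in the same stream (stream 0) as the kernel.
    cudaEventRecord(start, 0);
    busyKernel<<<1, 32>>>(d_out, 1 << 20);
    cudaEventRecord(stop, 0);

    // The launch returns immediately; wait until the stop event has
    // actually been reached on the device before reading the time.
    cudaError_t err = cudaEventSynchronize(stop);
    if (err != cudaSuccess) {
        // A watchdog timeout or other execution error shows up here.
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    } else {
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);  // result is in milliseconds
        printf("kernel time: %.3f ms\n", ms);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return 0;
}
```

Note that cudaEventElapsedTime() reports milliseconds as a float, so printing the result as an integer number of seconds would also show zero for a short kernel.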

My questions are: how can I tell what is going on in the kernel between 3 seconds and 14 seconds? Why did the clock-based timers return zero? And what does “the launch timed out and was terminated” indicate?

I am using a Tesla C2050. For debugging purposes, the kernel is launched with 32 threads, but only one thread does any work.

Thanks!

athlonshi · April 13, 2011, 9:06pm

I just found that this error message is actually associated with the first run of the kernel.

It appeared to come from the second call because I mistakenly placed cudaGetLastError() before cudaThreadSynchronize(). So in the first run, although the execution time exceeded 5 seconds, cudaGetLastError() could not detect the timeout until the second run.

I am still not sure why the kernel takes so long, when it actually finishes its work in a much shorter time.
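The ordering problem described above can be sketched as follows. This is an illustration only, not the original code: `stepKernel` and the `checkCuda` helper are hypothetical names. It uses cudaDeviceSynchronize(), the current replacement for the now-deprecated cudaThreadSynchronize(). Checking the return value of the synchronize call attributes an execution error, such as a watchdog timeout, to the iteration that caused it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel launched inside the loop.
__global__ void stepKernel(int *data)
{
    data[threadIdx.x] += 1;
}

// Hypothetical helper: prints the error and returns false on failure.
static bool checkCuda(cudaError_t err, const char *what, int iter)
{
    if (err == cudaSuccess)
        return true;
    fprintf(stderr, "iteration %d, %s: %s\n",
            iter, what, cudaGetErrorString(err));
    return false;
}

int main()
{
    int *d_data = NULL;
    if (!checkCuda(cudaMalloc((void **)&d_data, 32 * sizeof(int)),
                   "cudaMalloc", -1))
        return 1;
    cudaMemset(d_data, 0, 32 * sizeof(int));

    for (int iter = 0; iter < 2; ++iter) {
        stepKernel<<<1, 32>>>(d_data);

        // Immediately after the launch, this only catches errors that
        // are reported at launch time (for example a bad configuration).
        if (!checkCuda(cudaGetLastError(), "launch", iter))
            break;

        // This blocks until the kernel has finished. Errors that occur
        // during execution, including a launch timeout, are returned here.
        if (!checkCuda(cudaDeviceSynchronize(), "execution", iter))
            break;
    }

    cudaFree(d_data);
    return 0;
}
```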
