Error when iterating CUDA kernels in a loop

Hi all!

I get the CUDA error 'the launch timed out and was terminated' when I run my CUDA application.

I have spent a lot of time searching for related topics, but I still couldn't find a solution.

In brief, I have many CUDA kernels and run them as follows:

for(iter=0; iter<max_iter; iter++){


When I iterate each kernel separately, it works fine; however, if I run them all, the application is terminated after a few iterations with the error message 'the launch timed out and was terminated'.
Furthermore, the number of iterations I can run before it fails varies.

Any comments, advices, or suggestions are welcome.

Let me guess: you are on Windows and use the WDDM driver.

Try inserting a cudaStreamQuery(0) in between kernels. This should prevent the driver from batching them up, so that the timeout applies to an individual kernel and not a whole batch.
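A minimal sketch of that suggestion, assuming two hypothetical stand-in kernels (kernel_a, kernel_b) and a hypothetical helper blocks_for(), since the real kernels are not shown in the thread. cudaStreamQuery(0) is non-blocking: it only asks whether the default stream has finished its work (it normally returns cudaErrorNotReady while work is pending). The idea, per the reply above, is that the call makes the WDDM driver submit queued launches instead of batching them together.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks needed to cover n elements.
int blocks_for(int n, int threads) { return (n + threads - 1) / threads; }

// Hypothetical stand-ins for the kernels in the question.
__global__ void kernel_a(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

__global__ void kernel_b(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 0.5f;
}

int main() {
    const int n = 1 << 20, threads = 256, max_iter = 100;
    const int blocks = blocks_for(n, threads);
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    for (int iter = 0; iter < max_iter; iter++) {
        kernel_a<<<blocks, threads>>>(d_data, n);
        cudaStreamQuery(0);  // non-blocking poll of the default stream
        kernel_b<<<blocks, threads>>>(d_data, n);
        cudaStreamQuery(0);
    }
    // Launches are asynchronous: wait here and report any error they produced.
    cudaError_t err = cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(err));
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

The return value of cudaStreamQuery(0) is ignored on purpose here; only its side effect on submission matters for this workaround.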

You can also disable the watchdog timer:

Thanks, Crankle and tera, for the advice.

To tera,
I inserted a cudaStreamQuery(0) between kernels, but I still have the same problem.
By the way, does this depend on the operating system?
I am working on a Mac with a GTX 285 for CUDA computation and a GT 120 for display.

Here is a more detailed explanation of my problem.

for(int iter=0; iter<MAX_ITER; iter++){

As I said before, iterating each kernel on its own works well, but when I iterate all the kernels as shown above, it stops working after a random number of iterations.
Interestingly, it always stops at kernel_2().
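Kernel launches are asynchronous, so an execution error is usually reported by a later CUDA call rather than by the launch itself; the kernel where the failure surfaces is therefore not necessarily the one that ran too long. A debugging sketch, with hypothetical bodies for kernel_2() and kernel_4() and hypothetical helpers check() and check_kernel(), that synchronizes after every launch so each error is reported next to the kernel that caused it. Synchronizing this often costs performance, so it is meant for debugging only (older CUDA releases use cudaThreadSynchronize instead of cudaDeviceSynchronize).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Reports a CUDA error with context; returns true if err == cudaSuccess.
bool check(cudaError_t err, const char *where) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", where, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Debug-only: cudaGetLastError catches launch-configuration errors, and
// cudaDeviceSynchronize waits for the kernel so that an execution error
// (such as a watchdog timeout) is reported right after the kernel that caused it.
bool check_kernel(const char *name) {
    return check(cudaGetLastError(), name) &&
           check(cudaDeviceSynchronize(), name);
}

// Hypothetical stand-ins; the thread does not show the real kernel bodies.
__global__ void kernel_2(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

__global__ void kernel_4(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 0.5f;
}

int main() {
    const int n = 1 << 20, threads = 256, MAX_ITER = 100;
    const int blocks = (n + threads - 1) / threads;
    float *d = nullptr;
    if (!check(cudaMalloc(&d, n * sizeof(float)), "cudaMalloc")) return 1;
    cudaMemset(d, 0, n * sizeof(float));

    for (int iter = 0; iter < MAX_ITER; iter++) {
        kernel_2<<<blocks, threads>>>(d, n);
        if (!check_kernel("kernel_2")) { printf("failed in iteration %d\n", iter); break; }
        kernel_4<<<blocks, threads>>>(d, n);
        if (!check_kernel("kernel_4")) { printf("failed in iteration %d\n", iter); break; }
    }
    cudaFree(d);
    return 0;
}
```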

I also tried running my application under cuda-gdb; there, it stops at kernel_4(), which is the largest of the kernels.
In fact, cuda-gdb never finished and the computer stopped responding entirely, so I had to reboot it.

Do you have any other advice?

To Crankle,

I am working on a Mac; is there a way to turn off the watchdog timer on a Mac? I couldn't find one.
By the way, I have two GPUs, a GTX 285 and a GT 120, and I use the GTX 285 only for CUDA computation. I thought this would free me from the watchdog timer; am I wrong? Please correct me if so.
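Whether a run-time limit applies to a given GPU can be checked from the cudaDeviceProp::kernelExecTimeoutEnabled field (the deviceQuery sample prints it as "Run time limit on kernels"). A sketch that lists every device and selects the first one reporting no limit; has_watchdog() is a hypothetical helper, and what this field reports for a non-display GPU on a Mac is not something this thread establishes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if the driver reports a run-time limit
// (watchdog) for kernels on this device.
bool has_watchdog(const cudaDeviceProp &prop) {
    return prop.kernelExecTimeoutEnabled != 0;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    int chosen = -1;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        printf("device %d: %s, run-time limit on kernels: %s\n",
               dev, prop.name, has_watchdog(prop) ? "yes" : "no");
        if (chosen < 0 && !has_watchdog(prop)) chosen = dev;
    }
    if (chosen >= 0) {
        // Use the first device without a reported limit for compute work.
        cudaSetDevice(chosen);
        printf("using device %d\n", chosen);
    } else {
        printf("every device reports a run-time limit\n");
    }
    return 0;
}
```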

I really appreciate the advice, and I hope to get more suggestions to solve my problem and to learn more about CUDA.

Problem solved!!

The actual problem was related to the number of blocks in the last kernel.
The number of blocks was within the maximum grid size (65,535 x 65,535 x 1).
However, each block in the last kernel took a long time to compute, so the kernel as a whole ran longer than the watchdog allows, which caused the error.
Of course, the number of blocks was determined at compile time from the given input data.
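To see how close a kernel is to the limit, its GPU time can be measured with CUDA events. A sketch, assuming a hypothetical last_kernel() whose per-thread cost is set by a work parameter; time_launch() is also hypothetical and returns -1 if the launch or its execution fails.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the long-running last kernel: each thread
// performs `work` iterations, so run time grows with `work`.
__global__ void last_kernel(float *d, int n, int work) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = d[i];
    for (int k = 0; k < work; ++k) x = x * 0.999f + 0.001f;
    d[i] = x;
}

// Times one launch with CUDA events; returns milliseconds, or -1 on error.
float time_launch(float *d, int n, int work) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    last_kernel<<<blocks, threads>>>(d, n, work);
    cudaError_t err = cudaGetLastError();  // launch-configuration errors
    cudaEventRecord(stop);
    if (err == cudaSuccess) err = cudaEventSynchronize(stop);  // waits for the kernel

    float ms = -1.0f;
    if (err == cudaSuccess) cudaEventElapsedTime(&ms, start, stop);
    else fprintf(stderr, "last_kernel: %s\n", cudaGetErrorString(err));
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int n = 1 << 20;
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d, 0, n * sizeof(float));
    for (int work = 1000; work <= 4000; work *= 2) {
        printf("work=%d: %.3f ms\n", work, time_launch(d, n, work));
    }
    cudaFree(d);
    return 0;
}
```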

So I reduced the amount of computation in the last kernel by dividing it into two kernels; now it works as I wanted.
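The thread does not show how the last kernel was divided. As a related sketch (a different split than the author's two distinct kernels), the hypothetical work_range() kernel below takes a step range, so one long per-element loop can be run as several shorter launches; the hypothetical run_split() launches it `parts` times and checks each launch. Because each part continues from the values the previous launch stored in global memory, running the steps in two launches gives the same result as one launch, while each individual launch runs for a shorter time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: applies steps [first_step, last_step) of a per-element
// update, so one long loop can be divided across several launches.
__global__ void work_range(float *d, int n, int first_step, int last_step) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = d[i];
    for (int k = first_step; k < last_step; ++k) x = x * 0.5f + 1.0f;
    d[i] = x;  // the next launch continues from this stored value
}

// Runs total_steps of work as `parts` separate launches and checks each one,
// so a failure is attributed to the part that caused it.
cudaError_t run_split(float *d, int n, int total_steps, int parts) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int p = 0; p < parts; ++p) {
        int first = total_steps * p / parts;
        int last = total_steps * (p + 1) / parts;
        work_range<<<blocks, threads>>>(d, n, first, last);
        cudaError_t err = cudaGetLastError();
        if (err == cudaSuccess) err = cudaDeviceSynchronize();
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}

int main() {
    const int n = 1 << 20;
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d, 0, n * sizeof(float));
    cudaError_t err = run_split(d, n, 1000, 2);  // two launches of 500 steps each
    float first = 0.0f;
    cudaMemcpy(&first, d, sizeof(float), cudaMemcpyDeviceToHost);
    printf("%s, d[0] = %f\n", cudaGetErrorString(err), first);
    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```

Splitting only helps with the watchdog when each launch on its own finishes within the limit; the total amount of work is unchanged.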
Thanks for all the advice and comments.