Fatal error: the launch timed out and was terminated

After the code runs for 4 iterations, I am getting this error.
Fatal error: kernel1 (the launch timed out and was terminated at …/simple.cu:498)
*** FAILED - ABORTING
I am calling the kernels in a do-while loop. Part of the code inside the loop is as follows:

     CUDA_SAFE_CALL(cudaMemcpy(u, hu, (sizeT), cudaMemcpyHostToDevice), __LINE__);
     CUDA_SAFE_CALL(cudaMemcpy(du, hdu, (sizeT), cudaMemcpyHostToDevice), __LINE__);
     DisU<<<dimGrid,dimBlock>>>(uo, vo, wo, u, du, p);
     cudaDeviceSynchronize();   // cudaThreadSynchronize() is deprecated
     cudaCheckErrors("kernel1");
     // CUDA_SAFE_CALL(cudaMemcpy(hdu, du, (sizeT), cudaMemcpyDeviceToHost), __LINE__);
     // CUDA_SAFE_CALL(cudaMemcpy(hu, u, (sizeT), cudaMemcpyDeviceToHost), __LINE__);
     CUDA_SAFE_CALL(cudaMemcpy(v, hv, (sizeT), cudaMemcpyHostToDevice), __LINE__);
     CUDA_SAFE_CALL(cudaMemcpy(dv, hdv, (sizeT), cudaMemcpyHostToDevice), __LINE__);
     DisV<<<dimGrid,dimBlock>>>(uo, vo, wo, v, dv, p);
     cudaDeviceSynchronize();
     cudaCheckErrors("kernel2");
     CUDA_SAFE_CALL(cudaMemcpy(dw, hdw, (sizeT), cudaMemcpyHostToDevice), __LINE__);
     CUDA_SAFE_CALL(cudaMemcpy(w, hw, (sizeT), cudaMemcpyHostToDevice), __LINE__);
     DisW<<<dimGrid,dimBlock>>>(uo, vo, wo, w, dw, p);
     cudaDeviceSynchronize();
     cudaCheckErrors("kernel3");
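The definition of `cudaCheckErrors` is not shown in the post. A common definition that matches the error output quoted above (this is a sketch of the usual pattern, not necessarily the poster's exact macro) is:

```cuda
// Sketch of a typical cudaCheckErrors macro: checks the last CUDA runtime
// error after a kernel launch and aborts with a file:line message.
// The poster's actual definition may differ.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define cudaCheckErrors(msg)                                            \
    do {                                                                \
        cudaError_t err = cudaGetLastError();                           \
        if (err != cudaSuccess) {                                       \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n",          \
                    msg, cudaGetErrorString(err), __FILE__, __LINE__);  \
            fprintf(stderr, "*** FAILED - ABORTING\n");                 \
            exit(1);                                                    \
        }                                                               \
    } while (0)
```

Note that `cudaGetLastError` only reports a launch error after the kernel has actually failed, which is why the error appears at the check following the offending launch.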

Please help.

Your code (DisU kernel) is taking too long to execute and you are hitting the watchdog.

If you are on Windows, please read this sticky forum thread:

[url]https://devtalk.nvidia.com/default/topic/459869/cuda-programming-and-performance/-quot-display-driver-stopped-responding-and-has-recovered-quot-wddm-timeout-detection-and-recovery-/[/url]

If you are on Linux, please read this help article:

USING CUDA AND X | NVIDIA

It’s painful (I know all too well!), but you now have to learn the fine art of breaking a large problem into multiple small problems.

I always time my kernel launches during development and report the times in a log file. I keep an old computer with an old GPU to test worst-case scenarios.
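A minimal way to do that kind of timing is with CUDA events. The kernel name and launch configuration below are placeholders for your own; the pattern itself is standard:

```cuda
// Sketch: timing a single kernel launch with CUDA events.
// 'myKernel', 'grid', and 'block' stand in for your own kernel and
// launch configuration.
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start);
myKernel<<<grid, block>>>(/* args */);
cudaEventRecord(stop);
cudaEventSynchronize(stop);        // wait until the kernel has finished

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);
printf("myKernel: %.3f ms\n", ms); // log this and watch it relative to the
                                   // watchdog limit (typically a few seconds)

cudaEventDestroy(start);
cudaEventDestroy(stop);
```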

In one situation, I even resorted to having my program at runtime do a small test launch to get a basis time and then extrapolate to the full problem to estimate how much I would have to break down the problem into multiple launches. Ugly but necessary in that case.

The bottom line is that you must immediately develop the habit of doing all CUDA design and coding in such a way that splitting a task into multiple launches is easy. Once you develop this habit, your CUDA life will become a lot simpler.
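One habit that makes splitting easy is to give each kernel an element offset and count, then launch it once per chunk with a synchronization in between. This is only an illustrative sketch: the `offset`/`count` parameters, `process`, and `DisU_chunk` are hypothetical additions, not the poster's actual signatures:

```cuda
// Sketch: process N elements in fixed-size chunks so that no single
// launch runs long enough to trip the watchdog. 'process' is a
// placeholder for the real per-element work.
__global__ void DisU_chunk(float *u, float *du, int offset, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        process(u, du, offset + i);
}

void launch_in_chunks(float *u, float *du, int N)
{
    const int CHUNK = 1 << 20;   // tune so each launch stays well under the timeout
    const int BLOCK = 256;
    for (int offset = 0; offset < N; offset += CHUNK) {
        int count = (N - offset < CHUNK) ? (N - offset) : CHUNK;
        int grid  = (count + BLOCK - 1) / BLOCK;
        DisU_chunk<<<grid, BLOCK>>>(u, du, offset, count);
        cudaDeviceSynchronize();  // gives the display a chance to update between chunks
    }
}
```

Because the grid covers only `count` elements per launch, the chunk size becomes a single tuning knob you can adjust per GPU without touching the kernel logic.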

Or you could use any of the available methods to avoid the watchdog timer so you don't have to chop your computation into little pieces. The traditional advice on Windows is to use a GPU that is in TCC mode. On Linux, various options are covered in the link I already provided.

Hello,
Thanks for responding.
Sorry, I posed the problem incorrectly: I had not posted the entire code.

I have another kernel, DisP<<<>>>, which is also part of the code. After executing it, I get the error mentioned above.

However, when I implement the same computation as a serial host function, the code works.

Part of the code inside the loop is as follows:

     CUDA_SAFE_CALL(cudaMemcpy(u, hu, (sizeT), cudaMemcpyHostToDevice), __LINE__);
     CUDA_SAFE_CALL(cudaMemcpy(du, hdu, (sizeT), cudaMemcpyHostToDevice), __LINE__);
     DisU<<<dimGrid,dimBlock>>>(uo, vo, wo, u, du, p);
     // cudaThreadSynchronize();
     // CUDA_SAFE_CALL(cudaThreadSynchronize(), __LINE__);
     cudaCheckErrors("kernel1");
     // CUDA_SAFE_CALL(cudaMemcpy(hdu, du, (sizeT), cudaMemcpyDeviceToHost), __LINE__);
     // CUDA_SAFE_CALL(cudaMemcpy(hu, u, (sizeT), cudaMemcpyDeviceToHost), __LINE__);
     CUDA_SAFE_CALL(cudaMemcpy(v, hv, (sizeT), cudaMemcpyHostToDevice), __LINE__);
     CUDA_SAFE_CALL(cudaMemcpy(dv, hdv, (sizeT), cudaMemcpyHostToDevice), __LINE__);
     DisV<<<dimGrid,dimBlock>>>(uo, vo, wo, v, dv, p);
     // cudaDeviceSynchronize();
     cudaCheckErrors("kernel2");
     CUDA_SAFE_CALL(cudaMemcpy(dw, hdw, (sizeT), cudaMemcpyHostToDevice), __LINE__);
     CUDA_SAFE_CALL(cudaMemcpy(w, hw, (sizeT), cudaMemcpyHostToDevice), __LINE__);
     DisW<<<dimGrid,dimBlock>>>(uo, vo, wo, w, dw, p);
     // cudaDeviceSynchronize();
     cudaCheckErrors("kernel3");
     /* cudaMemcpy(hu, u, (sizeT), cudaMemcpyDeviceToHost);
        cudaMemcpy(hv, v, (sizeT), cudaMemcpyDeviceToHost);
        cudaMemcpy(hw, w, (sizeT), cudaMemcpyDeviceToHost);
        cudaMemcpy(hdu, du, (sizeT), cudaMemcpyDeviceToHost);
        cudaMemcpy(hdv, dv, (sizeT), cudaMemcpyDeviceToHost);
        cudaMemcpy(hdw, dw, (sizeT), cudaMemcpyDeviceToHost); */
     // pressure();
     CUDA_SAFE_CALL(cudaMemcpy(cp, hcp, sizeT, cudaMemcpyHostToDevice), __LINE__);
     DisP<<<dimGrid,dimBlock>>>(u, v, w, du, dv, dw, cp);
     // cudaDeviceSynchronize();
     cudaCheckErrors("kernel4");   // was labeled "kernel3" twice; renamed for clarity

     cudaMemcpy(hu, u, (sizeT), cudaMemcpyDeviceToHost);
     cudaMemcpy(hv, v, (sizeT), cudaMemcpyDeviceToHost);
     cudaMemcpy(hw, w, (sizeT), cudaMemcpyDeviceToHost);
     cudaMemcpy(hdu, du, (sizeT), cudaMemcpyDeviceToHost);
     cudaMemcpy(hdv, dv, (sizeT), cudaMemcpyDeviceToHost);
     cudaMemcpy(hdw, dw, (sizeT), cudaMemcpyDeviceToHost);

I have checked for memory issues using cuda-memcheck, but it found no memory-related problems.
Please help.

If you get the error after DisP, then the DisP kernel is taking too long. The solution options are the same as I already indicated.