CUDA Timeout?

Is there some kind of built-in timeout period in CUDA that prevents you from calling long-running kernels?

I’ve written a function that does a large amount of processing in a loop. I compiled it with both the __device__ and __host__ qualifiers so that I can test it from a CUDA kernel as well as on the CPU (the only difference is that I pass a pointer to device memory instead of a pointer to host memory). The function works properly, but if I raise the number of processing iterations too high, the device version makes the screen go black and the kernel fails with “unknown error”.
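For readers who want to reproduce the setup described above, here is a minimal sketch of one function compiled for both host and device. The names `process` and `processKernel`, the arithmetic inside the loop, and the array size are all hypothetical stand-ins, not the original poster's code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the poster's function. The __host__ __device__
// qualifiers make the compiler emit both a CPU and a GPU version.
__host__ __device__ void process(float* data, int n, int iterations)
{
    for (int it = 0; it < iterations; ++it)
        for (int i = 0; i < n; ++i)
            data[i] = data[i] * 0.5f + 1.0f;
}

// Thin kernel wrapper so the same function can be exercised on the device.
__global__ void processKernel(float* data, int n, int iterations)
{
    // A single thread on purpose: this mirrors the serial CPU call.
    if (blockIdx.x == 0 && threadIdx.x == 0)
        process(data, n, iterations);
}

int main()
{
    const int n = 4;
    const int iterations = 10;
    float host[n] = {0.0f, 1.0f, 2.0f, 3.0f};
    float fromDevice[n];

    // Device version: same data, but the pointer refers to device memory.
    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    processKernel<<<1, 1>>>(dev, n, iterations);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(dev);
        return 1;
    }
    cudaMemcpy(fromDevice, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    // Host version: same function, pointer to host memory.
    process(host, n, iterations);

    for (int i = 0; i < n; ++i)
        printf("%d: cpu=%f gpu=%f\n", i, host[i], fromDevice[i]);
    return 0;
}
```

The CPU and GPU results may differ in the last digits because the two compilers can contract the multiply and add differently.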

You can check if there’s a run time limit on kernels using the deviceQuery executable in the SDK.
Here’s an example for my setup:

CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA

Device 0: “Quadro FX 1600M”
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 536150016 bytes
Number of multiprocessors: 4
Number of cores: 32
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 0.55 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: No
Compute mode: Default (multiple host threads can use this device simultaneously)


Press ENTER to exit…

PS: If you happen to own a GF9800GX2 (or maybe a GTX295), I believe the second GPU does not have a run time limit on kernels.
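The “Run time limit on kernels” line in the deviceQuery output corresponds to the `kernelExecTimeoutEnabled` field of `cudaDeviceProp`, so the same check can be made from your own program. A minimal sketch using the runtime API:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;  // skip devices we cannot query

        // Non-zero means the watchdog applies to kernels on this device.
        printf("Device %d: \"%s\"  run time limit on kernels: %s\n",
               dev, prop.name,
               prop.kernelExecTimeoutEnabled ? "Yes" : "No");
    }
    return 0;
}
```

On a multi-GPU card this prints one line per GPU, which makes it easy to see whether only the display GPU reports a limit.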


There is a watchdog timer in the NVIDIA driver which prevents kernels from monopolizing the GPU for more than a fixed amount of time (5-10 seconds depending on the OS) when that GPU is also driving a display. The solution is to use a dedicated GPU for CUDA or, in the case of Linux, not to run an active display on the card.
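When a dedicated GPU is not an option, a workaround that is not proposed in this thread but is commonly used is to split the work into several short kernel launches, so that no single launch runs long enough to trip the watchdog. The sketch below assumes the workload can keep its state in device memory between launches. The kernel `processChunk`, the arithmetic, and the chunk size are hypothetical and would need tuning for real code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical workload: each thread advances its element by `count` iterations.
// The state lives in `data`, so the next launch resumes where this one stopped.
__global__ void processChunk(float* data, int n, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = data[i];
    for (int it = 0; it < count; ++it)
        x = x * 0.5f + 1.0f;
    data[i] = x;
}

int main()
{
    const int n = 1 << 16;
    const long long totalIterations = 1000000;
    const int chunk = 10000;  // assumption: short enough to stay under the watchdog limit

    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemset(dev, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    for (long long done = 0; done < totalIterations; done += chunk) {
        long long remaining = totalIterations - done;
        int count = remaining < chunk ? (int)remaining : chunk;

        processChunk<<<blocks, threads>>>(dev, n, count);

        // Wait for this chunk so that each launch is timed separately.
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            printf("chunk starting at %lld failed: %s\n", done, cudaGetErrorString(err));
            cudaFree(dev);
            return 1;
        }
    }

    float first = 0.0f;
    cudaMemcpy(&first, dev, sizeof(float), cudaMemcpyDeviceToHost);
    printf("data[0] after all chunks: %f\n", first);
    cudaFree(dev);
    return 0;
}
```

Returning to the host between chunks also gives the display driver a chance to use the GPU, which is why the screen stays responsive with this pattern.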

I ran the deviceQuery program that Nico mentioned, and it says I do not have a run time limit on kernels.

After poking around a bit, I found this:

“Disabling the Watchdog Timer While Testing Display Drivers”

I tried both registry keys, but I’m still getting an error saying the kernel timed out.

If you’re using Vista, disable TDR.…dm_timeout.mspx

I am using Vista. Thanks for pointing out this one.

I found ANOTHER timeout as well…for DirectDraw framelocked buffer…dm_timeout.mspx

So currently all the keys I have set are:

GraphicsDrivers\TdrDelay = 16 sec

GraphicsDrivers\TdrDdiDelay = 16 sec

GraphicsDrivers\DCI\Timeout = 15 sec

Watchdog\Display\BreakPointDelay = 3 (30 sec) (note that setting this to a higher number also has no effect)

And also note…

Run time limit on kernels: No

…but I’m STILL getting “the launch timed out and was terminated” or “unknown error” (it randomly gives one of those two messages every time). I have not been able to get to 8 seconds; the failure usually happens at 6.5 - 7.5 seconds.
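To pin down when the failure occurs, it can help to time the launch on the host and compare the returned status against `cudaErrorLaunchTimeout`, which is the error code behind the “the launch timed out and was terminated” message. A rough sketch, where the `spin` kernel and the default iteration count are hypothetical and only serve to keep the GPU busy:

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical busy kernel: runs for a time roughly proportional to `iterations`.
__global__ void spin(float* out, long long iterations)
{
    float x = 1.0f;
    for (long long i = 0; i < iterations; ++i)
        x = x * 0.999999f + 0.000001f;
    *out = x;  // store the result so the loop is not optimized away
}

int main(int argc, char** argv)
{
    // Pass a larger count on the command line to approach the time limit.
    long long iterations = (argc > 1) ? atoll(argv[1]) : 100000000LL;

    float* dev = nullptr;
    cudaMalloc(&dev, sizeof(float));

    auto start = std::chrono::steady_clock::now();
    spin<<<1, 1>>>(dev, iterations);
    cudaError_t launchErr = cudaGetLastError();     // errors in the launch itself
    cudaError_t syncErr = cudaDeviceSynchronize();  // errors raised while running
    auto stop = std::chrono::steady_clock::now();

    double seconds = std::chrono::duration<double>(stop - start).count();
    printf("elapsed: %.2f s\n", seconds);

    if (launchErr != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(launchErr));
    } else if (syncErr == cudaErrorLaunchTimeout) {
        printf("watchdog terminated the kernel: %s\n", cudaGetErrorString(syncErr));
    } else if (syncErr != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(syncErr));
    } else {
        printf("kernel completed\n");
    }

    cudaFree(dev);  // may itself fail after a timeout; ignored in this sketch
    return 0;
}
```

Running it with increasing iteration counts shows the elapsed time at which the status switches from success to a timeout or another error.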

Does anybody know how to achieve this under OS X?



Thanks so much for this post. I’m running Windows 7, so I tried just adding HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\TdrLevel = 0 to the Registry (it wasn’t already there), then immediately checking the CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT attribute on my GPU, and voila - it returned zero (NO timeout) !! I didn’t even have to reboot !!

So then I changed TdrLevel = 3 in the Registry, and checked the CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT attribute again, and sure enough, it was non-zero (Time limit re-instated).

So now I’m thinking I’ll just leave it on (TdrLevel = 3), and let my CUDA program turn it off whenever it needs to use the GPU. Great news !! Thanks again…
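For anyone who wants to repeat the check described above, here is a minimal sketch that reads `CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT` through the driver API. It assumes the GPU of interest is device 0 and that the program is linked against the driver library (for example with `-lcuda`).

```cuda
#include <cstdio>
#include <cuda.h>

int main()
{
    if (cuInit(0) != CUDA_SUCCESS) {
        printf("cuInit failed\n");
        return 1;
    }

    CUdevice dev;
    if (cuDeviceGet(&dev, 0) != CUDA_SUCCESS) {  // assumption: device 0
        printf("cuDeviceGet failed\n");
        return 1;
    }

    int timeoutEnabled = 0;
    if (cuDeviceGetAttribute(&timeoutEnabled,
                             CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT,
                             dev) != CUDA_SUCCESS) {
        printf("cuDeviceGetAttribute failed\n");
        return 1;
    }

    // Zero means no run time limit; non-zero means the limit is active.
    printf("kernel execution timeout: %s\n", timeoutEnabled ? "enabled" : "disabled");
    return 0;
}
```

Running it before and after changing TdrLevel shows whether the driver has picked up the new setting.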