Is the 5-second limitation permanent?

Is the 5-second limit on Windows a permanent restriction?
Or do you plan to fix this problem in a future update?

Because of this restriction, we would need to replace our motherboard with one that has multiple PCIe slots.

Thank you.

Unfortunately, the watchdog timer is a feature of Windows and isn’t going away.

The restriction doesn’t exist on Linux systems.

Is that true? Or will it be fixed in the next release? I’ve been experiencing this problem on RHEL4 since I began working with CUDA in Feb.

My mistake, it turns out there is actually a separate watchdog timer in the Linux graphics driver which limits the maximum execution time.

We are working on removing this restriction for a future release.

Excellent. Thanks for clearing that up. I’m very pleased at the progress that CUDA is making. It’s amazing the amount of power in these cards, and it’s almost a miracle that we can employ even half of it!

Are there any hard numbers on what the maximum execution time is on Linux? I’m running Fedora 6, and ~10 seconds seems to be the limit I’m running into.

On RHEL4E3 I see around 7.5 seconds, FWIW. Creating a workaround should be easy for many problems, making the time limit merely an inconvenience.

Perhaps I’m missing what you’re thinking of as a workaround, but I don’t understand how a workaround could be easy, or even possible. The execution time of a given kernel is probably a function of the size of the input data (for my kernel I’m summing N terms at each point on a 2D grid) in addition to the specs of the graphics card, the driver version, and the current state of the system (is the card being used by some other program?). How could one write code to reliably restart execution of the kernel before the crash occurs that runs on more than one particular card? Even ignoring that issue, just keeping the card fixed, in my case (which I think is rather straightforward) it’s difficult to determine execution time of the kernel over the 2D parameter space of the number of terms and the area of grid.

Personally, I’m looking forward to the Linux fix, and I’ll just let the Windows version of my code crash :D

Some problems have a trivial workaround. Could you, instead of summing the terms over every point on the grid, just have the kernel sum the terms over a smaller chunk of the grid? Or sum only the first M terms? Then put the kernel call in a loop and run it as many times as are necessary to complete the task, keeping each kernel call under 5-7 seconds.

It appears to me that highly data-parallel problems (such as those that run well in CUDA) should be able to be broken apart in this fashion. I only have limited experience with GPU programming, though, and I don’t fully understand what task you are trying to do.
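To make the idea concrete, here is a rough sketch of what that chunking could look like for the 2D summation described earlier. This is only an illustration, not code from anyone in this thread: `sumTermsChunk`, `termValue`, and the chunk size `M` are all made-up names, and `termValue` is a placeholder for whatever series term the real problem computes.

```cuda
#include <cuda_runtime.h>

// Placeholder for the problem-specific series term (assumed for illustration).
__device__ float termValue(int x, int y, int n)
{
    return 1.0f / ((n + 1) * (x + y + 1));   // arbitrary stand-in
}

// Sum terms [firstTerm, lastTerm) at each grid point, accumulating into
// the running total left behind by earlier launches.
__global__ void sumTermsChunk(float *grid, int width, int height,
                              int firstTerm, int lastTerm)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float acc = grid[y * width + x];          // partial sum so far
    for (int n = firstTerm; n < lastTerm; ++n)
        acc += termValue(x, y, n);
    grid[y * width + x] = acc;
}

// Host side: N terms total, M terms per launch. Each launch does a bounded
// amount of work, so it can be kept well under the watchdog limit.
void sumAllTerms(float *d_grid, int width, int height, int N, int M)
{
    dim3 block(16, 16);
    dim3 gridDim((width + block.x - 1) / block.x,
                 (height + block.y - 1) / block.y);
    for (int first = 0; first < N; first += M) {
        int last = (first + M < N) ? first + M : N;
        sumTermsChunk<<<gridDim, block>>>(d_grid, width, height, first, last);
        cudaThreadSynchronize();              // finish this chunk before the next
    }
}
```

Since the per-launch work is proportional to `M`, you can tune `M` down until each launch reliably fits under the limit on your particular card, at the cost of a little extra launch overhead.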


Under Windows XP, there are two manifestations of the 5-second limit: the documented one where, if the card is the primary display adapter, the machine hangs after 5 seconds; and the “undocumented” one where, if the card is not the primary display adapter, a kernel will happily run for more than 5 seconds, but then returns with an “unspecified launch failure.”

Are these both due to the same Windows watchdog timer? And, if so, does this mean under Windows a CUDA kernel will never be able to run longer than 5 seconds?
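For what it’s worth, the second manifestation can at least be detected cleanly from the host by checking the error code after the launch. A minimal sketch, where `myKernel` and its arguments are hypothetical:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void myKernel(float *data)
{
    /* long-running work would go here */
}

int main(void)
{
    float *d_data;
    cudaMalloc((void **)&d_data, 1024 * sizeof(float));

    myKernel<<<64, 256>>>(d_data);
    cudaError_t err = cudaThreadSynchronize();  // blocks until the kernel ends
    if (err != cudaSuccess) {
        // A kernel killed by the watchdog on a non-primary adapter
        // typically shows up here as "unspecified launch failure".
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    }

    cudaFree(d_data);
    return 0;
}
```

That doesn’t lift the limit, of course, but it lets the application fail gracefully instead of silently producing bad results.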

Workarounds aren’t available in all cases, and we’re restricted to Windows machines, so this could be a severe limitation in many applications…