Has the 2-second timeout problem been fixed yet?

I developed a serious commercial app using CUDA 3.1 about two years ago. However, the 2-second timeout feature of WDDM prevented my app from being usable with large problems, and I gave up on CUDA in favor of massive CPU multi-threading. But now that CUDA 5 is out, I’m considering getting back into CUDA development. However, this timeout limit is a deal killer. I know that someone provided a registry-edit fix for the developer’s computer, but that’s not good enough. I refuse to tell my customers that they have to edit their registry in order to run my software! I need a way for my app to temporarily disable the timeout while it’s running. Is this possible yet? Thanks!

Tim

Switch OS-es and your problem will be gone! ;)

Alternatively, you can tell your customers to use the un-crippled Tesla drivers by either using Tesla or Quadro cards (and paying a pretty premium) or by fixing the driver (you can edit the .inf file easily).

TDR (the 2-second timeout) is a Windows setting. You can change or disable it by editing the registry. Please refer to this page: http://msdn.microsoft.com/en-us/library/windows/hardware/gg487368.aspx

As pszilar alludes to, the watchdog timer timeout is not a CUDA issue. Time-outs are also not specific to Windows; they can also occur on Linux (and presumably Mac OS X, but I have no personal experience with that platform). In general, a GPU can, at any given moment, either serve a compute task or a graphics task. This means running a CUDA kernel is mutually exclusive with refreshing a GUI as long as there is only a single GPU in the system. In order to prevent the GUI from becoming unresponsive, operating systems implement a watchdog timer.

To avoid the watchdog timer issue on Linux when only one GPU is present, simply run without X. I'm not sure what all the alternatives under Windows are, besides manipulating the TDR timeout limit. I seem to recall that using two GPUs, and extending the desktop only to the less powerful one, is one technique one can use. As far as I know, running with the TCC driver on Windows (already mentioned) is mutually exclusive with running a GUI on the same GPU, which is why the TCC driver is not affected by the TDR issue.

I can confirm that this also applies to Mac OS X.

I tried some of the registry tweaks for Windows, but still ran into trouble. I ended up chopping up my CUDA kernel to use smaller batches of data that each finished well under 2 seconds. Running lots of smaller batches worked ok for the task I was facing, but opens the door to other problems.
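
For anyone wanting to try the batching approach, here is a minimal host-side sketch of the idea, not my actual code. The helper name for_each_batch and the launch callback are made up for illustration; the callback stands in for whatever launches your kernel on the sub-range [offset, offset + count). The batch size has to be tuned so each launch finishes well under the watchdog limit on the slowest GPU you support.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical helper: splits `total` work items into batches of at most
// `batch_size` items and calls launch(offset, count) once per batch.
// Returns the number of batches issued.
std::size_t for_each_batch(
    std::size_t total, std::size_t batch_size,
    const std::function<void(std::size_t, std::size_t)>& launch) {
    assert(batch_size > 0);  // a zero batch size would never make progress
    std::size_t batches = 0;
    for (std::size_t offset = 0; offset < total; offset += batch_size) {
        // The last batch may be shorter than batch_size.
        const std::size_t count = std::min(batch_size, total - offset);
        launch(offset, count);  // stand-in for one short kernel launch
        ++batches;
    }
    return batches;
}
```

In a CUDA program the callback would launch the kernel on that sub-range and, I would assume, call cudaDeviceSynchronize() before returning, so that the launches are not queued up and submitted to the GPU as one long piece of work.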

I now have a Tesla card and am really happy with the TCC driver and additional capabilities of that card, even though it was expensive.

The registry change worked for me. Have you restarted your machine after changing the registry?

Thanks for all the replies! Unfortunately, I was not clear enough in my question, for which I apologize. I know that this timeout is an OS issue, not a CUDA issue, and I know that there is a registry fix. But this is not an option for me because the apps I sell are generally installed on all computers in a company, and I can’t ask the IT guy to edit all those registries. Also, breaking up the problem into smaller chunks introduces more complexity and overhead than I want to deal with. But I do see what must be a way: software editing of the registry via the Windows API.

Here is the rough idea: The Windows API offers (I believe) a way for a running program to edit the registry, though I’m sure it imposes some restrictions for security. I’m not enough of a Windows expert to know the restrictions. My hope was that someone would add a CUDA API call that does the appropriate Windows registry edit to allow the CUDA developer to specify whatever timeout is desired. That way the programmer could not only change it as desired, but then put it back to two seconds when the program is finished. Does anyone have any ideas on this approach?

Tim
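
On editing the registry from code: a rough sketch of what that could look like with the plain Win32 registry calls (RegOpenKeyExW, RegSetValueExW, RegCloseKey). It assumes the TDR delay is the DWORD value TdrDelay, in seconds, under HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers, which is my understanding of the Microsoft page linked earlier in the thread. The function name set_tdr_delay_seconds and the TdrWrite enum are made up for illustration. This is not a CUDA API.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical result type for this sketch.
enum class TdrWrite { Ok, AccessDenied, Failed, NotWindows };

// Sketch: writes TdrDelay (in seconds) to the GraphicsDrivers key.
// Assumption: key path and value name as described in the lead-in.
TdrWrite set_tdr_delay_seconds(std::uint32_t seconds) {
#ifdef _WIN32
    HKEY key = nullptr;
    LSTATUS rc = RegOpenKeyExW(
        HKEY_LOCAL_MACHINE,
        L"System\\CurrentControlSet\\Control\\GraphicsDrivers",
        0, KEY_SET_VALUE, &key);
    // Writing under HKEY_LOCAL_MACHINE normally needs administrator rights.
    if (rc == ERROR_ACCESS_DENIED) return TdrWrite::AccessDenied;
    if (rc != ERROR_SUCCESS) return TdrWrite::Failed;

    const DWORD value = seconds;
    rc = RegSetValueExW(key, L"TdrDelay", 0, REG_DWORD,
                        reinterpret_cast<const BYTE*>(&value),
                        sizeof(value));
    RegCloseKey(key);
    return rc == ERROR_SUCCESS ? TdrWrite::Ok : TdrWrite::Failed;
#else
    (void)seconds;  // no registry on this platform
    return TdrWrite::NotWindows;
#endif
}
```

Two caveats. The call fails with access denied unless the process runs elevated, so an installer is a more natural place for it than the app itself. And as far as I know the new value only takes effect after a restart (see the restart question above), so this probably cannot be used to lift the limit just for the duration of a run and then put it back.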