CUDA debugger does WDDM timeout at breakpoint

I have used an older CUDA development environment for years. I recently bought a new computer and installed the 6.5 development system. My projects all build and run correctly. (Yay!) But if I put a breakpoint in a kernel function and run the program under the NVIDIA debugger (Nsight), when the breakpoint is hit the screen goes black, I get the infamous WDDM timeout message “Display driver has stopped responding and been reset”, and the launch fails. Any thoughts? Thanks!

TimM

More info… After poring through documentation for the 6.5 system, I see that they recommend disabling the WDDM timeout. I am positive that I never had to do that under the old system, but I went ahead and did it. Disaster! When the breakpoint was hit, the entire system locked up and I had to do a power-switch reboot!

Can you point to the relevant portion of the relevant document? I would assume it says that when running the debugger under WDDM you would want two GPUs: one to run the Windows desktop, and one for the CUDA app that is not hooked up to a display (“headless”), with TDR disabled on the headless GPU.

I actually saw that in the ‘Local help’ for Nsight in Visual Studio 2010. There’s a section called “Timeout Detection and Recovery” which talks about either increasing the timeout to 10 seconds or disabling it entirely. I tried both. I’m doing single-machine debugging only, and I have a Titan Z, which is essentially two devices. As far as I can tell, Nsight does not allow selectively disabling the WDDM timeout on just one device; it appears to be all or nothing.
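(For what it’s worth, as far as I can tell those settings end up in the standard, system-wide Windows TDR registry values, which would explain the all-or-nothing behavior. The values, as documented by Microsoft, are roughly:

    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
        TdrDelay  (REG_DWORD)  timeout in seconds, e.g. 10
        TdrLevel  (REG_DWORD)  0 disables timeout detection entirely

Edit these at your own risk; a reboot is needed before they take effect.)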

By the way, Nsight has two cryptic options: “Desktop GPUs must use software preemption” and “Headless GPUs must use software preemption”. Because I don’t know what those mean, I left them at their default of True.

UPDATE… This talk of two devices led me to try an experiment that is a workaround for now. I normally call cudaSetDevice(0) so that my clients, who generally have one device, are okay. I temporarily changed it to device 1, and that fixed the problem! But still, it’s a vaguely unsatisfactory solution. I wish I knew what caused the massive timeout for a kernel that normally takes about 1 millisecond!
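In code, the whole workaround amounts to something like this (the DEBUG_ON_SECOND_GPU flag is just illustrative; the builds my clients get still default to device 0):

    #ifdef DEBUG_ON_SECOND_GPU       // hypothetical local build flag, not in shipped code
        cudaSetDevice(1);            // second half of the Titan Z, not driving the display
    #else
        cudaSetDevice(0);            // default for clients, who generally have one device
    #endif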

Tim

Without some magic, halting (via a breakpoint) a GPU that drives the Windows GUI through the WDDM driver will effectively halt the system. The GUI cannot make any progress while the GPU is stopped, and since the GUI is the user’s only means of controlling the machine, the system becomes unresponsive to input indefinitely, even though it continues to run underneath.

I do not know how that software preemption feature works. It may be exactly the kind of magic that makes it possible to use the debugger with a single WDDM-controlled GPU these days, even when that WDDM device is driving the Windows desktop. Presumably the documentation gives guidance on how to set the various configuration settings for this case.

But since you already have a dual-GPU device, you would probably want to check how you can configure the driver so GPU 1 drives the display, leaving GPU 0 available for CUDA applications, allowing you to use all your programs as-is.
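As a sanity check, something along these lines (a minimal sketch; error checking omitted) will tell you which device the WDDM watchdog applies to, so you could pick the headless one programmatically instead of hard-coding the index:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Return the first device that is NOT subject to the run-time watchdog
    // (kernelExecTimeoutEnabled == 0 usually means the GPU is not driving a display).
    int pickHeadlessDevice()
    {
        int count = 0;
        cudaGetDeviceCount(&count);
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("Device %d: %s, kernelExecTimeoutEnabled = %d\n",
                   i, prop.name, prop.kernelExecTimeoutEnabled);
            if (prop.kernelExecTimeoutEnabled == 0)
                return i;            // headless GPU: safe to halt at a breakpoint
        }
        return 0;                    // fall back to the default device
    }

    // Usage: cudaSetDevice(pickHeadlessDevice());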

To track down the issues with a long-running kernel, I would suggest running with cuda-memcheck before diving in with the debugger. cuda-memcheck can diagnose many issues, such as out-of-bounds accesses, race conditions, and incorrect API arguments.
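For example (MyApp.exe here is just a placeholder for your executable):

    cuda-memcheck MyApp.exe
    cuda-memcheck --tool racecheck MyApp.exe

The first form catches out-of-bounds accesses and API errors; the racecheck tool looks specifically for shared-memory races.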

Software preemption is the magic that allows single-GPU debugging with a halted kernel.

I think you’re likely to get better help if you post in the nsight forum:

https://devtalk.nvidia.com/default/board/84/nsight-visual-studio-edition/

Thanks! I’ll check that link. Silly me, I never even noticed that there was an Nsight forum! The whole thing really confused me because my older computer had just a single GPU, and breakpoints worked fine on it without my even having to change any special settings. It just worked. So I was shocked when I ran into this weirdness.