Integrated debugger fails with device 0

I am developing with a Titan, which has two devices. My program works perfectly regardless of whether I call cudaSetDevice() with 0 or 1. Suppose I set a breakpoint at the start of a small, fast kernel. If I am running on Device 1, the debugger integrated with Visual Studio 2010 correctly stops at the breakpoint I set. However, if I select Device 0 and launch the kernel under the debugger, the screen locks and I get a WDDM timeout. The kernel returns error 30, unknown. Am I doing something wrong? Thanks!

Unless you take special steps, setting a breakpoint in CUDA code that is running on the WDDM GPU that is also hosting the display will cause the outcome you are witnessing.

Since device 1 is not hosting the display, setting the breakpoint in device code running on that GPU has no effect on the display.

Since device 0 is hosting the display, hitting a breakpoint in device code running on that GPU halts the GPU, so it stops servicing display requests from the OS. The OS detects this and triggers a WDDM timeout using the TDR mechanism. This resets the GPU and the display, and you get error 30.
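To see how the runtime views each device, a small query program along the following lines may help. It is only a sketch: it reports the cudaDevAttrKernelExecTimeout and cudaDevAttrTccDriver attributes for every device. A set timeout attribute means a run-time limit applies to kernels on that device; it does not by itself prove that the device is driving a display, since under WDDM the limit may be reported for more than one GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print, for each CUDA device, whether a kernel run-time limit (watchdog)
// applies and whether the device uses the TCC driver model.
int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties(%d): %s\n",
                         dev, cudaGetErrorString(err));
            return 1;
        }

        int timeout = 0;  // 1 if kernels on this device have a run-time limit
        int tcc = 0;      // 1 if the device uses the TCC driver model
        err = cudaDeviceGetAttribute(&timeout, cudaDevAttrKernelExecTimeout, dev);
        if (err == cudaSuccess) {
            err = cudaDeviceGetAttribute(&tcc, cudaDevAttrTccDriver, dev);
        }
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaDeviceGetAttribute(%d): %s\n",
                         dev, cudaGetErrorString(err));
            return 1;
        }

        std::printf("Device %d (%s): run-time limit %s, TCC driver %s\n",
                    dev, prop.name,
                    timeout ? "yes" : "no",
                    tcc ? "yes" : "no");
    }
    return 0;
}
```

A device that reports no run-time limit is the safer candidate for breakpoint debugging.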

If it were me, I would just debug on device 1. If you wish to debug this way on device 0, you should investigate using preemption/single-GPU debugging in Nsight VSE (i.e. “the integrated debugger”). I believe this is covered in the Nsight VSE documentation.

Thanks for the information. What you say makes sense. However, I searched the Nsight VSE manual for a discussion of this issue and found virtually nothing. It did say that if you have multiple devices, you should disable TDR for debugging. I tried that and it made the situation much worse, as then I don’t get a graceful recovery; everything just locks up and I have to reboot! I also tried many different settings for the software preemption options, and nothing helped.

My problem is that the program is finished and working on a single device, so there is no single-device debugging that needs to be done anymore. I now want to run kernels on both devices simultaneously, splitting the work. But I’ve discovered that making the debugger work in this two-at-once situation is not as easy as I hoped. I really wish NVIDIA had good documentation on how to use their debugger in this situation. I searched like crazy and can find nothing. Oh well.
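For readers following along, the two-at-once arrangement described here might look roughly like the sketch below. It is not the actual program from this thread: the kernel scale, the CHECK macro, and the array size are all made up for illustration. The idea is simply that kernel launches return immediately, so launching on each device in turn lets the devices work on their halves concurrently.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical error-check helper: abort with a message on any CUDA error.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t e_ = (call);                                        \
        if (e_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(e_));                       \
            std::exit(1);                                               \
        }                                                               \
    } while (0)

// Hypothetical stand-in for the real kernel: multiply each element.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 1) return 1;

    const int n = 1 << 20;
    std::vector<float> host(n, 1.0f);
    std::vector<float*> dptr(deviceCount, nullptr);
    std::vector<int> offset(deviceCount, 0), len(deviceCount, 0);
    const int chunk = (n + deviceCount - 1) / deviceCount;

    // Phase 1: give each device its slice and launch without waiting.
    for (int dev = 0; dev < deviceCount; ++dev) {
        offset[dev] = dev * chunk;
        len[dev] = n - offset[dev] < chunk ? n - offset[dev] : chunk;
        if (len[dev] <= 0) { len[dev] = 0; continue; }  // nothing left for this device

        CHECK(cudaSetDevice(dev));
        CHECK(cudaMalloc(&dptr[dev], len[dev] * sizeof(float)));
        CHECK(cudaMemcpy(dptr[dev], host.data() + offset[dev],
                         len[dev] * sizeof(float), cudaMemcpyHostToDevice));
        scale<<<(len[dev] + 255) / 256, 256>>>(dptr[dev], len[dev], 2.0f);
        CHECK(cudaGetLastError());  // catches launch errors only
    }

    // Phase 2: wait for each device, then collect its slice.
    for (int dev = 0; dev < deviceCount; ++dev) {
        if (len[dev] == 0) continue;
        CHECK(cudaSetDevice(dev));
        CHECK(cudaDeviceSynchronize());  // execution errors surface here
        CHECK(cudaMemcpy(host.data() + offset[dev], dptr[dev],
                         len[dev] * sizeof(float), cudaMemcpyDeviceToHost));
        CHECK(cudaFree(dptr[dev]));
    }

    // Every element started at 1.0f and should now be 2.0f.
    for (int i = 0; i < n; ++i) {
        if (host[i] != 2.0f) {
            std::fprintf(stderr, "mismatch at %d\n", i);
            return 1;
        }
    }
    std::printf("ok\n");
    return 0;
}
```

With this layout, a breakpoint in the kernel would be hit on whichever devices run it, including the one hosting the display, which is the situation that leads to the timeout described earlier in the thread.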