Nsight: debugging a CUDA + GL app with two NVIDIA GPUs

For me, the slowest part of CUDA development is finding the stupid mistakes of mine that lead to memory issues. At first these would hang the whole system and I had to reboot (or the machine would helpfully reboot itself for me). Timeout Detection and Recovery (TDR) makes debugging way faster, but I was even more excited to hear that a second GPU can be paused, stepped through, or even left to hang without any issues at all.

Well, I’ve got a K20 and a GTX 660 in the one box now, with Nsight 3.0 in VS2010. I’m working on a GL app which does some computation in CUDA. My CUDA code chooses to run on the K20, and I can only assume my GL is stuck running on the 660. I’ve read that I need to disable TDR to be able to debug CUDA properly, which I’ve done. Indeed, I can step through CUDA code, inspect memory and check out what each thread in each warp is doing. Very cool. But if I leave “Enable CUDA Memory Checker” unchecked and let the code run into a memory bug, the system still hangs and I have to reboot. Why do I need to reboot if it’s just the K20 that’s stuck? The 660 is still available to draw my desktop.
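
For context, here is a minimal sketch of how the device selection and error checking could look. It is not the exact code from my app: the helper names findDeviceByName and checkCuda are made up for illustration, and matching on the substring "K20" is an assumption about how the driver reports the card's name. It only uses CUDA runtime calls (cudaGetDeviceCount, cudaGetDeviceProperties, cudaGetLastError, cudaDeviceSynchronize).

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: returns the index of the first CUDA device whose
// name contains `wanted`, or -1 if none matches or the runtime fails.
// Also prints each device and whether a kernel run time limit applies to it.
int findDeviceByName(const char* wanted)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return -1;

    int found = -1;
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess)
            continue;

        // kernelExecTimeoutEnabled is non-zero when kernels on this
        // device are subject to a run time limit.
        printf("device %d: %s, kernel timeout %s\n", i, prop.name,
               prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");

        if (found < 0 && strstr(prop.name, wanted) != NULL)
            found = i;
    }
    return found;
}

// Hypothetical helper: reports any pending launch error, then waits for
// queued work on the current device and reports any execution error.
bool checkCuda(const char* what)
{
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

The idea would be to call findDeviceByName("K20") at startup, pass the result to cudaSetDevice if it is not -1, and call checkCuda("my kernel") after each launch while debugging. This is not a substitute for the memory checker: an out-of-bounds access may corrupt memory without ever producing an error code.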

Is there some other bit of software I need? Is the GL part of my app somehow interfering with regular behaviour? Is there a “no, don’t crash” option I need to set somewhere?

Thanks in advance!