For me, the slowest part of CUDA development is finding the stupid mistakes that lead to memory errors. At first these caused the whole system to hang and I had to reboot (or it’d helpfully reboot the machine for me). Timeout Detection and Recovery (TDR) makes debugging way faster, but I was even more excited to hear that a second GPU could be paused, stepped through, or even hang without any issues at all.
Well, I’ve got a K20 and a GTX 660 in the one box now, with Nsight 3.0 in VS2010. I’m working on a GL app which does some computation in CUDA. My CUDA code chooses to run on the K20, and I can only assume the GL side is running on the 660. I’ve read that I need to disable TDR to properly debug CUDA, which I’ve done. Indeed, I can step through CUDA code, inspect memory and see what each thread in each warp is doing. Very cool. But if I let the code run into memory errors without ticking “Enable CUDA Memory Checker” to catch them, the system still hangs and I have to reboot. Why do I need to reboot if only the K20 is stuck? The 660 is still available to draw my desktop.
Is there some other bit of software I need? Is the GL part of my app somehow interfering with regular behaviour? Is there a “no, don’t crash” option I need to set somewhere?
Thanks in advance!