Sometimes when my computations crash during debugging and a Thrust memory exception occurs, my GPU becomes stuck at 574 MHz. Is there any way to get it “unstuck” without rebooting or forcing the driver to crash? I typically run computations on multiple GPUs at once.
These crashes can occur anywhere from 20 minutes to 12 hours into a computation, or never (oh, the joy of debugging!), so I’d like maximum performance at all times while I track down what is causing every variable to blow up to infinity, which crashes my computation, makes Thrust throw an exception, and leaves my GPU stuck at 574 MHz.