Here’s the response I received back from my contacts at NVIDIA:
On Windows Vista and later, the watchdog timer applies to all WDDM devices, regardless of whether there is a display attached. Someone hitting the timeouts has three choices:
(1) Use a TCC-capable board (e.g., a Tesla) and enable TCC mode with nvidia-smi.
(2) Increase the watchdog timeout in the registry (I prefer this over disabling the timeout completely). A timeout of, say, 30-60 seconds is enough to let most valid cases complete but still reset without rebooting in cases of a true hang.
(3) Change the kernels (or rather the batches of kernels, which are a little hard to predict under WDDM) so they always finish within the default two-second limit.
If one of these solutions is implemented and the app still hangs or triggers a TDR (Timeout Detection and Recovery) reset, then it could be a legitimate deadlock condition in the application code, the compiler-generated code, or the NVIDIA driver, in that order of likelihood.
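For option (3), here is a rough sketch of how you might size the work so each launch stays under the limit. It is only host-side planning code: `plan_batches`, `items_per_second`, and the 1-second default budget are names and numbers I made up for illustration, and the throughput figure is something you would have to measure on your own kernel. It also assumes you synchronize after each launch so the driver is less likely to merge several launches into one long batch.

```python
import math


def plan_batches(total_items, items_per_second, budget_seconds=1.0):
    """Split total_items into (start, stop) ranges sized to fit a time budget.

    items_per_second is a measured throughput for your kernel (an assumption
    you must supply); budget_seconds should sit well below the 2 s default.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if items_per_second <= 0 or budget_seconds <= 0:
        raise ValueError("items_per_second and budget_seconds must be positive")

    # Largest batch expected to finish inside the budget, at least one item.
    batch_size = max(1, math.floor(items_per_second * budget_seconds))

    # Each range would correspond to one kernel launch followed by a sync.
    return [
        (start, min(start + batch_size, total_items))
        for start in range(0, total_items, batch_size)
    ]
```

Each returned range would map to one kernel launch over that slice of the input. Leaving headroom below two seconds is deliberate, since throughput can vary from run to run.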
My best guess is that your device is set to use WDDM (Windows Display Driver Model) instead of TCC (Tesla Compute Cluster) mode. Here’s some documentation I found on how to switch modes: http://http.developer.nvidia.com/ParallelNsight/2.1/Documentation/UserGuide/HTML/Content/Tesla_Compute_Cluster.htm.
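If you want to script the mode switch, something like the following might work as a starting point. It just builds the `nvidia-smi` call using the `-i` (GPU index) and `-dm` (driver model, 0 = WDDM, 1 = TCC) options. The helper names and the `dry_run` flag are my own, the GPU index 0 is an assumption about your machine, and actually running it needs an administrator prompt, a TCC-capable board, and usually a reboot afterwards.

```python
import subprocess


def tcc_mode_command(gpu_index=0, enable=True):
    """Build the nvidia-smi call that selects the driver model for one GPU."""
    if gpu_index < 0:
        raise ValueError("gpu_index must be non-negative")
    # -i selects the GPU; -dm sets the driver model: 0 = WDDM, 1 = TCC.
    return ["nvidia-smi", "-i", str(gpu_index), "-dm", "1" if enable else "0"]


def set_tcc_mode(gpu_index=0, enable=True, dry_run=True):
    """Return the command; only execute it when dry_run is False."""
    cmd = tcc_mode_command(gpu_index, enable)
    if not dry_run:
        # Raises CalledProcessError if nvidia-smi rejects the request,
        # for example on a board that does not support TCC.
        subprocess.run(cmd, check=True)
    return cmd
```

With the default `dry_run=True` nothing is executed, so you can print the command and check it against the documentation above before committing to it.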
If you are using a non-Tesla card (such as a GTX or Quadro), then your best option would be to increase the watchdog timeout.
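As a sketch of what increasing the timeout could look like: the watchdog delay is normally controlled by the `TdrDelay` DWORD (in seconds) under the `GraphicsDrivers` registry key. The helper below only builds the corresponding `reg add` command. The function name and the 30-second default are my own choices (30 is the low end of the range suggested above), and applying the change needs administrator rights and, as far as I know, a reboot.

```python
TDR_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_delay_command(seconds=30):
    """Build a `reg add` command that sets the TDR delay to `seconds`."""
    if not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("seconds must be a positive integer")
    return [
        "reg", "add", TDR_KEY,
        "/v", "TdrDelay",    # value name read by the Windows watchdog
        "/t", "REG_DWORD",   # stored as a 32-bit integer, in seconds
        "/d", str(seconds),
        "/f",                # overwrite an existing value without prompting
    ]


# Show the command so it can be reviewed before running it by hand.
print(" ".join(tdr_delay_command(30)))
```

Printing the command rather than executing it is intentional, since a bad registry edit is harder to undo than a mistyped command you never ran.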
Hope this helps,