"no CUDA-capable device is available" after 2 hours of simulation

“no CUDA-capable device is available” is the error message I get every time after approximately 2 hours of running a simulation with CUDA.
Here are some details:
I am iterating through a list of datafiles and running a similar simulation on each one. If I restart the simulation after the error, starting from the file where it appeared, it runs fine for the next 2 hours and then dies again, so it does not seem to be a data-related problem. I also checked the temperature (during a 30-minute run) and it never went above 55 °C, so it does not seem to be heat-related either. Has anyone experienced this kind of problem when running long simulations? Any ideas what it could be related to?
The system: GTX 295 (Zotac), driver 8.17.11.9562, Windows Vista x64, CUDA 3.0 (x86).
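
One way to narrow this down is to automate the restart-from-the-failing-file behaviour described above and log how long each run lasted before the error, to see whether the roughly 2-hour interval is really constant. The following is only a minimal Python sketch, not the actual simulation code: `run_with_resume` and `run_one` are hypothetical names, and `run_one` is assumed to process a single datafile and raise `RuntimeError` when CUDA reports an error. In practice `run_one` would probably have to launch the simulation as a separate process, since the behaviour described here was observed after restarting the whole program.

```python
import time


def run_with_resume(datafiles, run_one, max_restarts=5, clock=time.monotonic):
    """Process datafiles in order; on failure, log it and resume at the same file.

    run_one(path) is a hypothetical stand-in for one simulation and is
    assumed to raise RuntimeError on a CUDA error.
    Returns a list of (path, seconds_since_last_restart, error_message).
    """
    failures = []
    start = clock()      # time of the most recent (re)start
    restarts = 0
    i = 0
    while i < len(datafiles):
        path = datafiles[i]
        try:
            run_one(path)
            i += 1       # advance only when the file succeeded
        except RuntimeError as err:
            # Record which file failed and how long the run had lasted.
            failures.append((path, clock() - start, str(err)))
            restarts += 1
            if restarts > max_restarts:
                raise    # give up instead of looping forever
            start = clock()  # reset the timer, as for a fresh run
    return failures
```

If the logged durations cluster tightly around the same value regardless of which files were processed, that would point towards something that accumulates over time (for example a resource that is never released) rather than towards a particular input.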

Below is a known issue from the CUDA 3.0 release notes that I thought might be relevant.

Individual kernels are limited to a 2-second runtime by Windows
Vista. Kernels that run for longer than 2 seconds will trigger
the Timeout Detection and Recovery (TDR) mechanism. For more
information, see
http://www.microsoft.com/whdc/device/displ…dm_timeout.mspx.

GPUs without a display attached are not subject to the 2 second
runtime restriction. For this reason it is recommended that
CUDA be run on a GPU that is NOT attached to a display and
does not have the Windows desktop extended onto it. In this
case, the system must contain at least one NVIDIA GPU that
serves as the primary graphics adapter. Thus, for devices like S1070
that do not have an attached display, users may disable the Windows TDR
timeout. Disabling the TDR timeout will allow kernels to run for
extended periods of time without triggering an error.

The following is an example .reg script:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrLevel"=dword:00000000

My Windows TDR timeout is switched off to enable Nsight, but in any case I cannot see the connection: there are many kernel executions during these 2 hours, and none of them runs longer than 2 seconds.