My recommendation in that thread still stands: if you are looking for help with debugging, post minimal, buildable, and runnable code that others can examine and run.
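Note that an application running for 2.5 seconds does not necessarily mean that any CUDA kernel invoked by that application hit the operating system's time-out limit and triggered the watchdog. The limit applies to the execution time of an individual kernel, not to the total run time of the application.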
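To separate the two, one could time the kernel itself and check whether the GPU is subject to a run-time limit at all. The following is only a rough sketch: `busyKernel`, its launch configuration, and the iteration count are hypothetical stand-ins for whatever your application actually runs, and device 0 is an assumption.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical stand-in; replace with the kernel under investigation.
__global__ void busyKernel(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = data[i];
        for (int k = 0; k < iters; k++) {
            x = x * 1.000001f + 0.5f;
        }
        data[i] = x;
    }
}

int main(void)
{
    const int dev = 0;  // assumption: the GPU of interest is device 0
    int timeoutEnabled = 0;

    // Non-zero means kernels on this device are subject to a run-time limit.
    CHECK(cudaDeviceGetAttribute(&timeoutEnabled,
                                 cudaDevAttrKernelExecTimeout, dev));
    printf("kernel run-time limit: %s\n",
           timeoutEnabled ? "enabled" : "disabled");

    const int n = 1 << 20;
    float *d = NULL;
    CHECK(cudaMalloc(&d, n * sizeof(float)));
    CHECK(cudaMemset(d, 0, n * sizeof(float)));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Time only the kernel, not the whole application.
    CHECK(cudaEventRecord(start));
    busyKernel<<<(n + 255) / 256, 256>>>(d, n, 1000);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop));
    // A kernel killed by the watchdog would surface as an error here.
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("kernel time: %.3f ms\n", ms);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d));
    return 0;
}
```

If the limit is reported as disabled, or the measured kernel time is far below it, the watchdog is unlikely to be the cause. A kernel that was actually terminated by the watchdog would normally be reported as `cudaErrorLaunchTimeout` by the checked calls after the launch.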
If a CUDA program causes a server to become completely unresponsive, including access via ssh, to the point of requiring a power cycle, I would think there are much bigger issues with that machine than anything related to CUDA per se.
Again, without access to code that actually triggers these issues, this is just a guess.