GPU occasionally gets lost when running TensorFlow

When I run TensorFlow on the GPU, the GPU occasionally gets lost and I am told that I need to reboot. After that, nvidia-smi hangs. I have to physically force-reboot the machine to get it back to normal, because I cannot kill the hung process and even the reboot command does not work properly. The model is not very large, so I’m not sure why this is happening.

TensorFlow shows:
2019-01-21 11:41:51.396906: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure

I have attached my bug report for reference.
nvidia-bug-report.log.gz (127 KB)

You’re getting an Xid 79 (GPU has fallen off the bus), which points to either overheating (check the temperature with nvidia-smi while the job is running) or an insufficient power supply.
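
To tell the two causes apart, it may help to log temperature and power draw while the TensorFlow job runs, so the last readings before the GPU drops off are still on disk after the forced reboot. Below is a minimal sketch of that idea, not a definitive tool. It assumes nvidia-smi is on the PATH and that GPU index 0 is the card in question; the helper names (parse_sample, read_gpu, monitor) and the log file name gpu_health.log are made up for illustration.

```python
import os
import subprocess
import time

# Fields requested from nvidia-smi, in the order they appear in its CSV output.
QUERY_FIELDS = ["timestamp", "temperature.gpu", "power.draw", "power.limit"]


def parse_sample(line):
    """Turn one CSV line from nvidia-smi into a dict.

    Numeric fields become floats; values that cannot be parsed
    (for example "[N/A]") become None.
    """
    values = [v.strip() for v in line.split(",")]
    sample = dict(zip(QUERY_FIELDS, values))
    for key in QUERY_FIELDS[1:]:
        try:
            sample[key] = float(sample.get(key, ""))
        except ValueError:
            sample[key] = None
    return sample


def read_gpu(index=0):
    """Query one GPU once. Assumes nvidia-smi is installed and on PATH."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={index}",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, timeout=10, check=True,
    ).stdout
    return parse_sample(out.strip().splitlines()[0])


def monitor(log_path="gpu_health.log", interval_s=1.0, index=0):
    """Append one reading per interval and force it to disk each time,
    so the final sample survives a hard reboot."""
    with open(log_path, "a") as log:
        while True:
            sample = read_gpu(index)
            log.write(f"{sample}\n")
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval_s)


if __name__ == "__main__":
    monitor()
```

Start it in a second terminal before launching training. If the last logged temperature is near the card's limit, cooling is the likely suspect; if the temperature looks normal but power.draw was close to power.limit, the power supply is the more likely one. Once the GPU is lost the nvidia-smi query itself may hang, which is why each sample is synced to disk before the next one is requested.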