When I run TensorFlow on the GPU, the GPU occasionally gets lost and I am told to reboot. nvidia-smi hangs. I have to physically force a reboot to get the computer back to normal, because I cannot kill the process and even the reboot command does not work properly. The model is not very large, so I'm not sure why this is happening. TensorFlow logs the following error:
2019-01-21 11:41:51.396906: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
I have attached my bug report for reference.
nvidia-bug-report.log.gz (127 KB)