Hi, I have a weird problem when running a CNN with TensorFlow.
The weird part is that everything is fine until I use a CNN.
The machine runs the simple MNIST tutorial and can play YouTube videos without trouble.
The problem occurs with the CNN version of MNIST. The RTX 2080 Ti works well until the final training steps.
Training on the GPU is much faster than on the CPU alone. But when the CNN training is about to complete (for example, with a batch size of 100 and 1000 steps in total, somewhere between step 900 and step 1000), Ubuntu suddenly shuts down and restarts. It also happens when I run nvidia-smi several times in quick succession in the terminal. Because of this, I can't see any error message.
Just in case, I tried limiting GPU memory in TensorFlow, but it didn't help.
Could you give me some clues about this problem?
My versions are:
Ubuntu 16.04 LTS
CUDA 10.0
NVIDIA driver 418.88
Python 3.5.2
TensorFlow 1.14
This sounds like a broader system or hardware issue rather than a problem with TensorFlow. Have you checked the system logs for error messages? The symptoms you describe could be explained by a failing or under-provisioned power supply. I would also recommend upgrading to the latest NVIDIA driver (currently 430.50).
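If it helps with checking the logs, here is a minimal Python sketch that filters log lines for entries that often accompany GPU or hardware faults. The pattern list is my own assumption, not an official or complete set, and the function names are hypothetical. On Ubuntu 16.04 the usual files to look at are /var/log/syslog and /var/log/kern.log, and reading them may require sudo.

```python
import re

# Illustrative patterns only (an assumption, not an exhaustive list).
SUSPECT_PATTERNS = [
    r"NVRM",           # messages from the NVIDIA kernel driver
    r"\bXid\b",        # GPU error reports logged by the NVIDIA driver
    r"machine check",  # hardware errors reported by the CPU
    r"thermal",        # temperature-related events
    r"over-?current",  # power-related events
]


def find_suspect_lines(lines, patterns=SUSPECT_PATTERNS):
    """Return the lines that match any of the patterns, case-insensitively."""
    regex = re.compile("|".join(patterns), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if regex.search(line)]


def scan_log(path):
    """Scan one log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as fh:
        return find_suspect_lines(fh)
```

For example, `scan_log("/var/log/kern.log")` returns the matching lines as a list. If the machine loses power abruptly, the kernel may not get the chance to write anything, so an empty result does not rule out a hardware fault.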
With nothing appearing in the syslogs, I would turn my attention to the power supply. nvidia-smi reports power draw averaged over a short window of time, so transient peaks within that window may draw more power than the reported figure.
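To get a rough picture of the power draw during training, a small Python sketch like the one below can poll nvidia-smi and keep the highest reading it sees. The function names, sample count, and interval are hypothetical choices of mine. Each reading is still an averaged value, so the result is only a lower bound on the true peak. Since you mentioned that running nvidia-smi in quick succession can trigger the reboot, keep the interval generous.

```python
import subprocess
import time

# Query the reported power draw in watts, one line per GPU, no header or units.
QUERY = ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"]


def parse_power(output):
    """Convert nvidia-smi output into a list of floats (watts).

    Lines that are not numeric, such as "[N/A]" or blanks, are skipped.
    """
    readings = []
    for line in output.splitlines():
        try:
            readings.append(float(line.strip()))
        except ValueError:
            continue
    return readings


def sample_power(samples=20, interval_s=2.0):
    """Poll nvidia-smi and return the highest reported draw across all GPUs."""
    peak = 0.0
    for _ in range(samples):
        out = subprocess.check_output(QUERY, universal_newlines=True)
        peak = max([peak] + parse_power(out))
        time.sleep(interval_s)
    return peak
```

Running `sample_power()` in a second terminal while the training job is active gives the highest averaged reading. If that figure is already close to what the power supply can deliver on its GPU rails, transient peaks above it become a plausible cause of the shutdowns.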