Hi,
We use TensorFlow for training CNNs. Most of the time this works without any issues, but sometimes, after a while, the complete OS freezes.
After a restart, TensorFlow sometimes (but not always) loses the ability to use the GPU.
We suspect the problem lies in Linux, TensorFlow, or CUDA.
We have already tried different images, batch sizes, etc. The code isn't the problem either.
The freezing itself isn't the main problem. The main problem is that we have to reinstall the complete system to use the GPU again.
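To narrow down whether the driver or TensorFlow is at fault after such a restart, a check along the following lines might help. This is only a minimal sketch: the helper names `driver_status` and `tensorflow_devices` are made up for illustration, and it assumes that `nvidia-smi` is on the PATH whenever the NVIDIA driver is installed correctly.

```python
import shutil
import subprocess


def driver_status():
    """Return (ok, message) describing whether the NVIDIA driver responds.

    Assumption: a working driver installation provides `nvidia-smi` on PATH.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False, "nvidia-smi not found on PATH"
    try:
        result = subprocess.run(
            [exe],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, "nvidia-smi failed to run: {}".format(exc)
    # A non-zero exit code usually means the driver is not responding.
    return result.returncode == 0, result.stdout.strip()


def tensorflow_devices():
    """Return the device names TensorFlow reports, or None if it cannot be imported."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [device.name for device in device_lib.list_local_devices()]


if __name__ == "__main__":
    ok, message = driver_status()
    print("Driver responds:", ok)
    print(message)
    print("TensorFlow devices:", tensorflow_devices())
```

If the driver check already fails, the problem is probably below TensorFlow (driver or kernel module). If the driver responds but TensorFlow lists no GPU device, the CUDA libraries or the TensorFlow installation would be the more likely suspects.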
Our configuration is:
Linux Ubuntu 16.04
AMD Ryzen 7 1800X Eight-Core 3.6 GHz
32 GB RAM
GeForce GTX 1080 Ti
Latest CUDA
We also cross-posted this on Stack Overflow:
https://stackoverflow.com/questions/49752930/tensorflow-freezes-during-training-linux-os
We hope you can help us and thank you in advance!
Greetings