I am trying to run tensorflow-gpu using Anaconda. I have a GeForce GTX 960M card, which has no problem at all running games. What I’ve noticed is that tf-gpu runs fine on the very first run. But as soon as tensorflow stops running, the GPU naturally wants to idle and its clock drops from 1097 MHz to 0 MHz, which causes the GPU to crash. nvidia-smi then reports “GPU is lost”. I then have to disable and re-enable the GPU in Device Manager to get it working again.
I’ve done some testing with various programs while simultaneously monitoring my GPU usage with MSI Afterburner, GPU-Z, nvidia-smi and Task Manager. The only pattern I see is that if the GPU goes idle while tensorflow is still holding GPU memory, the card crashes.
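To capture the order of events (memory release versus clock drop) in a log, a small polling script along these lines could be used. This is only a sketch: it assumes nvidia-smi is on the PATH and that the `clocks.sm` and `memory.used` query fields return plain numbers on this card. The function names `parse_sample`, `read_sample` and `monitor` are made up for illustration.

```python
import subprocess
import time

# Fields requested from nvidia-smi: SM clock (MHz) and used memory (MiB).
QUERY_FIELDS = "clocks.sm,memory.used"


def parse_sample(line):
    """Parse one CSV line such as '1097, 312' into (sm_clock_mhz, memory_used_mib)."""
    clock, mem = (field.strip() for field in line.split(","))
    return int(clock), int(mem)


def read_sample():
    """Query nvidia-smi once and return the reading for the first GPU.

    Raises subprocess.CalledProcessError if nvidia-smi exits with an error,
    which is expected once the GPU has been lost.
    """
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=" + QUERY_FIELDS,
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_sample(out.strip().splitlines()[0])


def monitor(interval_s=0.2, samples=50):
    """Print timestamped clock and memory readings at a fixed interval."""
    for _ in range(samples):
        clock, mem = read_sample()
        print("%.2f  sm_clock=%d MHz  memory_used=%d MiB" % (time.time(), clock, mem))
        time.sleep(interval_s)
```

Calling `monitor()` from a second terminal while the tensorflow program exits should show whether the memory reading reaches zero before or after the clock drops. If a field comes back as something like "[Not Supported]", the integer parsing will fail and would need adjusting.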
One workaround that temporarily prevents this for very small programs is to use the “allow_growth” option as follows:
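    import tensorflow as tf

    # Allocate GPU memory on demand instead of reserving it all up front
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    # The option only takes effect once the config is passed to a session
    sess = tf.Session(config=config)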
However, this only works if the operation is small enough that it uses only about 0.1 GB of GPU memory. In that case, the GPU memory is cleared to zero quickly, and only after that does the GPU go idle. If the program uses even 0.3 GB of GPU memory, my GPU crashes, because the memory does not clear to 0 GB before the clock speed drops to 0 MHz (the lower power state).