I have a new computer built specifically around a GTX 1080 for deep learning.
The problem I am experiencing is that TensorFlow raises ResourceExhaustedError even though I am feeding it very small batches that should easily fit in the GPU's 8 GB of memory.
At first, nvidia-smi shows that only the X server and gnome-shell are using the GPU, taking up a couple hundred MiB of memory:
Sat Nov  4 23:58:15 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.22                 Driver Version: 387.22                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:03:00.0  On |                  N/A |
| 24%   40C    P0    46W / 180W |    274MiB /  8105MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       698      G   /usr/lib/xorg-server/Xorg                     18MiB |
|    0       733      G   /usr/bin/gnome-shell                          28MiB |
|    0       924      G   /usr/lib/xorg-server/Xorg                    100MiB |
|    0       974      G   /usr/bin/gnome-shell                         124MiB |
+-----------------------------------------------------------------------------+
Then I import Keras in a Python shell and create a VGG network, loading it with ImageNet weights. As soon as I have done this, memory usage jumps to 7721MiB / 8105MiB and stays that way until I terminate the Python process:
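This is roughly what I run in the Python shell (a minimal sketch; I am assuming the VGG16 variant from keras.applications here as the exact model I use):

```python
# Reproduction sketch: importing Keras (TensorFlow backend) and building
# a VGG network with ImageNet weights. After this runs, nvidia-smi
# reports the GPU memory usage shown below.
from keras.applications.vgg16 import VGG16

model = VGG16(weights='imagenet')  # builds the graph and loads ImageNet weights
model.summary()                    # just to confirm the model was constructed
```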
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.22                 Driver Version: 387.22                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:03:00.0  On |                  N/A |
| 24%   41C    P2    44W / 180W |   7721MiB /  8105MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       698      G   /usr/lib/xorg-server/Xorg                     18MiB |
|    0       733      G   /usr/bin/gnome-shell                          28MiB |
|    0       924      G   /usr/lib/xorg-server/Xorg                    100MiB |
|    0       974      G   /usr/bin/gnome-shell                         124MiB |
|    0      6264      C   /usr/bin/python3                            7437MiB |
+-----------------------------------------------------------------------------+
Nearly all of that memory (7437MiB) is held by the single Python process, so I think there must be an issue with the GPU memory not being released.
Here is my system:
MSI - X99A TOMAHAWK ATX LGA2011-3 Motherboard
Intel - Xeon E5-1620 V4 3.5GHz Quad-Core Processor
G.Skill - Ripjaws V Series 16GB (2 x 8GB) DDR4-3000 Memory
Gigabyte - GeForce GTX 1080 8GB Video Card
4.13.9-1-ARCH x86_64 GNU/Linux
As mentioned above, I am loading Keras and TensorFlow; here are the versions:
TensorFlow 1.4.0
Keras 2.0.9
CUDA 9.0.176
Please let me know if any other information is needed…
Thank you