Hi,
I just built a brand-new machine dedicated to deep learning. After putting all the parts together, I ran a few benchmark tests, and the results seem quite low compared to what I can find online.
The documentation I followed can be found here: Installation Guide Linux :: CUDA Toolkit Documentation
Here is the setup I have:
Hardware:
Motherboard: MSI X399 Gaming Pro Carbon AC
RAM: 4 × 16 GB
Processor: AMD Ryzen Threadripper 1920X
GPU: NVIDIA Quadro P6000
Software:
OS: Ubuntu Server 19.04
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P6000        Off  | 00000000:41:00.0 Off |                  Off |
| 26%   37C    P8     9W / 250W |  22807MiB / 24449MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1832      G   /usr/lib/xorg/Xorg                             8MiB |
|    0      2004      G   /usr/bin/gnome-shell                           4MiB |
|    0      5186      C   /opt/anaconda3/envs/PythonGPU/bin/python   22781MiB |
+-----------------------------------------------------------------------------+
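A side note on the memory figure above: as far as I understand, TensorFlow 1.x reserves nearly all GPU memory up front by default, so the 22807MiB shown does not mean the benchmark actually needs that much. A minimal sketch to make allocation incremental instead (assuming TF 1.x with standalone Keras):

import tensorflow as tf
from keras import backend as K

# By default TF 1.x maps almost the whole GPU at startup; allow_growth
# makes it allocate memory on demand instead.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))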
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168
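For completeness, a quick way to confirm TensorFlow actually sees the P6000 (TF 1.x API, to the best of my knowledge):

from tensorflow.python.client import device_lib

# Lists every device TensorFlow can use; the P6000 should show up as
# /device:GPU:0 together with its memory size.
print(device_lib.list_local_devices())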
Benchmarks:
I tried this test: https://github.com/keras-team/keras/blob/master/examples/mnist_mlp.py
Which gives the following results:
60000 train samples
10000 test samples
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 512)               401920
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_2 (Dense)              (None, 512)               262656
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_3 (Dense)              (None, 10)                5130
=================================================================
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
_________________________________________________________________
Train on 60000 samples, validate on 10000 samples
Epoch 1/20
60000/60000 [==============================] - 6s 95us/step - loss: 0.2449 - acc: 0.9255 - val_loss: 0.1094 - val_acc: 0.9663
Epoch 2/20
60000/60000 [==============================] - 4s 65us/step - loss: 0.1024 - acc: 0.9690 - val_loss: 0.0872 - val_acc: 0.9734
Epoch 3/20
60000/60000 [==============================] - 4s 73us/step - loss: 0.0757 - acc: 0.9768 - val_loss: 0.0836 - val_acc: 0.9750
Epoch 4/20
60000/60000 [==============================] - 4s 66us/step - loss: 0.0611 - acc: 0.9812 - val_loss: 0.0663 - val_acc: 0.9806
Epoch 5/20
60000/60000 [==============================] - 5s 87us/step - loss: 0.0512 - acc: 0.9843 - val_loss: 0.0662 - val_acc: 0.9826
Epoch 6/20
60000/60000 [==============================] - 4s 69us/step - loss: 0.0438 - acc: 0.9871 - val_loss: 0.0725 - val_acc: 0.9812
Epoch 7/20
60000/60000 [==============================] - 4s 68us/step - loss: 0.0381 - acc: 0.9891 - val_loss: 0.0753 - val_acc: 0.9821
Epoch 8/20
60000/60000 [==============================] - 5s 84us/step - loss: 0.0337 - acc: 0.9902 - val_loss: 0.0769 - val_acc: 0.9821
Epoch 9/20
60000/60000 [==============================] - 5s 78us/step - loss: 0.0317 - acc: 0.9905 - val_loss: 0.0853 - val_acc: 0.9820
Epoch 10/20
60000/60000 [==============================] - 4s 71us/step - loss: 0.0279 - acc: 0.9920 - val_loss: 0.0774 - val_acc: 0.9835
Epoch 11/20
60000/60000 [==============================] - 5s 83us/step - loss: 0.0267 - acc: 0.9921 - val_loss: 0.0779 - val_acc: 0.9854
Epoch 12/20
60000/60000 [==============================] - 5s 78us/step - loss: 0.0238 - acc: 0.9933 - val_loss: 0.1056 - val_acc: 0.9806
Epoch 13/20
60000/60000 [==============================] - 5s 81us/step - loss: 0.0258 - acc: 0.9929 - val_loss: 0.0870 - val_acc: 0.9835
Epoch 14/20
60000/60000 [==============================] - 5s 79us/step - loss: 0.0219 - acc: 0.9939 - val_loss: 0.1002 - val_acc: 0.9834
Epoch 15/20
60000/60000 [==============================] - 5s 76us/step - loss: 0.0206 - acc: 0.9943 - val_loss: 0.0910 - val_acc: 0.9833
Epoch 16/20
60000/60000 [==============================] - 5s 81us/step - loss: 0.0210 - acc: 0.9942 - val_loss: 0.0963 - val_acc: 0.9841
Epoch 17/20
60000/60000 [==============================] - 4s 69us/step - loss: 0.0185 - acc: 0.9949 - val_loss: 0.0958 - val_acc: 0.9854
Epoch 18/20
60000/60000 [==============================] - 5s 82us/step - loss: 0.0185 - acc: 0.9951 - val_loss: 0.1040 - val_acc: 0.9839
Epoch 19/20
60000/60000 [==============================] - 4s 70us/step - loss: 0.0186 - acc: 0.9948 - val_loss: 0.1011 - val_acc: 0.9838
Epoch 20/20
60000/60000 [==============================] - 5s 88us/step - loss: 0.0184 - acc: 0.9951 - val_loss: 0.0974 - val_acc: 0.9856
Test loss: 0.09737630939959879
Test accuracy: 0.9856
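For reference, the model that script builds is tiny; based on the summary above it is essentially this (a sketch, not the exact script):

from keras.models import Sequential
from keras.layers import Dense, Dropout

# Matches the summary above: 784 -> 512 -> 512 -> 10, 669,706 params.
model = Sequential([
    Dense(512, activation='relu', input_shape=(784,)),
    Dropout(0.2),
    Dense(512, activation='relu'),
    Dropout(0.2),
    Dense(10, activation='softmax'),
])
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop', metrics=['accuracy'])

My understanding is that a model this small barely loads the GPU, so the per-epoch time may say more about CPU and input-pipeline overhead than about the card itself.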
I then ran this test: https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_train.py
Which gave me the following results:
I0806 17:02:55.815466 140026390669120 basic_session_run_hooks.py:606] Saving checkpoints for 0 into /tmp/cifar10_train/model.ckpt.
2019-08-06 17:02:56.218330: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2019-08-06 17:02:56.687217: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-08-06 17:02:58.196827: step 0, loss = 4.68 (391.8 examples/sec; 0.327 sec/batch)
2019-08-06 17:03:00.722650: step 10, loss = 4.60 (506.8 examples/sec; 0.253 sec/batch)
2019-08-06 17:03:03.028074: step 20, loss = 4.52 (555.2 examples/sec; 0.231 sec/batch)
2019-08-06 17:03:05.343251: step 30, loss = 4.38 (552.9 examples/sec; 0.232 sec/batch)
2019-08-06 17:03:07.656660: step 40, loss = 4.39 (553.3 examples/sec; 0.231 sec/batch)
2019-08-06 17:03:09.945660: step 50, loss = 4.36 (559.2 examples/sec; 0.229 sec/batch)
2019-08-06 17:03:12.268526: step 60, loss = 4.28 (551.0 examples/sec; 0.232 sec/batch)
2019-08-06 17:03:14.569985: step 70, loss = 4.19 (556.2 examples/sec; 0.230 sec/batch)
2019-08-06 17:03:16.854503: step 80, loss = 4.12 (560.3 examples/sec; 0.228 sec/batch)
2019-08-06 17:03:19.187332: step 90, loss = 4.14 (548.7 examples/sec; 0.233 sec/batch)
I0806 17:03:21.569198 140026390669120 basic_session_run_hooks.py:692] global_step/sec: 4.27834
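In case it matters, here is the kind of check I could run to confirm ops are actually being placed on the GPU (TF 1.x sketch):

import tensorflow as tf

# With log_device_placement=True, TF prints the device chosen for each
# op, so any silent fallback to the CPU would be visible.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.constant([1.0, 2.0, 3.0], name='a')
    b = tf.constant([4.0, 5.0, 6.0], name='b')
    print(sess.run(a + b))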
Question:
These numbers seem quite low compared to what I can see on various benchmark sites: the MNIST MLP takes 4-6 s per epoch, and the CIFAR-10 script sustains only about 550 examples/sec (roughly 0.23 s per 128-image batch, matching the ~4.3 global steps/sec in the log). Are these results consistent with a Quadro P6000, and if not, how can I improve the overall performance?
Thank you!