I have Tesla K20 and Titan X cards in my workstation.
I’m running neural network simulations with the Theano library (CUDA 7.5 + cuDNN v3); the dataset is ~600 MB.
Here are some performance results:
(first number is GPU utilization, second is time to completion)
One instance:
Titan X: 35%, 12.9 min
K20: 80%, 9.3 min

Two identical, independent instances of the code running in parallel on each card:
Titan X: 55%, 17 min
K20: 95%, 16.6 min

Three instances:
Titan X: 65%, 22 min
K20: 99%, 24.6 min

Four instances:
Titan X: 70%, 25.8 min
K20: crashes (can’t allocate memory)
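For context, this is roughly how I start the parallel instances, with each process pinned to one card via `THEANO_FLAGS` (the gpu0/gpu1 ordering and the `train.py` script name are placeholders, not my exact setup):

```python
import os

def build_jobs(device, n_instances, script="train.py"):
    """Build (command, environment) pairs for n identical instances of the
    simulation, each pinned to one GPU through THEANO_FLAGS."""
    jobs = []
    for _ in range(n_instances):
        env = dict(os.environ)
        # Theano's old CUDA backend selects the card via device=gpuN.
        env["THEANO_FLAGS"] = "device={},floatX=float32".format(device)
        jobs.append((["python", script], env))
    return jobs

# Launching, say, four instances on each card would then be
# (with `import subprocess`):
# procs = [subprocess.Popen(cmd, env=env)
#          for cmd, env in build_jobs("gpu0", 4) + build_jobs("gpu1", 4)]
```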
The utilization figures are from the Nvidia Control Panel’s GPU Utilization graph. By the way, where can I see GPU memory usage?
Can anyone explain these differences?
Why isn’t the Titan X utilized more fully for a single simulation, and why is it slower than the K20 in that case? And why can’t the Tesla handle four simulations? Four copies of the dataset (4 × 600 MB = 2.4 GB) should fit in its 5 GB of memory, right?
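My back-of-the-envelope arithmetic only counts the dataset, but presumably each process also holds its own CUDA context plus Theano’s intermediate buffers (activations, gradients, cuDNN workspace). A rough sketch of what the real per-instance footprint might look like (the overhead numbers below are guesses for illustration, not measurements):

```python
# Rough memory arithmetic for the K20 (5 GB). Overhead figures are
# assumptions: each process needs its own CUDA context plus workspace
# for activations/gradients, which can rival the dataset itself.
DATASET_MB = 600
CUDA_CONTEXT_MB = 100   # assumed per-process CUDA context overhead
WORKSPACE_MB = 700      # assumed activations/gradients/cuDNN workspace
K20_MEMORY_MB = 5 * 1024

def fits(n_instances):
    """Return (total MB needed, whether it fits on the K20)."""
    needed = n_instances * (DATASET_MB + CUDA_CONTEXT_MB + WORKSPACE_MB)
    return needed, needed <= K20_MEMORY_MB

# With these guesses, four instances need 5600 MB against 5120 MB
# available, so an allocation failure is plausible even though the
# four dataset copies alone (2400 MB) would fit.
```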