Tesla K20 vs Titan X performance for the same code

I have Tesla K20 and Titan X cards in my workstation.
I’m running neural network simulations with the Theano library (CUDA 7.5 + cuDNN v3); the dataset is ~600 MB.

Here are some performance results:

Single simulation
(first number is GPU utilization, second is time to completion):
Titan X: 35%, 12.9 min
K20: 80%, 9.3 min

Two simulations
(identical independent instances of the code, running in parallel):
Titan X: 55%, 17 min
K20: 95%, 16.6 min

Three simulations:
Titan X: 65%, 22 min
K20: 99%, 24.6 min

Four simulations:
Titan X: 70%, 25.8 min
K20: crashes (cannot allocate memory)

Utilization info is from the Nvidia Control Panel’s GPU utilization graph. By the way, where can I see GPU memory usage?

Can anyone explain these differences?

Why isn’t the Titan X utilized more fully for the single simulation? Why is it slower in the single-simulation case? And why can’t the Tesla handle 4 simulations? 4 copies of the dataset (2.4 GB) should fit in its memory (~5 GB), right?

The K20 has less memory than the Titan X, so as you increase memory demands, the K20 will run out of memory before the Titan X does.

You’ve multiplied the dataset size by 4, but that does not mean the dataset is the only memory demand that Theano/cuDNN places on the GPU.

You can view memory usage with nvidia-smi in a console window. Use nvidia-smi --help to understand the various options, or, on Linux, there should be a man page for it.
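
For example, something like the following should work on recent drivers (a sketch; the exact query fields can vary between nvidia-smi versions, so check nvidia-smi --help-query-gpu if they are rejected):

nvidia-smi
nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv -l 5
nvidia-smi --query-compute-apps=pid,used_memory --format=csv

The first gives a one-shot overview, the second loops every 5 seconds reporting per-GPU memory, and the third lists per-process memory (GPUs in WDDM mode on Windows may report this as N/A).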

If your Titan X is also driving a display, that may cause it to be slower. And you don’t mention whether this is Linux or Windows, but your Titan X is likely to be somewhat slower on Windows due to the WDDM driver model.
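
On Windows you can check which driver model each GPU is using with something like this (a sketch; the driver_model.* query fields are Windows-only and depend on the nvidia-smi version):

nvidia-smi --query-gpu=index,name,driver_model.current,driver_model.pending --format=csv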

Thanks! I’m on Windows. Neither card is used to display output.
Here’s the nvidia-smi output:

C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi.exe
Thu Nov 12 16:06:29 2015

+------------------------------------------------------+
| NVIDIA-SMI 354.35     Driver Version: 354.35         |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT... WDDM  | 0000:01:00.0     Off |                  N/A |
| 27%   67C    P2    79W / 250W |   1654MiB / 12288MiB |     22%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K600        WDDM  | 0000:02:00.0     Off |                  N/A |
| 25%   48C    P8    N/A /  N/A |    412MiB /  1024MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K20c          TCC  | 0000:03:00.0     Off |                    0 |
| 41%   55C    P0    72W / 225W |   1424MiB /  4799MiB |     45%      Default |
+-------------------------------+----------------------+----------------------+

This is a snapshot taken while a single simulation is running on each card.

What I’d like to understand is why the Tesla is faster in this case, and why the Titan X is not utilized more fully.

TCC vs. WDDM may make a difference. Try putting the Titan in TCC mode; you should be able to do this with nvidia-smi (see the sketch below).
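
Something along these lines, run from an administrator console (a sketch: -dm 1 selects TCC and -dm 0 selects WDDM, GPU index 0 is the Titan X per the output above, a reboot is needed for the change to take effect, and TCC cannot be used on a GPU that is driving a display):

nvidia-smi -i 0 -dm 1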

Wow, that really helped! Now the Titan X is almost twice as fast as before, and is significantly faster than the K20 even in the single-simulation case. The utilization didn’t change, though - still only 35%.

Thanks for the tip!

I wonder if it can go even faster if I switch from the Tesla driver (354.35) to the latest GeForce driver (358.91)…