Tesla V100 is slower than RTX 2080ti

shjang4724l · November 9, 2020, 1:01am

Hi.

I tested the performance of a V100 and a 2080ti using TensorRT and PyCUDA. The tested models were ResNet50 and Inception_v1.
But in my code, the V100 was slower than the 2080ti. In many references, the V100 always has higher throughput than the 2080ti.
I suspect that I am not using TensorRT and CUDA appropriately. How can I use them properly?

If you want any further information like my code or frozen graph, please let me know.

Thanks.

njuffa · November 9, 2020, 1:18am

How big is the performance difference? By raw specs, the V100 and the RTX 2080 Ti would appear to offer roughly equal performance for non-double-precision computation. Exact comparison is difficult because it is not known what kind of clock boost is applied on a specific GPU for a given workload and specific operating conditions.
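One way to see which clocks each GPU actually sustains is to poll `nvidia-smi` while the benchmark is running. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names `build_query`, `parse_csv` and `sample_gpus` are made up for illustration, and the set of fields is just one reasonable choice.

```python
import csv
import io
import shutil
import subprocess

# Properties to request from `nvidia-smi --query-gpu`.
FIELDS = ["name", "clocks.sm", "clocks.mem", "temperature.gpu", "power.draw"]


def build_query(fields=FIELDS):
    """Build the nvidia-smi command line for a one-shot CSV query."""
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(fields),
        "--format=csv,noheader,nounits",
    ]


def parse_csv(text, fields=FIELDS):
    """Turn nvidia-smi CSV output into one dict per GPU (values stay strings)."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(fields, (cell.strip() for cell in row))) for row in rows if row]


def sample_gpus():
    """Return the current readings, or an empty list if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(build_query(), capture_output=True, text=True, check=True)
    return parse_csv(result.stdout)


if __name__ == "__main__":
    for gpu in sample_gpus():
        print(gpu)
```

Calling `sample_gpus()` repeatedly during a run (rather than once at idle) is what shows whether the SM clock drops under load, for example because of power or thermal limits.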

shjang4724l · November 9, 2020, 4:16am

I measured latency and throughput in my application and got the results below:

V100

(INT8 / Batch Size=1)
Latency: 0.68 ms / Throughput: 1442.46 fps
(INT8 / Batch Size=128)
Latency: 16.15 ms / Throughput: 7909.67 fps

2080TI

(INT8 / Batch Size=1)
Latency: 0.52 ms / Throughput: 1980.23 fps
(INT8 / Batch Size=128)
Latency: 13.69 ms / Throughput: 9390.88 fps

The model is Inception_v1.
I read the document “NVIDIA AI INFERENCE PLATFORM” (https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/tesla-product-literature/t4-inference-print-update-inference-tech-overview-final.pdf).
According to this document, the Tesla V100 reaches 11,280 fps at batch size 128. How can I achieve this performance?
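To put the gap in proportion, the throughput figures above can be compared directly. This is only arithmetic on the numbers already posted; the dictionary layout and the `ratio` helper are illustrative.

```python
# Throughput in fps, copied from the measurements posted above (INT8, Inception_v1).
reported = {
    ("V100", 1): 1442.46,
    ("V100", 128): 7909.67,
    ("2080ti", 1): 1980.23,
    ("2080ti", 128): 9390.88,
}

# Figure quoted from the NVIDIA document for the V100 at batch size 128.
PUBLISHED_V100_BS128 = 11280.0


def ratio(a, b):
    """Relative throughput of a compared to b."""
    return a / b


for batch_size in (1, 128):
    r = ratio(reported[("V100", batch_size)], reported[("2080ti", batch_size)])
    print(f"batch {batch_size}: V100 / 2080ti = {r:.2f}")

published = ratio(reported[("V100", 128)], PUBLISHED_V100_BS128)
print(f"batch 128: V100 measured / published = {published:.2f}")
```

With these numbers the V100 reaches about 73% of the 2080ti's throughput at batch size 1, about 84% at batch size 128, and about 70% of the published V100 figure.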

njuffa · November 9, 2020, 4:28am

The numbers will be useful for anybody who has relevant experience with these benchmarks.

I have never run these benchmarks. My first thought would be that you are using a different hardware and / or software configuration than what was used to generate the benchmark report. Have you looked into this aspect?

Very generally speaking, a high-frequency CPU (I would suggest >= 3.5 GHz base frequency), large amounts of low-latency high-bandwidth DDR4 system memory, and NVMe solid-state storage should help machine-learning application performance. But I have zero insight as to whether they have any impact on these particular benchmarks and if so, how much.
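When comparing against a published benchmark, it can help to record the software configuration next to each measurement so that differences are visible later. A minimal sketch using only the Python standard library follows; it assumes TensorRT and PyCUDA were installed as Python packages under the distribution names `tensorrt` and `pycuda`, and the function names are hypothetical.

```python
import platform
from importlib import metadata


def package_versions(names=("tensorrt", "pycuda")):
    """Look up installed package versions; None means the package was not found."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_snapshot():
    """Collect basic host and package information for a benchmark log."""
    return {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "system": platform.system(),
        "packages": package_versions(),
    }


if __name__ == "__main__":
    for key, value in environment_snapshot().items():
        print(f"{key}: {value}")
```

This does not capture the GPU driver or CUDA toolkit version; those would have to be recorded separately, for example from `nvidia-smi` output.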

shjang4724l · November 9, 2020, 5:01am

Thank you njuffa.

Your reply helps me a lot.
I’ll try to find additional information.

Regards.

cbuchner1 · November 18, 2020, 4:36pm

Have you considered turning off the ECC mode for the V100 memory? This might result in a slight speed boost.
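The ECC state can be checked and changed with `nvidia-smi`. The sketch below only builds the commands and reads the current mode; the function names are illustrative. Disabling ECC with `-e 0` typically needs administrator rights and only takes effect after a reboot or GPU reset, so the disable command is built but not executed here.

```python
import shutil
import subprocess


def ecc_query_cmd(gpu_index=0):
    """Command that reports the current and pending ECC mode of one GPU."""
    return [
        "nvidia-smi",
        "-i", str(gpu_index),
        "--query-gpu=ecc.mode.current,ecc.mode.pending",
        "--format=csv,noheader",
    ]


def ecc_disable_cmd(gpu_index=0):
    """Command that requests ECC to be turned off (run it manually, as admin)."""
    return ["nvidia-smi", "-i", str(gpu_index), "-e", "0"]


def current_ecc_mode(gpu_index=0):
    """Return nvidia-smi's ECC report as text, or None if it cannot be read."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(ecc_query_cmd(gpu_index), capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    print("ECC mode (current, pending):", current_ecc_mode())
    print("To disable, run:", " ".join(ecc_disable_cmd()))
```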
