Speed difference by CUDA version


I am comparing a Tesla V100 with an RTX 2080 Super.
Tesla V100 : CUDA 10.0 & TensorRT
RTX 2080 Super : CUDA 10.2 & TensorRT

I followed the benchmarking procedure described here.

The Tesla V100 has very good hardware, so I expected it to perform well.
But the RTX 2080 Super is faster than the V100, by a factor of about 1.5.

Why did this happen? Can the software version really make such a big difference?


The software versions (TensorRT 5 vs TensorRT 7, and CUDA 10.0 vs CUDA 10.2) can make significant performance differences, depending on the model. Can you compare both GPUs using the same versions? If you still see the perf difference, please share the scripts used to measure the perf so we can reproduce it.
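To make sure both machines really run the same stack, it can help to print the versions on each system and paste the output into the thread. The following is only a minimal sketch: the helper name `collect_versions` is made up for this example, and it assumes the TensorRT Python bindings, `nvcc`, and `nvidia-smi` may or may not be present, falling back to a placeholder string when they are missing.

```python
import importlib
import shutil
import subprocess


def collect_versions():
    """Collect software versions relevant to a GPU benchmark (hypothetical helper)."""
    info = {}

    # The TensorRT Python package exposes __version__; it may not be installed.
    try:
        trt = importlib.import_module("tensorrt")
        info["tensorrt"] = str(getattr(trt, "__version__", "unknown"))
    except (ImportError, OSError):
        info["tensorrt"] = "not installed"

    # `nvcc --version` prints a line containing "release <X.Y>" for the CUDA toolkit.
    info["cuda_toolkit"] = "nvcc not found"
    if shutil.which("nvcc"):
        out = subprocess.run(["nvcc", "--version"],
                             capture_output=True, text=True).stdout
        lines = [ln.strip() for ln in out.splitlines() if "release" in ln]
        info["cuda_toolkit"] = lines[0] if lines else "unknown"

    # nvidia-smi can report the GPU name and driver version in CSV form.
    info["gpu"] = "nvidia-smi not found"
    if shutil.which("nvidia-smi"):
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version",
             "--format=csv,noheader"],
            capture_output=True, text=True).stdout
        info["gpu"] = out.strip() or "unknown"

    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Running this on both the V100 and the RTX 2080 Super machines and comparing the output line by line shows quickly whether the CUDA or TensorRT versions still differ.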


I reset the environment for both systems.

Tesla V100 system : CUDA 10.0 / TensorRT
RTX 2080 Super : CUDA 10.0 / TensorRT

Then I ran the same benchmark as above.
( https://developer.nvidia.com/embedded/jetson-nano-dl-inference-benchmarks )

The RTX 2080 Super was still faster, despite the price difference of about ten times.
Why is this happening?

The Tesla V100 got faster when I increased the batch size.
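For anyone who wants to check the batch size effect on their own setup, here is a rough sketch of a batch size sweep. It is not the script used in this thread: `benchmark`, `infer`, and `make_batch` are hypothetical names, and `fake_infer` is only a stand-in with a fixed per-call overhead so the example runs without a GPU. With a real engine, `infer` has to block until the GPU has finished (for example by synchronizing the stream), otherwise only the launch time is measured.

```python
import time


def benchmark(infer, make_batch, batch_sizes, warmup=5, iters=20):
    """Return {batch_size: (latency_ms, samples_per_sec)} for a blocking callable."""
    results = {}
    for bs in batch_sizes:
        batch = make_batch(bs)
        # Warm-up runs are excluded so one-time setup costs do not skew the timing.
        for _ in range(warmup):
            infer(batch)
        start = time.perf_counter()
        for _ in range(iters):
            infer(batch)
        elapsed = time.perf_counter() - start
        latency_ms = 1000.0 * elapsed / iters   # average time per call
        throughput = bs * iters / elapsed       # samples processed per second
        results[bs] = (latency_ms, throughput)
    return results


def fake_infer(batch):
    """Stand-in for a real inference call: fixed overhead plus per-sample cost."""
    time.sleep(0.002 + 0.0001 * len(batch))


if __name__ == "__main__":
    sweep = benchmark(fake_infer, lambda n: [0] * n, [1, 8, 32])
    for bs, (lat, thr) in sweep.items():
        print(f"batch={bs:3d}  latency={lat:7.2f} ms  throughput={thr:9.1f} samples/s")
```

Comparing samples per second rather than time per call makes it easier to see whether a larger GPU is simply underused at small batch sizes.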