Poorer Performance on Better GPU

Hi everyone,
I am running a kernel that performs NTT (number theoretic transform) calculations, which involve many modular operations and memory accesses. I have run the kernel on both an RTX 2080 Ti and an A10. Shouldn't the A10 perform better than the RTX 2080 Ti, since it is newer, has more CUDA cores per SM, and has a larger memory? In my results, the A10 actually takes more time to execute the same kernel than the 2080 Ti. The biggest difference I can see is the A10's base clock of 0.885 GHz, which is much lower than the RTX 2080 Ti's 1.35 GHz. On the other hand, its maximum (boost) clock of 1.685 GHz is higher than the 2080 Ti's 1.545 GHz. But I assume the maximum frequency is rarely reached, right? The other features that could influence my kernel are quite similar on both GPUs. Does this mean that the clock frequency plays an important role in performance?
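
For context, here is a minimal sketch of the kind of butterfly stage I mean (illustrative only, not my actual kernel; the modulus, twiddle layout, and names are placeholders):

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Placeholder 32-bit NTT-friendly prime; my real modulus and parameters differ.
__constant__ uint32_t MOD = 998244353u;

__device__ __forceinline__ uint32_t add_mod(uint32_t a, uint32_t b) {
    uint32_t s = a + b;                 // no overflow: MOD < 2^30
    return (s >= MOD) ? s - MOD : s;
}

__device__ __forceinline__ uint32_t sub_mod(uint32_t a, uint32_t b) {
    return (a >= b) ? a - b : a + MOD - b;
}

__device__ __forceinline__ uint32_t mul_mod(uint32_t a, uint32_t b) {
    // 64-bit product followed by '%'; a real kernel would use Barrett
    // or Montgomery reduction here.
    return (uint32_t)(((uint64_t)a * b) % MOD);
}

// One Cooley-Tukey butterfly stage: each thread updates one (u, v) pair
// whose elements are 'half' apart, after multiplying v by a twiddle factor.
__global__ void ntt_stage(uint32_t *data, const uint32_t *twiddles,
                          uint32_t n, uint32_t half) {
    uint32_t tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n / 2) return;

    uint32_t group = tid / half;               // which butterfly group
    uint32_t idx   = tid % half;               // position within the group
    uint32_t i     = group * 2u * half + idx;  // first element of the pair
    uint32_t j     = i + half;                 // strided partner element

    uint32_t u = data[i];
    uint32_t v = mul_mod(data[j], twiddles[idx]);  // modular multiply

    data[i] = add_mod(u, v);                       // modular add
    data[j] = sub_mod(u, v);                       // modular subtract
}
```

So each thread does one modular multiply plus a modular add/subtract and touches two elements that are `half` apart, which is why the kernel mixes integer arithmetic with strided global-memory accesses.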

                 A10          RTX 2080 Ti
# of SMs         72           68
Mem BW           600 GB/s     616 GB/s
base clock       885 MHz      1350 MHz
max boost clock  1695 MHz     1635 MHz
max power        150 W        250 W

The GPUs are pretty similar. The biggest difference might be the maximum power consumption. It's quite possible for the RTX 2080 Ti to be faster than the A10 for some workloads.

The average clock during application execution is potentially impacted by the power limit. The A10 might run at a lower average clock than the 2080 Ti for some workloads, due to its lower power limit.
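
One way to check is to sample the SM clock and power draw while the kernel is running, either with nvidia-smi or programmatically through NVML. Here is a minimal host-side sketch using the NVML C API (device index 0 and a 100 ms sampling interval are assumptions; link with -lnvidia-ml):

```cpp
#include <cstdio>
#include <unistd.h>   // usleep (POSIX; adjust for your platform)
#include <nvml.h>

int main() {
    // Initialize NVML and grab GPU 0 (assumption: the card under test).
    if (nvmlInit() != NVML_SUCCESS) {
        std::fprintf(stderr, "nvmlInit failed\n");
        return 1;
    }
    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) {
        std::fprintf(stderr, "could not get device handle\n");
        nvmlShutdown();
        return 1;
    }

    // Sample the current SM clock (MHz) and power draw (mW) a few times;
    // run this alongside the kernel under test.
    for (int i = 0; i < 50; ++i) {
        unsigned int smClockMHz = 0, powerMilliwatts = 0;
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &smClockMHz);
        nvmlDeviceGetPowerUsage(dev, &powerMilliwatts);
        std::printf("SM clock: %u MHz, power: %.1f W\n",
                    smClockMHz, powerMilliwatts / 1000.0);
        usleep(100 * 1000);  // 100 ms between samples
    }

    nvmlShutdown();
    return 0;
}
```

Comparing the sampled SM clock against each card's boost clock while the NTT kernel runs will show whether the A10 is throttling toward its 150 W limit.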

Understood. Thanks for answering my question!

That can certainly be the case, in particular when the code’s performance is limited by computational throughput. You may want to perform a roofline analysis of the code to establish what the performance limiters are.
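
As a quick sketch of the idea (the standard roofline model, not anything specific to this kernel), attainable throughput is bounded by

$$
P_{\text{attainable}} = \min\bigl(P_{\text{peak}},\; I \cdot B_{\text{mem}}\bigr), \qquad I = \frac{\text{operations}}{\text{bytes moved to/from DRAM}}
$$

If the kernel lands on the compute-limited side of the roofline, the average clock (and therefore the power limit) scales performance almost directly; if it is bandwidth-limited, the roughly 600 GB/s vs. 616 GB/s difference matters more than the clocks. Nsight Compute can generate a roofline chart for a kernel, which makes this easy to check.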

A general note on consumer GPUs vs. professional GPUs: when comparing GPUs from the same architecture family, professional GPUs usually operate at slightly lower frequencies than the corresponding consumer parts. This is presumably because professional GPUs are designed for reliable 24/7 operation (100% duty cycle) over a useful life span of around five years, taking into account the physical aging processes that affect electronic components. Consumer products are usually designed for a significantly lower duty cycle.
