Poorer Performance on Better GPU

Hi everyone,
I am running a kernel that performs NTT (number theoretic transform) calculations, which involve many modular operations and memory accesses. I have run the kernel on both an RTX 2080 Ti and an A10. Shouldn't the A10 perform better than the RTX 2080 Ti, since it is newer, has more CUDA cores per SM, and has a larger memory? In my results, the A10 actually takes more time to execute the same kernel than the 2080 Ti. The biggest difference I can see is the A10's base clock of 0.885 GHz, which is much lower than the RTX 2080 Ti's 1.35 GHz. On the other hand, its maximum (boost) clock of 1.685 GHz is higher than the 2080 Ti's 1.545 GHz. But I assume the maximum frequency is rarely reached, right? The other features that could influence my kernel are quite similar on both GPUs. Does this mean that the clock frequency plays an important role in performance?
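
For context, here is a minimal sketch of the kind of butterfly stage I mean (illustrative only, not my actual kernel; the modulus, twiddle layout, and names are placeholders):

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Placeholder 32-bit NTT-friendly prime; my real modulus and parameters differ.
__constant__ uint32_t MOD = 998244353u;

__device__ __forceinline__ uint32_t add_mod(uint32_t a, uint32_t b) {
    uint32_t s = a + b;                 // no overflow: MOD < 2^30
    return (s >= MOD) ? s - MOD : s;
}

__device__ __forceinline__ uint32_t sub_mod(uint32_t a, uint32_t b) {
    return (a >= b) ? a - b : a + MOD - b;
}

__device__ __forceinline__ uint32_t mul_mod(uint32_t a, uint32_t b) {
    // 64-bit product followed by '%'; a real kernel would use Barrett
    // or Montgomery reduction here.
    return (uint32_t)(((uint64_t)a * b) % MOD);
}

// One Cooley-Tukey butterfly stage: each thread updates one (u, v) pair
// whose elements are 'half' apart, after multiplying v by a twiddle factor.
__global__ void ntt_stage(uint32_t *data, const uint32_t *twiddles,
                          uint32_t n, uint32_t half) {
    uint32_t tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n / 2) return;

    uint32_t group = tid / half;               // which butterfly group
    uint32_t idx   = tid % half;               // position within the group
    uint32_t i     = group * 2u * half + idx;  // first element of the pair
    uint32_t j     = i + half;                 // strided partner element

    uint32_t u = data[i];
    uint32_t v = mul_mod(data[j], twiddles[idx]);  // modular multiply

    data[i] = add_mod(u, v);                       // modular add
    data[j] = sub_mod(u, v);                       // modular subtract
}
```

So each thread does one modular multiply plus a modular add/subtract and touches two elements that are `half` apart, which is why the kernel mixes integer arithmetic with strided global-memory accesses.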

                 A10          RTX 2080 Ti
# of SMs         72           68
Mem BW           600 GB/s     616 GB/s
base clock       885 MHz      1350 MHz
max boost clock  1695 MHz     1635 MHz
max power        150 W        250 W

The GPUs are pretty similar. The biggest difference might be the maximum power consumption. It's quite possible for the RTX 2080 Ti to be faster than the A10 for some workloads.

The average clock during application execution is potentially impacted by the power limit. The A10 might run at a lower average clock than the 2080 Ti for some workloads, due to its lower power limit.
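
One way to check is to sample the SM clock and power draw while the kernel is running, either with nvidia-smi or programmatically through NVML. Here is a minimal host-side sketch using the NVML C API (device index 0 and a 100 ms sampling interval are assumptions; link with -lnvidia-ml):

```cpp
#include <cstdio>
#include <unistd.h>   // usleep (POSIX; adjust for your platform)
#include <nvml.h>

int main() {
    // Initialize NVML and grab GPU 0 (assumption: the card under test).
    if (nvmlInit() != NVML_SUCCESS) {
        std::fprintf(stderr, "nvmlInit failed\n");
        return 1;
    }
    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS) {
        std::fprintf(stderr, "could not get device handle\n");
        nvmlShutdown();
        return 1;
    }

    // Sample the current SM clock (MHz) and power draw (mW) a few times;
    // run this alongside the kernel under test.
    for (int i = 0; i < 50; ++i) {
        unsigned int smClockMHz = 0, powerMilliwatts = 0;
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &smClockMHz);
        nvmlDeviceGetPowerUsage(dev, &powerMilliwatts);
        std::printf("SM clock: %u MHz, power: %.1f W\n",
                    smClockMHz, powerMilliwatts / 1000.0);
        usleep(100 * 1000);  // 100 ms between samples
    }

    nvmlShutdown();
    return 0;
}
```

Comparing the sampled SM clock against each card's boost clock while the NTT kernel runs will show whether the A10 is throttling toward its 150 W limit.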

Understood. Thanks for answering my question!

That can certainly be the case, in particular when the code’s performance is limited by computational throughput. You may want to perform a roofline analysis of the code to establish what the performance limiters are.
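
As a quick sketch of the idea (the standard roofline model, not anything specific to this kernel), attainable throughput is bounded by

$$
P_{\text{attainable}} = \min\bigl(P_{\text{peak}},\; I \cdot B_{\text{mem}}\bigr), \qquad I = \frac{\text{operations}}{\text{bytes moved to/from DRAM}}
$$

If the kernel lands on the compute-limited side of the roofline, the average clock (and therefore the power limit) scales performance almost directly; if it is bandwidth-limited, the roughly 600 GB/s vs. 616 GB/s difference matters more than the clocks. Nsight Compute can generate a roofline chart for a kernel, which makes this easy to check.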

A general note on consumer GPUs vs. professional GPUs: when comparing GPUs from the same architecture family, professional GPUs usually operate at slightly lower frequencies than the corresponding consumer parts. This is presumably because professional GPUs are designed for reliable 24/7 operation (100% duty cycle) over a useful life span of around five years, taking into account the physical aging processes that affect electronic components. Consumer products are usually designed for a significantly lower duty cycle.
