A100: 312 TMAC/s or 312 TFLOP/s

fiandola · January 11, 2023, 9:10pm

NVIDIA’s specification says that an A100’s tensor cores have a peak performance of 312 TFLOPs. (I know… the usual disclaimers… that’s best-case and not achievable for real applications.)

What I am wondering is: are we defining 1 TMAC as 2 TFLOPs? Or, when NVIDIA says TFLOP, do they mean TMAC?

Robert_Crovella · January 11, 2023, 10:32pm

All calculations of this type for discrete CUDA GPUs or the CUDA GPU component of a SoC count the multiplication and addition operations as separate floating-point ops.

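A number like 312 TFLOP/s counts 156 trillion additions per second plus 156 trillion multiplications per second. In MAC terms, that is 156 TMAC/s, with each multiply-accumulate counted as 2 floating-point ops.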
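
To make the arithmetic concrete, here is a rough sketch of the conversion and of how a figure near 312 can be reproduced. The hardware numbers (108 SMs, 1024 dense FP16 FMAs per SM per clock, 1.41 GHz boost clock) are not from this thread; they are assumptions taken from NVIDIA's public A100 material, and the helper names are made up for illustration.

```python
# Each multiply-accumulate (MAC / FMA) is counted as two floating-point ops:
# one multiplication and one addition.
FLOPS_PER_MAC = 2


def macs_to_flops(macs_per_s: float) -> float:
    """Convert a MAC/s rate to the FLOP/s rate used in GPU spec sheets."""
    return macs_per_s * FLOPS_PER_MAC


def flops_to_macs(flops_per_s: float) -> float:
    """Convert a spec-sheet FLOP/s rate back to MAC/s."""
    return flops_per_s / FLOPS_PER_MAC


# Assumed A100 figures (not stated in this thread).
SM_COUNT = 108                 # streaming multiprocessors
FMA_PER_SM_PER_CLOCK = 1024    # dense FP16 tensor-core FMAs per SM per clock
BOOST_CLOCK_HZ = 1.41e9        # boost clock in Hz

peak_macs = SM_COUNT * FMA_PER_SM_PER_CLOCK * BOOST_CLOCK_HZ
peak_flops = macs_to_flops(peak_macs)

print(f"{peak_macs / 1e12:.1f} TMAC/s -> {peak_flops / 1e12:.1f} TFLOP/s")
print(f"312 TFLOP/s -> {flops_to_macs(312e12) / 1e12:.0f} TMAC/s")
```

Under those assumptions the peak comes out just under 156 TMAC/s, which doubles to the advertised 312 TFLOP/s once the multiply and the add are counted separately.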

fiandola · January 12, 2023, 7:00am

Thanks!

P.S. It’s great to see that you’re still on the forums, Robert. I think you answered the first of my questions (on StackOverflow) over 10 years ago now.
