A100: 312 TMAC/s or 312 TFLOP/s?

NVIDIA’s specification says that an A100’s tensor cores have a peak performance of 312 TFLOP/s. (I know… the usual disclaimers… that’s best-case and not achievable for real applications.)

What I am wondering is: are we defining 1 TMAC as 2 TFLOPs? Or, when NVIDIA says TFLOP, do they mean TMAC?

All calculations of this type for discrete CUDA GPUs or the CUDA GPU component of a SoC count the multiplication and addition operations as separate floating-point ops.

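So a number like 312 TFLOP/s counts 156 trillion multiplications per second plus 156 trillion additions per second. In other words, 1 MAC is counted as 2 FLOPs, and 312 TFLOP/s corresponds to 156 TMAC/s.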
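As a rough sanity check, here is a small Python sketch of that arithmetic. The hardware figures in it are not from this thread. They are my assumptions, taken from NVIDIA's published A100 material: 108 SMs, 4 tensor cores per SM, 256 dense FP16 FMAs per tensor core per clock, and a 1410 MHz boost clock. The function names are just illustrative.

```python
# Assumed A100 figures (from NVIDIA's public A100 material, not from this thread).
SM_COUNT = 108                       # streaming multiprocessors on an A100
TENSOR_CORES_PER_SM = 4              # tensor cores per SM
FMA_PER_TENSOR_CORE_PER_CLOCK = 256  # dense FP16 fused multiply-adds per clock
BOOST_CLOCK_HZ = 1.41e9              # 1410 MHz boost clock


def peak_tmac_per_s() -> float:
    """Peak multiply-accumulate rate in TMAC/s (one FMA = one MAC)."""
    macs_per_clock = SM_COUNT * TENSOR_CORES_PER_SM * FMA_PER_TENSOR_CORE_PER_CLOCK
    return macs_per_clock * BOOST_CLOCK_HZ / 1e12


def peak_tflop_per_s() -> float:
    """Peak rate in TFLOP/s, counting each MAC as 2 FLOPs (1 multiply + 1 add)."""
    return 2 * peak_tmac_per_s()


if __name__ == "__main__":
    print(f"{peak_tmac_per_s():.1f} TMAC/s")
    print(f"{peak_tflop_per_s():.1f} TFLOP/s")
```

Under those assumptions the MAC rate comes out just under 156 TMAC/s, and doubling it gives the roughly 312 TFLOP/s in the specification, which is consistent with the multiply and the add being counted separately.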

Thanks!

P.S. It’s great to see that you’re still on the forums, Robert. I think you answered the first of my questions (on StackOverflow) over 10 years ago now.

