Calculating TOPS and TFLOPS on the H100

I have a question about calculating INT32 TOPS and FP32 TFLOPS on the H100. I can work out the FP32 figure: FP32 TFLOPS = 1755 MHz (clock speed) * 114 (number of SMs) * 128 (FP32 cores per SM) * 2 (FMA) = 51.22 TFLOPS. However, to my knowledge the FP32 cores also execute IMAD operations, so I would expect INT32 TOPS and FP32 TFLOPS to be the same. Instead, the INT32 TOPS figure is half the FP32 figure. Am I missing something?

All integer operations other than IMAD (add, shift, 32-bit bitwise ops) execute on the ALU pipe, which has only half the throughput of FP32.

Thank you for answering. However, even so, shouldn’t the same TOPS come out when calculated with IMAD operations?

That would seem reasonable, but I guess you’ve found otherwise.

Regardless of where operations are performed, the specified max throughput for 32-bit IMAD on compute capability 9.0 (cc9.0) is 64 results per clock per SM, whereas for 32-bit FFMA it is 128.

That specified throughput is the proper basis for the calculation, rather than counting functional units and deciding where ops are performed.
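
As a rough sketch of that calculation in Python: the clock speed and SM count are the ones from the question, the per-SM throughput figures are the ones quoted above, and the names `peak_tera_ops` and `THROUGHPUT_PER_CLK_PER_SM` are only illustrative.

```python
# Peak rate = clock (Hz) * SM count * instructions per clock per SM * ops per instruction.
CLOCK_HZ = 1755e6    # clock speed quoted in the question
NUM_SMS = 114        # SM count quoted in the question
OPS_PER_INSTR = 2    # a multiply-add counts as two operations

# Specified max throughput per clock per SM on cc9.0, as quoted above.
THROUGHPUT_PER_CLK_PER_SM = {"FFMA": 128, "IMAD": 64}


def peak_tera_ops(per_clk_per_sm):
    """Return the peak rate in tera-operations per second."""
    return CLOCK_HZ * NUM_SMS * per_clk_per_sm * OPS_PER_INSTR / 1e12


for name, rate in THROUGHPUT_PER_CLK_PER_SM.items():
    print(f"{name}: {peak_tera_ops(rate):.2f} tera-ops/s")
```

With these inputs the FFMA line reproduces the 51.22 TFLOPS from the question, and the IMAD line comes out at half of that.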

Thank you so much! Then, does it mean that the throughput has no relation to which core the operation runs on? Or should I understand that an IMAD operation takes twice as many cycles as an FMA?

Those are good points.

Where an instruction is executed only matters if you mix INT32 and FP32 instructions and want to know whether they compete for resources.
