Tensor Core performance details of Jetson AGX Orin 32GB

I would like to understand the Tensor Core performance figures for the Jetson AGX Orin 32GB.

From the NVIDIA Jetson AGX Orin Series datasheet (v1.2),
I confirmed the following specifications for the Jetson AGX Orin 32GB:

  • Max operating frequency: 939 MHz
  • Number of Tensor Cores: 56
  • Tensor Core performance: 54 FP16 TFLOPS
  • Sparsity: fine-grained structured sparsity doubles throughput.

As I understand it, a third-generation Tensor Core can execute 128 multiply-add operations per cycle.

Source: Tensor Cores: Versatility for HPC & AI - NVIDIA

So I calculated the Tensor Core performance as follows:

939 MHz * 56 (cores) * 2 (sparsity) * 128 (multiply-adds/cycle) * 2 (operations per multiply-add) = 26.9 TFLOPS
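The same arithmetic as a small Python sketch. `peak_tflops` is a hypothetical helper, and its inputs (939 MHz, 56 Tensor Cores, 128 multiply-adds per cycle, 2x sparsity) are exactly the assumptions stated above, not confirmed per-core figures for Orin.

```python
def peak_tflops(freq_mhz: float, cores: int, fma_per_core_per_cycle: int,
                sparsity_factor: int = 1) -> float:
    """Hypothetical helper: peak throughput in TFLOPS from clock, core count and per-core rate."""
    ops_per_fma = 2  # one multiply-add counts as two floating-point operations
    ops_per_second = (freq_mhz * 1e6) * cores * fma_per_core_per_cycle * ops_per_fma * sparsity_factor
    return ops_per_second / 1e12


# Inputs assumed in the question: 939 MHz, 56 Tensor Cores, 128 multiply-adds/cycle, 2x sparsity
result = peak_tflops(939, 56, 128, sparsity_factor=2)
print(f"{result:.1f} TFLOPS")  # prints 26.9 TFLOPS
```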

This is only about half of the 54 FP16 TFLOPS in the datasheet,
so I think I'm overlooking something.
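Working backwards from the datasheet is one way to locate the gap. The sketch below assumes the 939 MHz clock, 56 Tensor Cores, 2 operations per multiply-add and the 2x sparsity factor are all correct, and solves for the per-core rate that 54 FP16 TFLOPS would require (all variable names are illustrative):

```python
# Datasheet figures for Jetson AGX Orin 32GB, as listed in the question
freq_hz = 939e6            # max operating frequency
tensor_cores = 56
sparse_fp16_flops = 54e12  # 54 FP16 TFLOPS (with sparsity)
sparsity_factor = 2        # fine-grained structured sparsity doubles throughput
ops_per_fma = 2            # one multiply-add = two operations

# Invert: peak = freq * cores * fma_per_cycle * ops_per_fma * sparsity
required_fma = sparse_fp16_flops / (freq_hz * tensor_cores * ops_per_fma * sparsity_factor)
print(f"required dense FP16 multiply-adds per Tensor Core per cycle: {required_fma:.1f}")
```

It prints about 256.7 (exactly 256 gives 53.8 TFLOPS, which the datasheet rounds to 54), roughly twice the 128 multiply-adds per cycle assumed above. This only shows which input would have to change if the others hold; it does not by itself confirm Orin's actual per-core rate.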

Could you give me any advice on this matter?

Regards,
hiro

512 INT8 FMA ops (per Tensor Core per cycle) * 2 (operations per FMA) * 0.939 GHz * 56 Tensor Cores = 54 dense INT8 TOPS; 54 * 2 (sparsity) = 108 sparse INT8 TOPS

FP16 throughput is half of INT8, so 108 / 2 = 54 sparse FP16 TFLOPS.
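The reply's chain of calculations, written out as a Python sketch. The 512 INT8 FMA ops per Tensor Core per cycle, the 2x sparsity factor and the half-rate FP16 assumption are taken from the reply itself; `peak_tops` is a hypothetical helper.

```python
def peak_tops(freq_ghz: float, cores: int, fma_per_core_per_cycle: int) -> float:
    """Hypothetical helper: dense peak in TOPS.

    FMA count * 2 ops per FMA * clock in GHz * cores gives GOPS; divide by 1000 for TOPS.
    """
    return fma_per_core_per_cycle * 2 * freq_ghz * cores / 1000


dense_int8 = peak_tops(0.939, 56, 512)  # dense INT8
sparse_int8 = dense_int8 * 2            # structured sparsity doubles throughput
sparse_fp16 = sparse_int8 / 2           # FP16 assumed to run at half the INT8 rate

print(f"dense INT8 : {dense_int8:.1f} TOPS")    # 53.8
print(f"sparse INT8: {sparse_int8:.1f} TOPS")   # 107.7
print(f"sparse FP16: {sparse_fp16:.1f} TFLOPS") # 53.8
```

Rounded, the three lines give the 54 / 108 / 54 figures in the reply: the `* 2` converts dense INT8 to sparse INT8, and halving for FP16 brings the sparse figure back to 54.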

Dear @kayccc

Thank you for your reply.
I have read it and have one question.

As far as I can tell, (512 FMA ops * 2 * 0.939 GHz) * 56 Tensor Cores already comes to about 54 TOPS:

512 * 2 * 0.939 GHz * 56 = 53,846.016 GOPS ≈ 53.8 TOPS

So I don't understand why you then multiply this result by 2:

(512 FMA ops * 2 * 0.939 GHz) * 56 Tensor Cores = 54 dense INT8 TOPS * 2

Could you tell me the reason?

Regards,
hiro

Unfortunately, I still don't understand the details of the Tensor Core performance calculation.

If possible, could you give me some advice on this calculation?

Regards,
hiro