Tensor Core performance details of the Jetson AGX Orin 32GB

I want to know the Tensor Core performance details of the Jetson AGX Orin 32GB.

From the NVIDIA Jetson AGX Orin Series datasheet (v1.2),
I was able to confirm the following specs for the Jetson AGX Orin 32GB:

  • Max operating frequency: 939 MHz
  • Tensor Core count: 56
  • Tensor Core performance: 54 FP16 TFLOPS
  • Sparsity: fine-grained structured sparsity doubles throughput

Also, I believe a 3rd-generation Tensor Core can execute 128 multiply-adds per cycle.

[image: Tensor Cores: Versatility for HPC & AI - NVIDIA]

So I calculated the Tensor Core performance as follows:

939 MHz * 56 cores * 2 (sparsity) * 128 * 2 (multiply-add/cycles) = 26.9 TFLOPS
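Here is a small Python sketch of that arithmetic (the 128 FMA/cycle/core figure is my assumption from the Tensor Core page above, not from the datasheet):

```python
# Sketch of my estimate; 128 FMA/cycle/core is my assumption from
# NVIDIA's 3rd-gen Tensor Core material, not from the Orin datasheet.
freq_hz     = 939e6  # max operating frequency
cores       = 56     # Tensor Core count
fma_per_clk = 128    # assumed FP16 FMA per cycle per Tensor Core
ops_per_fma = 2      # one multiply + one add per FMA
sparsity    = 2      # structured sparsity doubles throughput

tflops = freq_hz * cores * fma_per_clk * ops_per_fma * sparsity / 1e12
print(f"{tflops:.1f} TFLOPS")  # -> 26.9
```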

It doesn’t reach 54 FP16 TFLOPS.
I think I’m overlooking something.

Could you give me any advice on this matter?

Regards,
hiro

(512 FMA ops * 2 * 0.939 GHz * 56 Tensor Cores) = 54 Dense INT8 TOPS * 2 = 108 Sparse INT8 TOPS

FP16 throughput is half of INT8, so that gives 54 Sparse FP16 TFLOPS.
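In Python, the same arithmetic is (a minimal sketch using the numbers above):

```python
# Sketch of the arithmetic above: Dense INT8 -> Sparse INT8 -> Sparse FP16.
freq_ghz    = 0.939
cores       = 56
int8_fma    = 512   # Dense INT8 FMA per cycle per Tensor Core
ops_per_fma = 2     # each FMA counts as two ops

dense_int8_tops    = int8_fma * ops_per_fma * freq_ghz * cores / 1e3  # ~53.8
sparse_int8_tops   = dense_int8_tops * 2      # sparsity doubles throughput
sparse_fp16_tflops = sparse_int8_tops / 2     # FP16 rate is half of INT8

print(dense_int8_tops, sparse_int8_tops, sparse_fp16_tflops)
# ~53.8 Dense INT8 TOPS, ~107.7 Sparse INT8 TOPS, ~53.8 Sparse FP16 TFLOPS
```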

Dear @kayccc

Thank you for your reply.
I checked your reply and I have one question.

I think the result of (512 FMA ops * 2 * 0.939 GHz * 56 Tensor Cores) is already around 54 TOPS, as follows:

(512 FMA ops * 2 * 0.939 GHz * 56 Tensor Cores) = 53,846.016 GOPS ≈ 54 TOPS

So I cannot understand why you multiply that result by 2:

(512 FMA ops * 2 * 0.939 GHz * 56 Tensor Cores) = 54 Dense INT8 TOPS * 2

Could you tell me the reason?

Regards,
hiro

Unfortunately, I still don’t understand the details of the Tensor Core performance.

If possible, please give me some advice about this calculation.

Regards,
hiro

Dear @kayccc,

I think your calculation above may not be correct, because (512 FMA ops * 2 * 0.939 GHz * 56 Tensor Cores) = 53,846.016 GOPS ≈ 54 TOPS.

So I want to reconfirm the calculation of Tensor Core performance.

Regards,
hiro

The 2 is to convert FMA to OPs. Each FMA is a floating point multiply + floating point ADD (two FP ops).
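For example, a single FMA computes d = a * b + c (one multiply plus one add), so 512 FMA/cycle corresponds to 1,024 ops/cycle.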

Dear @kayccc,

Thank you for your reply.
I understand that the final 2 means multiply-add/cycles:

(512 FMA ops * 2 * 0.939 GHz * 56 Tensor Cores) = 54 Dense INT8 TOPS * 2 (multiply-add/cycles)

If so, could you explain the first part, “512 FMA ops * 2”?

From the NVIDIA Ampere GA102 GPU architecture whitepaper below, I think
Jetson’s Tensor Core has 512 Sparse INT8 FMA per core, because it has 256 FP16 FMA per core.

NVIDIA AMPERE GA102 GPU ARCHITECTURE
(https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf)


If so, what does the 2 in “512 FMA ops * 2” mean?
I don’t think it means 2 (multiply-add/cycles),
and it doesn’t seem to mean 2 (sparsity) either.

Regards,
hiro

The first 2 is for the Multiply-Add/Cycles, and the second 2 is to convert from Dense to Sparse TOPs.
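Spelled out term by term with the thread’s numbers:

512 (Dense INT8 FMA/cycle/core) * 2 (multiply-add/cycles) * 0.939 GHz * 56 (Tensor Cores) ≈ 54 Dense INT8 TOPS
54 Dense INT8 TOPS * 2 (dense to sparse) ≈ 108 Sparse INT8 TOPS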

Dear @kayccc,

Thank you for your quick reply.

The first 2 is for the Multiply-Add/Cycles, and the second 2 is to convert from Dense to Sparse TOPs.

Do you mean “512 FMA ops” is “512 Dense INT8 FMA”?

I thought Jetson’s Tensor Core has 512 Sparse INT8 FMA per core, as follows.
Is this wrong?

Jetson’s Tensor Core has 512 Sparse INT8 FMA per core because it has 256 Sparse FP16 FMA per core.

Could you please reconfirm the Jetson Tensor Core spec?

Regards,
hiro

Dear @kayccc,

Could I ask whether “512 FMA ops” means “512 Dense INT8 FMA” or “512 Sparse INT8 FMA”?

Regards,
hiro

The architecture for the Ampere GPU in Orin follows 512 Dense INT8 FMA Ops.

Dear @kayccc,

Thank you for your reply.

The architecture for the Ampere GPU in Orin follows 512 Dense INT8 FMA Ops.

Do you mean the Ampere GPU in Orin supports 256 Dense FP16 FMA ops, because FP16 is half of INT8?
And is Orin’s Tensor Core architecture the GA100 SM type?

I read in the following post that Orin’s Tensor Core architecture is the GA100 SM type:

Tensor core of Jetson AGX Orin - Jetson & Embedded Systems / Jetson AGX Orin - NVIDIA Developer Forums

Could you confirm which architecture Orin’s Tensor Core uses?

Regards,
hiro

The GA10 version that Orin uses is different from the GA10x described in the document you referenced. It is a custom GA10 architecture, not listed in that document, which has 256 FP16 FMA ops (half of the INT8 rate) and 512 Dense INT8 FMA ops.
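For reference, a quick Python sketch recomputing the datasheet figure with those rates:

```python
# Recomputing with the custom GA10 rates stated above:
# 256 Dense FP16 FMA/cycle per Tensor Core (512 for INT8).
freq_hz     = 939e6   # max operating frequency
cores       = 56      # Tensor Cores in Orin 32GB
fp16_fma    = 256     # Dense FP16 FMA per cycle per Tensor Core
ops_per_fma = 2       # multiply + add

dense_fp16_tflops  = freq_hz * cores * fp16_fma * ops_per_fma / 1e12  # ~26.9
sparse_fp16_tflops = dense_fp16_tflops * 2                            # ~53.8 -> "54"
print(dense_fp16_tflops, sparse_fp16_tflops)
# The original estimate assumed 128 FMA/cycle (half of 256), which is why it
# came out at 26.9 TFLOPS instead of the datasheet's 54 Sparse FP16 TFLOPS.
```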

Dear @kayccc,

Thank you for your reply.
I understood the architecture.

Regards,
hiro
