I want to know the performance details of the Tensor Cores on the Jetson AGX Orin 32GB.
From the NVIDIA Jetson AGX Orin Series datasheet (v1.2),
I confirmed the following specs for the Jetson AGX Orin 32GB.
Max operating frequency: 939 MHz
Number of Tensor Cores: 56
Tensor Core performance: 54 FP16 TFLOPS
Sparsity: fine-grained structured sparsity doubles throughput
Also, I think a 3rd-generation Tensor Core can execute 128 multiply-adds per cycle, per the following page:
Tensor Cores: Versatility for HPC & AI - NVIDIA
So I calculated the Tensor Core performance as follows.
939 (MHz) * 56 (cores) * 2 (sparsity) * 128 * 2 (multiply-add/cycle) = 26.9 TFLOPS
It doesn't reach 54 FP16 TFLOPS.
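For reference, my arithmetic can be reproduced in Python; the 128 FMA/cycle figure is my reading of the linked page, not a datasheet value:

```python
# Reproducing my calculation (128 FMA/cycle is my assumption, not from the datasheet).
clock_hz = 939e6      # max operating frequency, 939 MHz
num_cores = 56        # Tensor Cores on Jetson AGX Orin 32GB
sparsity = 2          # structured sparsity doubles throughput
fma_per_cycle = 128   # assumed 3rd-gen Tensor Core FMA/cycle
ops_per_fma = 2       # one FMA = one multiply + one add

tflops = clock_hz * num_cores * sparsity * fma_per_cycle * ops_per_fma / 1e12
print(f"{tflops:.1f} TFLOPS")  # 26.9 TFLOPS
```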
I think I’m overlooking something.
Could you give me any advice on this matter?
Regards,
hiro
(512 FMA ops * 2 * 0.939 GHz) * 56 Tensor Cores = 54 Dense INT8 TOPS; * 2 = 108 Sparse INT8 TOPS
FP16 throughput is half of INT8, so 54 Sparse FP16 TFLOPS.
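Spelled out as a quick Python check (assuming the 512 dense INT8 FMA per core per cycle stated above):

```python
# Checking the figures above, assuming 512 dense INT8 FMA per Tensor Core per cycle.
fma_int8 = 512       # dense INT8 FMA per core per cycle
ops_per_fma = 2      # one FMA = one multiply + one add
clock_ghz = 0.939    # max operating frequency in GHz
num_cores = 56       # Tensor Cores on Jetson AGX Orin 32GB

dense_int8_tops = fma_int8 * ops_per_fma * clock_ghz * num_cores / 1e3
sparse_int8_tops = dense_int8_tops * 2     # structured sparsity doubles throughput
sparse_fp16_tflops = sparse_int8_tops / 2  # FP16 rate is half the INT8 rate

print(round(dense_int8_tops, 1))     # 53.8
print(round(sparse_int8_tops, 1))    # 107.7
print(round(sparse_fp16_tflops, 1))  # 53.8
```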
Dear @kayccc
Thank you for your reply.
I checked your reply and I have one question.
I think the result of (512 FMA ops * 2 * 0.939 GHz) * 56 Tensor Cores is already around 54 TOPS, as follows.
(512 FMA ops * 2 * 0.939 GHz) * 56 Tensor Cores = 53,846.016 GOPS ≈ 53.8 TOPS
So I cannot understand why you multiply the result by 2.
(512 FMA ops * 2 * 0.939 GHz) * 56 Tensor Cores = 54 Dense INT8 TOPS * 2
Could you tell me the reason?
Regards,
hiro
Unfortunately, I still don't understand the details of the Tensor Core performance.
If possible, please give me some advice about this calculation.
Regards,
hiro
Dear @kayccc ,
I think your calculation above is not correct, because (512 FMA ops * 2 * 0.939 GHz) * 56 Tensor Cores = 53,846.016 GOPS ≈ 53.8 TOPS.
So I want to reconfirm the calculation of the Tensor Core performance.
Regards,
hiro
kayccc (June 6, 2023, 12:03am):
Hiromitsu.Matsuura:
So I cannot understand why you multiply the result by 2.
(512 FMA ops * 2 * 0.939 GHz) * 56 Tensor Cores = 54 Dense INT8 TOPS * 2
Could you tell me the reason?
The 2 is to convert FMA to ops: each FMA is a floating-point multiply plus a floating-point add (two FP ops).
Dear @kayccc ,
Thank you for your reply.
I understood that the final 2 means 2 (multiply-add/cycle).
(512 FMA ops * 2 * 0.939 GHz) * 56 Tensor Cores = 54 Dense INT8 TOPS * 2 (multiply-add/cycle)
If so, could you explain the "512 FMA ops * 2" part?
From the following NVIDIA Ampere GA102 GPU architecture whitepaper, I think
Jetson's Tensor Core has 512 Sparse INT8 FMA per core, because it has 256 FP16 FMA per core.
NVIDIA AMPERE GA102 GPU ARCHITECTURE
(https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf )
If so, what does the 2 mean in "512 FMA ops * 2"?
I don't think it means 2 (multiply-add/cycle),
and it doesn't mean 2 (sparsity).
Regards,
hiro
kayccc (June 6, 2023, 12:56am):
The first 2 is for the multiply-add/cycle, and the second 2 is to convert from Dense to Sparse TOPS.
Dear @kayccc ,
Thank you for your quick reply,
The first 2 is for the multiply-add/cycle, and the second 2 is to convert from Dense to Sparse TOPS.
Do you mean "512 FMA ops" is "512 Dense INT8 FMA"?
I thought Jetson's Tensor Core has 512 Sparse INT8 FMA, as follows.
Is this wrong?
Jetson's Tensor Core has 512 Sparse INT8 FMA per core because it has 256 Sparse FP16 FMA per core.
Could you please reconfirm the Jetson Tensor Core spec?
Regards,
hiro
Dear @kayccc ,
Could I ask whether "512 FMA ops" means "512 Dense INT8 FMA" or "512 Sparse INT8 FMA"?
Regards,
hiro
kayccc (June 12, 2023, 11:49pm):
The architecture for the Ampere GPU in Orin follows 512 Dense INT8 FMA Ops.
Dear @kayccc ,
Thank you for your reply.
The architecture for the Ampere GPU in Orin follows 512 Dense INT8 FMA Ops.
Do you mean the architecture for the Ampere GPU in Orin follows 256 Dense FP16 FMA ops, because FP16 is half of INT8?
And is Orin's Tensor Core architecture the GA100 SM type?
I heard that Orin's Tensor Core architecture is the GA100 SM type in the following post:
Tensor core of Jetson AGX Orin - Jetson & Embedded Systems / Jetson AGX Orin - NVIDIA Developer Forums
Could I confirm Orin's Tensor Core architecture?
Regards,
hiro
kayccc (June 13, 2023, 12:45am):
The GA10 version that Orin uses is different from the GA10x in the document you referenced; it is a custom GA10 architecture not listed there. It has 256 Dense FP16 FMA ops, which is half of the INT8 rate, and 512 Dense INT8 FMA ops.
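Putting these per-core rates together, the datasheet's 54 FP16 TFLOPS works out as the sparse figure (a quick sketch using the dense FMA rates above):

```python
# Reconciling the datasheet figure with the per-core dense FMA rates above.
clock_ghz = 0.939  # max operating frequency in GHz
num_cores = 56     # Tensor Cores on Jetson AGX Orin 32GB
ops_per_fma = 2    # one FMA = one multiply + one add

fp16_fma = 256     # dense FP16 FMA per core per cycle (custom GA10)
int8_fma = 512     # dense INT8 FMA per core per cycle

dense_fp16_tflops = fp16_fma * ops_per_fma * clock_ghz * num_cores / 1e3
sparse_fp16_tflops = dense_fp16_tflops * 2  # sparsity doubles throughput
dense_int8_tops = int8_fma * ops_per_fma * clock_ghz * num_cores / 1e3

print(round(dense_fp16_tflops, 1), round(sparse_fp16_tflops, 1), round(dense_int8_tops, 1))
# 26.9 53.8 53.8
```

This also explains the original discrepancy: the first post assumed 128 FMA/cycle where the dense FP16 rate is 256, so it landed on the dense figure (26.9) instead of the sparse figure (≈54).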
Dear @kayccc ,
Thank you for your reply.
I understood the architecture.
Regards,
hiro
system (June 27, 2023, 3:28am):
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.