I'd like to understand the Tensor Core performance of the Jetson AGX Orin 32GB in more detail.
From the NVIDIA Jetson AGX Orin Series datasheet (v1.2),
I confirmed the following specs for the Jetson AGX Orin 32GB:
- Max operating frequency: 939 MHz
- Number of Tensor Cores: 56
- Tensor Core performance: 54 FP16 TFLOPS
- Sparsity: fine-grained structured sparsity doubles throughput
I also believe a third-generation Tensor Core can execute 128 multiply-adds per cycle, based on this NVIDIA page:
Tensor Cores: Versatility for HPC & AI - NVIDIA
So I calculated the Tensor Core performance as follows:
939 MHz * 56 (cores) * 2 (sparsity) * 128 (multiply-adds/cycle) * 2 (FLOPs per multiply-add) = 26.9 TFLOPS
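Here is the same arithmetic as a small Python sketch. It uses the datasheet figures listed above and treats 128 multiply-adds per Tensor Core per cycle as an assumption (that number is not a datasheet value):

```python
# Reproduce the peak Tensor Core throughput estimate.
# CLOCK_HZ, TENSOR_CORES and SPARSITY_FACTOR come from the datasheet figures above;
# FMA_PER_CORE_PER_CYCLE = 128 is an ASSUMPTION, not a datasheet value.

CLOCK_HZ = 939e6              # max GPU operating frequency (939 MHz)
TENSOR_CORES = 56             # Tensor Cores on Jetson AGX Orin 32GB
SPARSITY_FACTOR = 2           # structured sparsity doubles throughput
FMA_PER_CORE_PER_CYCLE = 128  # assumed FP16 multiply-adds per Tensor Core per cycle
FLOPS_PER_FMA = 2             # one multiply plus one add


def peak_tflops(clock_hz, cores, sparsity, fma_per_cycle, flops_per_fma=FLOPS_PER_FMA):
    """Peak TFLOPS = clock * cores * sparsity * multiply-adds/cycle * FLOPs per multiply-add."""
    return clock_hz * cores * sparsity * fma_per_cycle * flops_per_fma / 1e12


estimate = peak_tflops(CLOCK_HZ, TENSOR_CORES, SPARSITY_FACTOR, FMA_PER_CORE_PER_CYCLE)
print(f"estimated peak: {estimate:.1f} TFLOPS")  # estimated peak: 26.9 TFLOPS
```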
That is almost exactly half of the 54 FP16 TFLOPS listed in the datasheet.
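To size the gap, the sketch below back-solves how many multiply-adds per Tensor Core per cycle the 54 FP16 TFLOPS figure would require. It holds the clock, core count and 2x sparsity factor at the values quoted above. This only quantifies the missing factor; it does not show which input is wrong.

```python
# Back-solve the per-Tensor-Core rate implied by the datasheet figure,
# holding clock, core count and sparsity factor at the quoted values.

CLOCK_HZ = 939e6              # max GPU operating frequency (939 MHz)
TENSOR_CORES = 56             # Tensor Cores on Jetson AGX Orin 32GB
SPARSITY_FACTOR = 2           # structured sparsity doubles throughput
FLOPS_PER_FMA = 2             # one multiply plus one add
DATASHEET_TFLOPS = 54.0       # FP16 figure from the datasheet
ASSUMED_FMA_PER_CYCLE = 128   # the per-core rate used in the estimate above (an assumption)

# multiply-adds per core per cycle = total FLOP/s / (clock * cores * sparsity * FLOPs per FMA)
implied_fma = DATASHEET_TFLOPS * 1e12 / (
    CLOCK_HZ * TENSOR_CORES * SPARSITY_FACTOR * FLOPS_PER_FMA
)

print(f"implied multiply-adds per Tensor Core per cycle: {implied_fma:.0f}")  # 257
print(f"ratio to assumed {ASSUMED_FMA_PER_CYCLE}: {implied_fma / ASSUMED_FMA_PER_CYCLE:.2f}")  # 2.01
```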
I think I’m overlooking something.
Could you give me any advice on this matter?
Regards,
hiro