About Orin SoC Performance

Please provide the following info (tick the boxes after creating this topic):
Software Version
[v] DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other

Target Operating System
[v] Linux
QNX
other

Hardware Platform
[v] DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure of its part number)
other

SDK Manager Version
[v] 1.9.10816
other

Host Machine Version
[v] native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other

According to this table, the performance of the DRIVE Orin iGPU is 167 TOPS at INT8 and 5.2 TOPS at FP32. Does the 167 TOPS figure refer to the Tensor Core performance? And can we consider 5.2 TOPS the CUDA core performance? If that is correct, can you explain how the 167 TOPS (64 Tensor Cores) and 5.2 TOPS (2048 CUDA cores) figures were calculated?

Dear @soohyung.zhang,
The INT8 TOPS figure here refers to sparse IMMA (integer matrix multiply-accumulate) operations.

FP32/INT8 OPS = GPU clock (GHz) * number of SMs * ops per SM per cycle.
In this case, with a 1.275 GHz clock and 16 SMs, it is 1.275 * 16 * 256 giga-ops for FP32 and 1.275 * 16 * 8192 giga-ops for INT8.
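The arithmetic in the reply above can be sketched as a few lines of Python. The clock (1.275 GHz), SM count (16), and per-SM op counts (256 FP32, 8192 sparse INT8) are taken from the thread; the helper name is just for illustration.

```python
# Peak-throughput formula from the reply above:
#   OPS = GPU clock (GHz) * number of SMs * ops per SM per cycle
GPU_CLOCK_GHZ = 1.275
NUM_SMS = 16

def peak_tops(ops_per_sm_per_cycle: int) -> float:
    """Peak throughput in TOPS (tera-operations per second)."""
    gigaops = GPU_CLOCK_GHZ * NUM_SMS * ops_per_sm_per_cycle
    return gigaops / 1000.0  # giga-ops/s -> tera-ops/s

fp32_tops = peak_tops(256)    # 128 CUDA cores per SM, 2 ops per FMA
int8_tops = peak_tops(8192)   # sparse IMMA ops per SM per cycle

print(f"FP32: {fp32_tops:.1f} TOPS")
print(f"INT8: {int8_tops:.1f} TOPS")
```

Running this reproduces the numbers in the table: about 5.2 TOPS for FP32 and about 167 TOPS for sparse INT8.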

@SivaRamaKrishnaNV Thank you!
As I understand it, the 256 used to calculate the OPS for the FP32 is 128 cuda cores per sm multiplied by the “sparse matrix factor” of 2, right?

But I don’t understand how 8196 for INT8 OPS came about. Can you please explain?

Dear @soohyung.zhang,
As I understand it, the 256 used to calculate the OPS for the FP32 is 128 cuda cores per sm multiplied by the “sparse matrix factor” of 2, right?

Correct. Each CUDA core performs one FMA per cycle, which counts as 2 operations (a multiply and an add).

I don’t understand how 8196 for INT8 OPS came about. Can you please explain?

That is a typo. It should be 8192 sparse IMMA operations per SM.

@SivaRamaKrishnaNV Thanks for your kind reply.
Finally, I would like to ask one more fundamental question. Can you tell me why the IMMA ops per cycle are 32 times the FMA ops? Is this the effect of an accelerator such as the Tensor Cores? I don’t see the Tensor Cores in your formula.

Dear @soohyung.zhang,
The ops-per-SM term in the formula already accounts for them.
Each SM has 4 Tensor Cores, so the 8192 ops-per-SM figure is the combined contribution of the Tensor Cores only.

@SivaRamaKrishnaNV
Thank you for a detailed explanation.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.