Please provide the following info (tick the boxes after creating this topic):
Software Version
[v] DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other
Target Operating System
[v] Linux
QNX
other
Hardware Platform
[v] DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure of its number)
other
SDK Manager Version
[v] 1.9.10816
other
Host Machine Version
[v] native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other
According to this table, the performance of the DRIVE Orin iGPU is 167 TOPS for INT8 and 5.2 TFLOPS for FP32. Does the 167 TOPS figure refer to the Tensor Core performance, and can the 5.2 TFLOPS be considered the CUDA core performance? If that is correct, can you explain how you calculated and derived 167 TOPS (64 Tensor Cores) and 5.2 TFLOPS (2048 CUDA cores)?
@SivaRamaKrishnaNV Thank you!
As I understand it, the 256 used to calculate the OPS for FP32 is the 128 CUDA cores per SM multiplied by the “sparse matrix factor” of 2, right?
But I don’t understand how the 8192 for the INT8 OPS came about. Can you please explain?
Dear @soohyung.zhang,
As I understand it, the 256 used to calculate the OPS for FP32 is the 128 CUDA cores per SM multiplied by the “sparse matrix factor” of 2, right?
Correct. Each CUDA core performs 1 FMA (2 operations) per cycle.
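For reference, a minimal sketch of the FP32 peak-throughput arithmetic under the assumptions discussed above (128 CUDA cores per SM, 1 FMA = 2 operations per core per cycle). The 16-SM count and the ~1.3 GHz GPU clock are assumed values used only to reproduce the quoted ~5.2 TFLOPS figure, not taken from this thread.

```python
# Hedged sketch: FP32 peak throughput of the Orin iGPU.
# Assumptions (not from this thread): 16 SMs, 128 CUDA cores per SM,
# 1 FMA per CUDA core per cycle (= 2 FLOPs), ~1.3 GHz GPU clock.
num_sms = 16
cuda_cores_per_sm = 128
flops_per_fma = 2                      # multiply + add
clock_ghz = 1.3                        # assumed clock

flops_per_sm_per_cycle = cuda_cores_per_sm * flops_per_fma   # 256
peak_tflops = num_sms * flops_per_sm_per_cycle * clock_ghz / 1000
print(f"FP32 peak: {peak_tflops:.2f} TFLOPS")   # ~5.3 TFLOPS
```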
I don’t understand how the 8192 for the INT8 OPS came about. Can you please explain?
@SivaRamaKrishnaNV Thanks for your kind reply.
Finally, I would like to ask one more fundamental question. Can you tell me why the IMMA ops per cycle are 32 times the FMA ops? Is this the effect of accelerators such as the Tensor Cores? I don’t see the Tensor Cores in your formula.
Dear @soohyung.zhang,
I mentioned giga-ops per SM in the formula.
Each SM has 4 Tensor Cores, so the 8192 giga-ops figure is the effect (combined contribution) of the Tensor Cores only.
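A similar hedged sketch for the INT8 Tensor Core figure: it assumes 16 SMs, 4 Tensor Cores per SM, 2048 INT8 ops per Tensor Core per cycle (sparsity included), and a ~1.275 GHz clock. The per-Tensor-Core ops count and the clock are inferred only to reproduce the 8192 ops/SM and ~167 TOPS figures quoted in this thread, not taken from an NVIDIA document.

```python
# Hedged sketch: INT8 Tensor Core peak throughput of the Orin iGPU.
# Assumptions (not from this thread): 16 SMs, 4 Tensor Cores per SM,
# 2048 INT8 ops per Tensor Core per cycle (with sparsity),
# ~1.275 GHz GPU clock chosen to reproduce the quoted ~167 TOPS.
num_sms = 16
tensor_cores_per_sm = 4
int8_ops_per_tc_per_cycle = 2048       # assumed, sparsity included
clock_ghz = 1.275                      # assumed clock

int8_ops_per_sm_per_cycle = tensor_cores_per_sm * int8_ops_per_tc_per_cycle  # 8192
peak_tops = num_sms * int8_ops_per_sm_per_cycle * clock_ghz / 1000
print(f"INT8 peak: {peak_tops:.0f} TOPS")       # ~167 TOPS

# Ratio of the Tensor Core path to the FP32 CUDA core path, per SM per cycle:
print(int8_ops_per_sm_per_cycle // 256)         # 32
```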