This HTML page lists the AI performance of the Orin 32GB as 200 TOPS.
I’d like to know how to measure this AI performance.
To obtain a certification, we need to prepare a method for measuring certain performance figures. We are interested in TOPS and AI performance.
How can we measure AI performance of Orin 32GB?
Thank you very much in advance!
You might be able to get a better response asking this question on the Orin forum.
I’m not 100% certain, (the folks on the Orin forum would probably know) but I believe the “AI performance” referred to on that page would be the throughput of the tensor core (TC) units.
The Ampere white paper indicates that an Ampere Tensor Core unit delivers 256 FP16 FMA ops/clk. If we do the math for the Orin AGX 32GB using data from that page and the Ampere white paper:
930 MHz × 256 FMA/clk per TC × 56 TCs × 2 FLOPs/FMA = 26,664,960 MFLOPS ≈ 26.7 TFLOPS
The “TOPS” indication usually refers to integer modes (vs. “TFLOPS”). The 2 integer modes for TC operation covered in the Ampere white paper are INT8 and INT4.
INT8 provides a doubling of the FP16 perf. INT4 provides a doubling of the INT8 perf. Sparsity adds another doubling. If you multiply 26.7 by 8 you get approximately 200 TOPS (26.7 × 8 ≈ 214).
So then the question would be how to measure. For a TC throughput measurement, I would always suggest using CUBLAS. Choose a properly constructed CUBLAS INT8 GEMM operation, and you should be able to witness a number between 25 and 50 TOPS (somewhere in the 50-100% efficiency range), I would guess (I haven’t done it myself, so there may be things I don’t know). Beyond that, I’m not sure how to witness INT4 performance. It may be exposed in some libraries like CUDNN. I don’t know if it is exposed in CUBLAS or not. To measure INT4 throughput you might need to use CUTLASS. And I don’t know if CUTLASS can do INT4 + sparsity.
Finally, there is the question of “AI performance”. The most careful measurement of it that I know of is MLPerf. At least on the AI inference side, there are some measurements of performance for various AI workloads on Orin, submitted by NVIDIA in the 3.0 (most recent) Inference round.
Thank you for the kind and professional answer!
This is the first time I have seen the calculation behind the TOPS figure!
There are some articles I noticed.
From your explanation together with those articles, I think TOPS is something that people like me cannot really measure. It is like CPU clock frequency: we assume a higher frequency brings better performance, but in reality it does not always, because many other factors hold back the performance of real programs.
And you mentioned MLPerf.
Thank you very much!
Hello @Robert_Crovella ,
I have unmarked your answer as the solution to this question.
We still need a method to measure the TOPS value on the Orin device.
So, if anyone can provide this information later, we would welcome and appreciate it.
Thank you very much!
Here is a possible recipe. That would get you int8 non-sparsity TOPS.
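As a rough illustration of what such a recipe could look like, here is a minimal sketch of timing an INT8 GEMM with `cublasGemmEx` and converting the elapsed time to TOPS. This is an assumption-laden sketch, not a verified benchmark: the matrix size (8192) and iteration count (100) are arbitrary choices I made for illustration, and you would need to tune them (and check return codes) on your own device. Build with something like `nvcc bench.cu -lcublas`:

```cpp
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int N = 8192;    // m = n = k; assumed size, keep multiples of 4 for the INT8 TC path
    const int iters = 100; // assumed iteration count

    // Device buffers; contents are irrelevant for a throughput measurement.
    int8_t *A, *B;
    int32_t *C;
    cudaMalloc(&A, (size_t)N * N);
    cudaMalloc(&B, (size_t)N * N);
    cudaMalloc(&C, (size_t)N * N * sizeof(int32_t));

    cublasHandle_t handle;
    cublasCreate(&handle);
    const int32_t alpha = 1, beta = 0; // int32 scalars for CUBLAS_COMPUTE_32I

    // Warm-up call so library init doesn't pollute the timing.
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N,
                 &alpha, A, CUDA_R_8I, N, B, CUDA_R_8I, N,
                 &beta,  C, CUDA_R_32I, N,
                 CUBLAS_COMPUTE_32I, CUBLAS_GEMM_DEFAULT);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N,
                     &alpha, A, CUDA_R_8I, N, B, CUDA_R_8I, N,
                     &beta,  C, CUDA_R_32I, N,
                     CUBLAS_COMPUTE_32I, CUBLAS_GEMM_DEFAULT);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double ops = 2.0 * N * N * N * iters; // 2 integer ops per multiply-accumulate
    printf("measured INT8 GEMM: %.1f TOPS\n", ops / (ms * 1e-3) / 1e12);

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

Comparing the printed number against the ~53 TOPS dense INT8 theoretical peak gives you the achieved efficiency.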
To go to int4, I would suggest CUTLASS. I don’t have a specific example to point to, but there may be one in the cutlass test cases. And as I said before, I don’t know about int4+sparsity.
Hi @Robert_Crovella ,
I will share your answers, including this one, with people at our company.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.