This HTML page lists the AI performance of the Orin 32GB as 200 TOPS.
I’d like to know how to measure this AI performance.
To obtain a certification, we need to prepare a method for measuring certain performance figures. We are interested in TOPS and AI performance.
How can we measure AI performance of Orin 32GB?
Thank you very much in advance!
You might be able to get a better response asking this question on the Orin forum.
I’m not 100% certain, (the folks on the Orin forum would probably know) but I believe the “AI performance” referred to on that page would be the throughput of the tensor core (TC) units.
The Ampere white paper indicates that an Ampere Tensor Core unit delivers 256 FP16 FMA ops/clk. If we do the math for the Orin AGX 32GB using data from that page and the Ampere white paper:
930 MHz × 256 FMA/clk per TC × 56 TCs × 2 FLOPs/FMA = 26,664,960 MFLOPS ≈ 26.7 TFLOPS
The “TOPS” indication usually refers to integer modes (vs. “TFLOPS”). The 2 integer modes for TC operation covered in the Ampere white paper are INT8 and INT4.
INT8 provides a doubling of the FP16 perf. INT4 provides a doubling of the INT8 perf. Sparsity adds another doubling. If you multiply 26.7 by 8 you get approximately 200 TOPS (26.7 × 8 ≈ 214).
So then the question would be how to measure. For a TC throughput measurement, I would always suggest using CUBLAS. Choose a properly constructed CUBLAS INT8 GEMM operation, and you should be able to witness a number between 25 and 50 TOPS (somewhere in the 50-100% efficiency range), I would guess (I haven’t done it myself, so there may be things I don’t know). Beyond that, I’m not sure how to witness INT4 performance. It may be exposed in some libraries like CUDNN. I don’t know if it is exposed in CUBLAS or not. To measure INT4 throughput you might need to use CUTLASS. And I don’t know if CUTLASS can do INT4 + sparsity.
Finally, there is the question of “AI performance”. The most careful measurement of it that I know of is MLPerf. At least on the AI inference side, there are some measurements of performance for various AI workloads on Orin, submitted by NVIDIA in the 3.0 (most recent) Inference round.
Thank you for the kind and professional answer!
This is the first time I have seen the calculation behind the TOPS figure!
There are some articles I noticed.
From your explanation together with those articles, I think TOPS is something that people like me cannot really measure. It is like CPU clock frequency: we assume a higher frequency brings better performance, but in reality it does not always, because many other factors hold back the performance of real programs.
And you mentioned MLPerf.
Thank you very much!
Hello @Robert_Crovella ,
I have unmarked your answer as the solution to this question.
We still need a method to measure the TOPS value on the Orin device.
So, if anyone can provide this information later, we would welcome and appreciate it.
Thank you very much!
Here is a possible recipe. That would get you int8 non-sparsity TOPS.
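As a rough illustration of what such a recipe could look like, here is a minimal sketch of timing an INT8 GEMM with `cublasGemmEx` and converting the elapsed time to TOPS. This is an assumption-laden sketch, not a verified benchmark: the matrix size (8192) and iteration count (100) are arbitrary choices I made for illustration, and you would need to tune them (and check return codes) on your own device. Build with something like `nvcc bench.cu -lcublas`:

```cpp
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int N = 8192;    // m = n = k; assumed size, keep multiples of 4 for the INT8 TC path
    const int iters = 100; // assumed iteration count

    // Device buffers; contents are irrelevant for a throughput measurement.
    int8_t *A, *B;
    int32_t *C;
    cudaMalloc(&A, (size_t)N * N);
    cudaMalloc(&B, (size_t)N * N);
    cudaMalloc(&C, (size_t)N * N * sizeof(int32_t));

    cublasHandle_t handle;
    cublasCreate(&handle);
    const int32_t alpha = 1, beta = 0; // int32 scalars for CUBLAS_COMPUTE_32I

    // Warm-up call so library init doesn't pollute the timing.
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N,
                 &alpha, A, CUDA_R_8I, N, B, CUDA_R_8I, N,
                 &beta,  C, CUDA_R_32I, N,
                 CUBLAS_COMPUTE_32I, CUBLAS_GEMM_DEFAULT);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N,
                     &alpha, A, CUDA_R_8I, N, B, CUDA_R_8I, N,
                     &beta,  C, CUDA_R_32I, N,
                     CUBLAS_COMPUTE_32I, CUBLAS_GEMM_DEFAULT);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double ops = 2.0 * N * N * N * iters; // 2 integer ops per multiply-accumulate
    printf("measured INT8 GEMM: %.1f TOPS\n", ops / (ms * 1e-3) / 1e12);

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

Comparing the printed number against the ~53 TOPS dense INT8 theoretical peak gives you the achieved efficiency.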
To go to int4, I would suggest CUTLASS. I don’t have a specific example to point to, but there may be one in the cutlass test cases. And as I said before, I don’t know about int4+sparsity.
Hi @Robert_Crovella ,
I will share your answers, including this one, with people at our company.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.