GPU stress test for Orin NX

linnus820xd · May 26, 2025, 7:24am

• Hardware Platform (Jetson / GPU)
Jetson Orin NX 16GB
• JetPack Version (valid for Jetson only)
6.2

Hi,

I ran the GPU stress test for 2 minutes on my custom carrier board.

https://elinux.org/Jetson/L4T/TRT_Customized_Example#GPU_Stress_Test

The power reaches 37 W (super mode), and the temperature is close to 80-90 °C. But I got only 0.137 TOPS, which is far below the 157 TOPS that NVIDIA officially announced. Could you give me some advice?

Performance= 375417490.63 GFlop/s, Time= 0.000 msec, Size= 137438953472 Ops

AastaLLL · May 26, 2025, 7:33am

Hi,

How do you calculate the GFLOP/s figure?
For TOPS benchmarking, we recommend trying our CUTLASS library.

Thanks.

linnus820xd · May 26, 2025, 8:19am

Hi AastaLLL,

For the stress test tool, I referred to the topic below.

And this is how the program calculates GFLOP/s, where Matrix_size = M × N × K:
GFLOP/s = (2 * Matrix_size * 1e-9) / (operation_time_ms * 1e-3)
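The conversion the program performs can be sketched in Python as below. This is a minimal sketch, assuming the tool counts 2 × M × N × K operations per GEMM (one multiply and one add per inner-product term) and measures time in milliseconds; the function name gemm_gflops is hypothetical, not taken from the tool.

```python
def gemm_gflops(m: int, n: int, k: int, time_ms: float) -> float:
    """Return GFLOP/s for one M x N x K GEMM that took time_ms milliseconds."""
    flops = 2.0 * m * n * k        # one multiply + one add per inner-product term
    seconds = time_ms * 1e-3       # milliseconds -> seconds
    return flops * 1e-9 / seconds  # FLOP -> GFLOP, then divide by seconds


if __name__ == "__main__":
    # Hypothetical timing: a 4096 x 4096 x 4096 GEMM finishing in 10 ms.
    print(f"{gemm_gflops(4096, 4096, 4096, 10.0):.2f} GFLOP/s")  # prints 13743.90 GFLOP/s
```

Note that the scale factors are 1e-9 (10⁻⁹) and 1e-3 (10⁻³); writing 10e-9 or 10e-3 instead would be off by a factor of 10 each.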

Oh, it seems that Ops is the operation count for the matrix size, 2 * 4096 * 4096 * 4096 = 137438953472 Ops. I misunderstood the value …
But the 375417490.63 GFLOP/s is also far too large, right?
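Working backwards from the log line is a quick sanity check. In this sketch (the helper name implied_time_ms is hypothetical), Size = 137438953472 equals 2 × 4096³, and dividing it by the reported rate gives a run of roughly 0.0004 ms, which a three-decimal format prints as Time= 0.000 msec. A near-zero time is what inflates the GFLOP/s figure, so the timing measurement looks suspect rather than the GPU itself.

```python
OPS = 137438953472              # "Size" from the log line
REPORTED_GFLOPS = 375417490.63  # "Performance" from the log line


def implied_time_ms(ops: float, gflops: float) -> float:
    """Invert GFLOP/s = ops * 1e-9 / seconds and return the time in milliseconds."""
    seconds = ops * 1e-9 / gflops
    return seconds * 1e3


if __name__ == "__main__":
    # Size is exactly the op count of one 4096^3 GEMM (2 ops per multiply-add).
    assert OPS == 2 * 4096 ** 3
    t = implied_time_ms(OPS, REPORTED_GFLOPS)
    print(f"implied time = {t:.6f} ms, shown with 3 decimals as {t:.3f} ms")
```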

linnus820xd · May 27, 2025, 3:33am

Hi AastaLLL,

I found the error in the modified code.
Now the GFLOP/s figure is correct (13-15 TOPS) in the float16 case.

Could you provide a tool to stress-test the GPU in INT8?
Can the CUTLASS library test it?

AastaLLL · May 28, 2025, 8:36am

Hi,

Yes, please see the topic below for more information:

Thanks.

linnus820xd · May 29, 2025, 2:22am

Hi AastaLLL,

I tried the method you suggested. The steps were:

1. Change the line below from Identity8 to Identity4:
   cutlass/python/cutlass_library/generator.py (main branch of NVIDIA/cutlass on GitHub)
2. Build the CUTLASS profiler:
   $ git clone https://github.com/NVIDIA/cutlass.git
   $ cd cutlass/
   $ mkdir build && cd build
   $ cmake .. -DCUTLASS_NVCC_ARCHS=87 \
       -DCUTLASS_LIBRARY_KERNELS=i16864spgemm
   $ make cutlass_profiler -j12
3. Run the profiler:
   $ ./tools/profiler/cutlass_profiler --gemm_kind=sgemm --m=1024 --n=1024 \
       --k=8192 --A=s8:row --B=s8:column --C=s8:row --E=u32:nk2 --alpha=1 \
       --beta=0 --split_k_slices=1 --batch_count=1 --op_class=tensorop \
       --accum=s32 --cta_m=256 --cta_n=128 --cta_k=128 --cluster_m=1 \
       --cluster_n=1 --cluster_k=1 --stages=3 --warps_m=4 --warps_n=2 \
       --warps_k=1 --inst_m=16 --inst_n=8 --inst_k=64 --min_cc=80 --max_cc=1024
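Building the argument list in a script sidesteps shell pitfalls such as typographic dashes (–k) silently replacing "--" or a wrong line-continuation character. A minimal Python sketch, assuming it is run from the cutlass/build directory; the flag values are copied from the command in this post (not tuned recommendations), and the helper name build_profiler_cmd is hypothetical.

```python
import shlex
import subprocess

PROFILER = "./tools/profiler/cutlass_profiler"

# Flags and values exactly as used in the post; dict order is preserved.
OPTIONS = {
    "gemm_kind": "sgemm", "m": 1024, "n": 1024, "k": 8192,
    "A": "s8:row", "B": "s8:column", "C": "s8:row", "E": "u32:nk2",
    "alpha": 1, "beta": 0, "split_k_slices": 1, "batch_count": 1,
    "op_class": "tensorop", "accum": "s32",
    "cta_m": 256, "cta_n": 128, "cta_k": 128,
    "cluster_m": 1, "cluster_n": 1, "cluster_k": 1, "stages": 3,
    "warps_m": 4, "warps_n": 2, "warps_k": 1,
    "inst_m": 16, "inst_n": 8, "inst_k": 64,
    "min_cc": 80, "max_cc": 1024,
}


def build_profiler_cmd(options: dict) -> list[str]:
    """Turn {"k": 8192, ...} into ["--k=8192", ...] after the profiler path."""
    return [PROFILER] + [f"--{key}={value}" for key, value in options.items()]


if __name__ == "__main__":
    cmd = build_profiler_cmd(OPTIONS)
    print(shlex.join(cmd))              # dry run: show the exact command line
    run_on_device = False               # set True on the Jetson to execute it
    if run_on_device:
        subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids the shell entirely, so no continuation characters are needed.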

And I got the result:

jetson_clocks is running, and the power mode is already MAXN_SUPER.
However, the performance is still only 37 TOPS, which is significantly below the sparse INT8 performance (100 TOPS) stated in the datasheet.
Could you give me some advice?
By the way, my device is an Orin NX, not an AGX Orin. Should the parameters be changed for my case?
Thanks!

AastaLLL · June 2, 2025, 9:18am

Hi,

To measure TOPS, you need a test in which computation far outweighs memory transfer (computation >> memory transfer).
Would you mind trying different (m, n, k) values?

For example, you can sweep k with an argument like --k=8192:16384:128.
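Why larger k helps can be seen from the arithmetic intensity (operations per byte moved) of the GEMM. This sketch assumes the range syntax start:end:step includes the end value, and it uses a dense INT8 approximation with 1-byte A, B, and C each read or written once (it ignores the sparse A compression and metadata); both helper names are hypothetical.

```python
def parse_range(spec: str) -> list[int]:
    """Expand 'start:end:step' into a list, treating end as inclusive (assumption)."""
    start, end, step = (int(x) for x in spec.split(":"))
    return list(range(start, end + 1, step))


def int8_gemm_intensity(m: int, n: int, k: int) -> float:
    """Ops per byte for a dense INT8 GEMM with 1-byte A (m*k), B (k*n), C (m*n)."""
    ops = 2 * m * n * k                  # multiply + add per term
    bytes_moved = m * k + k * n + m * n  # minimum traffic, each matrix touched once
    return ops / bytes_moved


if __name__ == "__main__":
    ks = parse_range("8192:16384:128")
    print(f"{len(ks)} values of k")
    for k in (ks[0], ks[-1]):
        print(f"m=n=1024, k={k}: {int8_gemm_intensity(1024, 1024, k):.1f} ops/byte")
```

As k grows, the ratio approaches 2mn / (m + n) (1024 ops/byte for m = n = 1024), so raising m and n as well lifts the ceiling further.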

Thanks.

linnus820xd · June 3, 2025, 12:35am

Hi,

I tested k=8192:16384:128, and I got 39 TOPS.

I also read your response in "How to verify Orin the TOPS performance" (#8 by AastaLLL) and tried it.

I got 46909 GFLOP/s = 46.9 TOPS with m=512, n=512, k=16256, and Identity2.
However, I don't understand whether the SOL is 46.9/60 or 46.9/100.
By the way, I don't know how to test the DLA's TOPS. Does NVIDIA provide a test tool for it now?
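The two candidate ratios are simple arithmetic. This sketch computes both (SOL meaning "speed of light", the theoretical peak) and does not decide which peak figure applies; that depends on what the datasheet number covers. The measured value and both denominators come from the question above, and the helper name is hypothetical.

```python
MEASURED_TOPS = 46909 / 1000  # 46909 GFLOP/s reported by the profiler
CANDIDATE_PEAKS = {"60 TOPS": 60.0, "100 TOPS": 100.0}  # the two figures in question


def sol_percent(measured: float, peak: float) -> float:
    """Measured throughput as a percentage of a theoretical peak."""
    return 100.0 * measured / peak


if __name__ == "__main__":
    for label, peak in CANDIDATE_PEAKS.items():
        print(f"vs {label}: {sol_percent(MEASURED_TOPS, peak):.1f}% of SOL")
```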

AastaLLL · June 4, 2025, 8:44am

Hi,

You will need to use TensorRT to run operations on the DLA.
Please check the link below for more information:
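One common way to run a network on the DLA through TensorRT is the trtexec tool (on JetPack it is usually found under /usr/src/tensorrt/bin). The sketch below only assembles a trtexec command using its --onnx, --useDLACore, --int8, and --allowGPUFallback options; the model path model.onnx and the helper name are hypothetical. trtexec reports latency and throughput (inferences per second) rather than TOPS, so converting its result to TOPS would require the model's own operation count.

```python
import shlex
import subprocess


def build_trtexec_dla_cmd(onnx_path: str, dla_core: int = 0) -> list[str]:
    """Assemble a trtexec command that builds and times an engine on one DLA core."""
    return [
        "trtexec",
        f"--onnx={onnx_path}",      # hypothetical model path
        f"--useDLACore={dla_core}",  # which DLA core to target
        "--int8",                    # INT8 precision (timing only, not an accuracy check)
        "--allowGPUFallback",        # layers the DLA cannot run fall back to the GPU
    ]


def main(dry_run: bool = True) -> None:
    cmd = build_trtexec_dla_cmd("model.onnx", dla_core=0)
    print(shlex.join(cmd))
    if not dry_run:                  # only execute on a device with TensorRT installed
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```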

Thanks.
