I am trying to get the peak theoretical FP32 FLOP/s of the Jetson AGX Orin.
The documentation mentions 170 sparse INT8 TOP/s, from which I get 85 dense INT8 TOP/s by halving, and then roughly 42 FP16 TFLOP/s and 21 FP32 TFLOP/s by halving again at each precision step. So is the peak performance 21 TFLOP/s for FP32?
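For reference, here is the halving arithmetic I am assuming (the 2x factor per step is my assumption, not something I found in the datasheet):

```python
# Back-of-the-envelope halving chain (the per-step 2x factors
# are my assumption, not taken from NVIDIA documentation).
sparse_int8_tops = 170.0

dense_int8_tops = sparse_int8_tops / 2  # drop 2:4 sparsity -> 85 TOPS
fp16_tflops = dense_int8_tops / 2       # INT8 -> FP16     -> 42.5 TFLOP/s
fp32_tflops = fp16_tflops / 2           # FP16 -> FP32     -> 21.25 TFLOP/s

print(f"dense INT8 : {dense_int8_tops} TOP/s")
print(f"FP16       : {fp16_tflops} TFLOP/s")
print(f"FP32       : {fp32_tflops} TFLOP/s")
```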
Even if I use FP32 matmuls in TensorRT or PyTorch, will these automatically be executed on Tensor Cores, since the Tensor Cores support TF32?
I ask because I see another source mentioning 5.3 FP32 TFLOP/s for the CUDA cores, and I want to know which of these figures applies to deep learning workloads. Thanks!
Hi,
Tensor Cores require INT8 or FP16 precision.
You can find some details below:
You can also try our CUTLASS library to run a benchmark.
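If CUTLASS is not convenient, a rough PyTorch timing loop can give an indicative number as well. Below is a minimal sketch (our assumption: a large square matmul gets reasonably close to peak; the CUTLASS profiler is more precise):

```python
import torch

# Rough matmul throughput benchmark. Indicative only; assumes a
# CUDA-capable Jetson with PyTorch installed.
def bench_matmul(dtype, n=4096, iters=50):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    # Warm up so cuBLAS heuristics and GPU clocks settle.
    for _ in range(5):
        a @ b
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1e3 / iters  # elapsed_time is in ms
    # One n x n matmul costs 2 * n^3 FLOPs.
    return 2 * n**3 / seconds / 1e12

print(f"FP16 : {bench_matmul(torch.float16):.1f} TFLOP/s")
print(f"FP32 : {bench_matmul(torch.float32):.1f} TFLOP/s")
```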
Thanks.
Accelerating AI Training with NVIDIA TF32 Tensor Cores | NVIDIA Technical Blog
This says “TF32 mode is the default option for AI training with 32-bit variables on Ampere GPU architecture. It brings Tensor Core acceleration to single-precision DL workloads, without needing any changes to model scripts.”
Doesn’t this mean that FP32 DL code will run on tensor cores?
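PyTorch also exposes explicit switches for this. A minimal sketch from the PyTorch API (what I am unsure about is whether these actually engage the Tensor Cores on Orin):

```python
import torch

# Allow FP32 matmuls and convolutions to run as TF32 on Tensor Cores
# (Ampere and later).
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Equivalent matmul-side switch in newer PyTorch releases;
# "highest" keeps full FP32 instead.
torch.set_float32_matmul_precision("high")
```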
Hi,
Sorry for the earlier omission; TF32 can run on Tensor Cores.
But please note that TF32 and FP32 are different precisions: TF32 keeps the FP32 8-bit exponent but truncates the mantissa from 23 bits to 10 bits.
For example, the TensorRT documentation notes:
“As TensorRT chooses algorithms based on resources and performance, there is no guarantee that a layer will run on Tensor Core.”
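For completeness, TF32 usage in TensorRT is controlled by a builder flag (enabled by default on Ampere). A minimal sketch with the Python API:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# TF32 is on by default on Ampere; clear the flag to force full-FP32 kernels.
config.clear_flag(trt.BuilderFlag.TF32)
# ...then build the engine as usual. Layer placement is still decided
# by the builder, per the documentation quoted above.
```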
Thanks.