Do the Tensor Cores on Jetson AGX Orin support FP32 (IEEE 754 single-precision floating point)?

The following page states in its table that the Ampere-architecture Tensor Cores support FP64, TF32, bfloat16, FP16, INT8, INT4, and INT1, but do not support FP32.

Tensor Cores: Versatility for HPC & AI | NVIDIA

On the other hand, the following page states that Tensor Cores support FP32:

The third generation of tensor cores introduced in the NVIDIA Ampere architecture provides a huge performance boost and delivers new precisions to cover the full spectrum required from research to production — FP32, Tensor Float 32 (TF32), FP16, INT8, INT4 and bfloat16.

Discover How Tensor Cores Accelerate Your Mixed Precision Models

So I want to confirm whether the Tensor Cores support FP32 (IEEE 754 single-precision floating point) or not.

Regards,
hiro

Hi,

Orin contains third-generation Tensor Cores.
Third-generation Tensor Cores do have Tensor Float 32 (TF32) support.

You can find details in the Orin technical report:

Thanks.

Dear AlastaLLL,

Thank you for your comments.

I understand that the third-generation Tensor Cores support Tensor Float 32 (TF32).
My question is: do the third-generation Tensor Cores support IEEE 754 single-precision floating point?

Regards,
hiro

Hi,

In the above document, there is a link to the GA102 whitepaper, which contains more details about the third-generation Tensor Cores.
They can take FP32 as input and output, but use TF32 internally for acceleration.
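
For example, here is a minimal cuBLAS sketch (assuming CUDA 11 or later; the matrix size and data are placeholders) of an ordinary FP32 GEMM that is explicitly allowed to run on the Tensor Cores with TF32 internal math, while the inputs and outputs remain IEEE FP32:

```cpp
// Minimal sketch (CUDA 11+, cuBLAS): an FP32 GEMM whose inputs and
// outputs are IEEE FP32, but which cuBLAS may execute on Tensor Cores
// with TF32 internal precision once the math mode is opted in.
// Error checking is omitted for brevity.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

int main() {
    const int n = 1024;  // placeholder size
    std::vector<float> hA(n * n, 1.0f), hB(n * n, 1.0f), hC(n * n, 0.0f);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, n * n * sizeof(float));
    cudaMalloc(&dB, n * n * sizeof(float));
    cudaMalloc(&dC, n * n * sizeof(float));
    cudaMemcpy(dA, hA.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    // Opt in to TF32 Tensor Core math for FP32 routines.
    cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);

    const float alpha = 1.0f, beta = 0.0f;
    // Plain FP32 GEMM call; inputs and outputs stay IEEE FP32.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(hC.data(), dC, n * n * sizeof(float), cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

Note that with the default math mode, cuBLAS keeps FP32 GEMMs on the regular FP32 datapath; the cublasSetMathMode opt-in above is what allows the TF32 Tensor Core path.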


NVIDIA Ampere Architecture Tensor Cores Support New DL Data Types

Today, the default math for AI training is FP32, without Tensor Core acceleration. The NVIDIA Ampere architecture introduces new support for TF32, enabling AI training to use Tensor Cores by default with no effort on the user’s part. Non-tensor operations continue to use the FP32 datapath, while TF32 Tensor Cores read FP32 data and use the same range as FP32 with reduced internal precision, before producing a standard IEEE FP32 output. TF32 includes an 8-bit exponent (same as FP32), 10-bit mantissa (same precision as FP16) and 1 sign-bit. TF32 mode of an Ampere architecture GPU Tensor Core provides up to 4x more throughput than standard FP32 when sparsity is used. Throughput is dependent on modes and SKU information; see Table 2, Table 3, and Appendix A for per-SKU specifications.
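
To make the TF32 layout quoted above concrete, here is a small host-side sketch (plain C++, with a hypothetical helper name) that keeps only the sign bit, the 8 exponent bits, and the top 10 mantissa bits of an IEEE FP32 value. It truncates the low 13 mantissa bits, whereas the hardware conversion may round instead, so it only illustrates the precision that survives, not the exact rounding behavior:

```cpp
// Host-side sketch of the TF32 layout: same 8-bit exponent as FP32,
// but only 10 explicit mantissa bits (hypothetical helper, truncation
// only; real hardware may round rather than truncate).
#include <cstdint>
#include <cstdio>
#include <cstring>

float to_tf32_precision(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    bits &= 0xFFFFE000u;  // keep sign (1) + exponent (8) + top 10 mantissa bits
    std::memcpy(&x, &bits, sizeof(bits));
    return x;
}

int main() {
    float v = 1.0f + 1.0f / 4096.0f;  // needs 12 mantissa bits to represent
    printf("FP32: %.10f  TF32-precision: %.10f\n", v, to_tf32_precision(v));
    return 0;
}
```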


Thanks.


Dear AlastaLLL,

Thank you for the information.

I understand the architecture.

Regards,
hiro
