Do the Tensor Cores on Jetson AGX Orin support FP32 (IEEE 754 single-precision floating point)?

The following page states in its table that the Ampere-architecture Tensor Cores support FP64, TF32, bfloat16, FP16, INT8, INT4, and INT1, but do not support FP32.

Tensor Cores: Versatility for HPC & AI | NVIDIA

On the other hand, the following page states that Tensor Cores support FP32:

The third generation of tensor cores introduced in the NVIDIA Ampere architecture provides a huge performance boost and delivers new precisions to cover the full spectrum required from research to production — FP32, Tensor Float 32 (TF32), FP16, INT8, INT4 and bfloat16.

Discover How Tensor Cores Accelerate Your Mixed Precision Models

So I want to confirm whether the Tensor Cores support FP32 (IEEE 754 single-precision floating point) or not.

Regards,
hiro

Hi,

Orin contains third-generation Tensor Cores.
Third-generation Tensor Cores do have Tensor Float 32 (TF32) support.

You can find details in the Orin technical report:

Thanks.

Dear AlastaLLL,

Thank you for your comments.

I understand that the third-generation Tensor Cores support Tensor Float 32 (TF32).
My question is: do the third-generation Tensor Cores support IEEE 754 single-precision floating point?

Regards,
hiro

Hi,

In the above document, there is a link to the GA102 whitepaper, which contains more details about the third-generation Tensor Cores.
They can take FP32 as input and output, but use TF32 internally for acceleration.
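
For example, here is a minimal cuBLAS sketch (assuming CUDA 11 or later; the matrix size and data are placeholders) of an ordinary FP32 GEMM that is explicitly allowed to run on the Tensor Cores with TF32 internal math, while the inputs and outputs remain IEEE FP32:

```cpp
// Minimal sketch (CUDA 11+, cuBLAS): an FP32 GEMM whose inputs and
// outputs are IEEE FP32, but which cuBLAS may execute on Tensor Cores
// with TF32 internal precision once the math mode is opted in.
// Error checking is omitted for brevity.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

int main() {
    const int n = 1024;  // placeholder size
    std::vector<float> hA(n * n, 1.0f), hB(n * n, 1.0f), hC(n * n, 0.0f);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, n * n * sizeof(float));
    cudaMalloc(&dB, n * n * sizeof(float));
    cudaMalloc(&dC, n * n * sizeof(float));
    cudaMemcpy(dA, hA.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    // Opt in to TF32 Tensor Core math for FP32 routines.
    cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);

    const float alpha = 1.0f, beta = 0.0f;
    // Plain FP32 GEMM call; inputs and outputs stay IEEE FP32.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(hC.data(), dC, n * n * sizeof(float), cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

Note that with the default math mode, cuBLAS keeps FP32 GEMMs on the regular FP32 datapath; the cublasSetMathMode opt-in above is what allows the TF32 Tensor Core path.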


NVIDIA Ampere Architecture Tensor Cores Support New DL Data Types

Today, the default math for AI training is FP32, without Tensor Core acceleration. The NVIDIA Ampere architecture introduces new support for TF32, enabling AI training to use Tensor Cores by default with no effort on the user’s part. Non-tensor operations continue to use the FP32 datapath, while TF32 Tensor Cores read FP32 data and use the same range as FP32 with reduced internal precision, before producing a standard IEEE FP32 output. TF32 includes an 8-bit exponent (same as FP32), 10-bit mantissa (same precision as FP16) and 1 sign-bit. TF32 mode of an Ampere architecture GPU Tensor Core provides up to 4x more throughput than standard FP32 when sparsity is used. Throughput is dependent on modes and SKU information; see Table 2, Table 3, and Appendix A for per-SKU specifications.
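
To make the TF32 layout quoted above concrete, here is a small host-side sketch (plain C++, with a hypothetical helper name) that keeps only the sign bit, the 8 exponent bits, and the top 10 mantissa bits of an IEEE FP32 value. It truncates the low 13 mantissa bits, whereas the hardware conversion may round instead, so it only illustrates the precision that survives, not the exact rounding behavior:

```cpp
// Host-side sketch of the TF32 layout: same 8-bit exponent as FP32,
// but only 10 explicit mantissa bits (hypothetical helper, truncation
// only; real hardware may round rather than truncate).
#include <cstdint>
#include <cstdio>
#include <cstring>

float to_tf32_precision(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    bits &= 0xFFFFE000u;  // keep sign (1) + exponent (8) + top 10 mantissa bits
    std::memcpy(&x, &bits, sizeof(bits));
    return x;
}

int main() {
    float v = 1.0f + 1.0f / 4096.0f;  // needs 12 mantissa bits to represent
    printf("FP32: %.10f  TF32-precision: %.10f\n", v, to_tf32_precision(v));
    return 0;
}
```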


Thanks.


Dear AlastaLLL,

Thank you for the information.

I understand the architecture.

Regards,
hiro
