The table on the following page states that the Ampere architecture Tensor Core supports FP64, TF32, bfloat16, FP16, INT8, INT4 and INT1, but does not support FP32.
On the other hand, the following page describes the Tensor Core as supporting FP32:
The third generation of tensor cores introduced in the NVIDIA Ampere architecture provides a huge performance boost and delivers new precisions to cover the full spectrum required from research to production — FP32, Tensor Float 32 (TF32), FP16, INT8, INT4 and bfloat16.
I understand that 3rd-generation Tensor Cores support Tensor Float 32 (TF32).
My question is: do 3rd-generation Tensor Cores support the IEEE 754 single-precision floating-point format?
The above document links to the GA102 whitepaper, which contains more details about the 3rd-generation Tensor Core. In short, the Tensor Core can take FP32 as input and produce FP32 output, but uses TF32 as the intermediate format for acceleration.
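Here is a minimal sketch of that FP32-in/FP32-out behavior using the CUDA WMMA API (my own illustration, not from the linked pages; it assumes CUDA 11+ and an sm_80-class GPU). The matrices in memory are plain IEEE FP32; `wmma::precision::tf32` selects the TF32 Tensor Core path, and `__float_to_tf32` rounds each operand to TF32 before the multiply:

```cuda
#include <mma.h>
#include <cstdio>

using namespace nvcuda;

// One warp multiplies a 16x8 FP32 matrix A by an 8x16 FP32 matrix B.
// Memory traffic is plain IEEE FP32; only the multiply inputs are
// rounded to TF32. The accumulator stays in FP32.
__global__ void tf32_mma(const float *a, const float *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 8,
                   wmma::precision::tf32, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 8,
                   wmma::precision::tf32, wmma::row_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 8, float> fc;

    wmma::fill_fragment(fc, 0.0f);
    wmma::load_matrix_sync(fa, a, 8);    // A is 16x8, row-major
    wmma::load_matrix_sync(fb, b, 16);   // B is 8x16, row-major

    // Round each FP32 operand to TF32 (10-bit mantissa) before the MMA.
    for (int i = 0; i < fa.num_elements; ++i)
        fa.x[i] = wmma::__float_to_tf32(fa.x[i]);
    for (int i = 0; i < fb.num_elements; ++i)
        fb.x[i] = wmma::__float_to_tf32(fb.x[i]);

    wmma::mma_sync(fc, fa, fb, fc);      // TF32 multiply, FP32 accumulate
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);  // IEEE FP32 out
}

int main() {
    float *a, *b, *c;
    cudaMallocManaged(&a, 16 * 8 * sizeof(float));
    cudaMallocManaged(&b, 8 * 16 * sizeof(float));
    cudaMallocManaged(&c, 16 * 16 * sizeof(float));
    for (int i = 0; i < 16 * 8; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
    tf32_mma<<<1, 32>>>(a, b, c);        // a single warp
    cudaDeviceSynchronize();
    printf("c[0] = %f (expected 16.0)\n", c[0]);   // 8 products of 1*2
    return 0;
}
```

Compiled with `nvcc -arch=sm_80`, this prints 16.0: each output entry is the FP32 sum of eight TF32 products. From the GA102 whitepaper: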
NVIDIA Ampere Architecture Tensor Cores Support New DL Data Types … Today, the default math for AI training is FP32, without Tensor Core acceleration. The NVIDIA Ampere architecture introduces new support for TF32, enabling AI training to use Tensor Cores by default with no effort on the user’s part. Non-tensor operations continue to use the FP32 datapath, while TF32 Tensor Cores read FP32 data and use the same range as FP32 with reduced internal precision, before producing a standard IEEE FP32 output. TF32 includes an 8-bit exponent (same as FP32), 10-bit mantissa (same precision as FP16) and 1 sign-bit. TF32 mode of an Ampere architecture GPU Tensor Core provides up to 4x more throughput than standard FP32 when sparsity is used. Throughput is dependent on modes and SKU information; see Table 2, Table 3, and Appendix A for per-SKU specifications.
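The bit layout quoted above is easy to demonstrate on the host side. Here is a rough sketch of the described reduction (again my own illustration; it truncates, whereas the actual hardware conversion rounds to nearest), showing how an FP32 value that needs more than 10 mantissa bits loses precision in TF32:

```cuda
#include <cstdio>
#include <cstdint>
#include <cstring>

// TF32 per the quote: 1 sign bit, 8 exponent bits (same as FP32),
// 10 mantissa bits (same as FP16). Dropping the low 13 of FP32's 23
// mantissa bits models the conversion (by truncation here; the real
// hardware conversion rounds to nearest).
static float to_tf32_truncated(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits &= 0xFFFFE000u;              // clear the 13 low mantissa bits
    memcpy(&f, &bits, sizeof f);
    return f;
}

int main() {
    float x = 1.0f + 1.0f / (1 << 20);  // needs more than 10 mantissa bits
    printf("FP32: %.10f\n", x);                      // 1.0000009537
    printf("TF32: %.10f\n", to_tf32_truncated(x));   // 1.0000000000
    return 0;
}
```

So the answer is: the Tensor Cores accept and emit IEEE 754 single precision, but the multiply itself does not carry FP32’s full 23-bit mantissa; operands are reduced to TF32’s 10-bit mantissa internally.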