FP32 & A100

I am reading the A100 whitepaper. On page 26 it says:

The NVIDIA Ampere architecture introduces new support for TF32, enabling AI training to use tensor cores by default with no effort on the user’s part.

So does that mean that if I have FP32 GEMM operations in my code (MAGMA_sgetrf), they should automatically be done in TF32? Is that true?

If so, I should see tensor core GEMM kernels in the nsys profile, but I am seeing normal FP32 GEMM kernels.

No, not true. A GEMM operation is not the same as “AI training”. The automatic conversion of FP32 ops to TF32 ops takes place within the confines of a framework like TensorFlow or PyTorch, and then only for certain operations (“AI training”).


Thanks Robert,

I see that the A100 has TF32 tensor cores, so if the conversion is not automatic, I should be able to replace the SGEMM with an equivalent TF32 GEMM from cuBLAS (maybe cublasMath_t = CUBLAS_TF32_TENSOR_OP_MATH). Is that correct? But maybe I cannot convert an FP32 matrix to TF32 as the input of a TF32 GEMM?!
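To make that concrete, this is roughly what I have in mind (just a sketch; the handle, device pointers, and matrix sizes are placeholders on my side):

```cpp
#include <cublas_v2.h>

// Sketch only: dA, dB, dC are FP32 device pointers, column-major,
// A is m x k, B is k x n, C is m x n.
void sgemm_tf32(cublasHandle_t handle, int m, int n, int k,
                const float* dA, const float* dB, float* dC) {
    const float alpha = 1.0f, beta = 0.0f;
    // Opt this handle into TF32 tensor-op math (Ampere or newer).
    cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);
    // Plain FP32 GEMM call; cuBLAS may now run it on TF32 tensor cores.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n, k, &alpha, dA, m, dB, k, &beta, dC, m);
}
```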


cublasGemmEx() (for example) has paths to select TF32 computation:

computeType: CUBLAS_COMPUTE_32F_FAST_TF32

Atype / Btype: CUDA_R_32F

Ctype: CUDA_R_32F

Note that the inputs are already expected to be FP32. There is no storage difference between TF32 and FP32. The only difference is in interpretation of mantissa bits.
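For illustration, a minimal sketch of such a call might look like this (the matrix sizes and error handling are placeholders, not taken from your code):

```cpp
// Sketch: FP32 GEMM routed through TF32 tensor cores via cublasGemmEx.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main() {
    const int m = 1024, n = 1024, k = 1024;
    const float alpha = 1.0f, beta = 0.0f;

    // Host data stays ordinary FP32; no conversion to a "TF32 type" is needed.
    std::vector<float> hA(m * k, 1.0f), hB(k * n, 1.0f), hC(m * n, 0.0f);

    float *dA, *dB, *dC;
    cudaMalloc((void**)&dA, hA.size() * sizeof(float));
    cudaMalloc((void**)&dB, hB.size() * sizeof(float));
    cudaMalloc((void**)&dC, hC.size() * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // A/B/C are described as FP32 (CUDA_R_32F); the compute type
    // CUBLAS_COMPUTE_32F_FAST_TF32 lets cuBLAS down-convert internally
    // and use TF32 tensor cores on Ampere.
    cublasStatus_t stat = cublasGemmEx(handle,
                                       CUBLAS_OP_N, CUBLAS_OP_N,
                                       m, n, k,
                                       &alpha,
                                       dA, CUDA_R_32F, m,
                                       dB, CUDA_R_32F, k,
                                       &beta,
                                       dC, CUDA_R_32F, m,
                                       CUBLAS_COMPUTE_32F_FAST_TF32,
                                       CUBLAS_GEMM_DEFAULT);
    if (stat != CUBLAS_STATUS_SUCCESS) {
        printf("cublasGemmEx failed: %d\n", stat);
    }

    cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C[0] = %f\n", hC[0]);  // expect k (= 1024) for all-ones inputs

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

If the TF32 path is taken, the nsys profile should then show tensor-core GEMM kernels rather than the regular FP32 SGEMM kernels.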