FP32 & A100

I am reading the A100 whitepaper. On page 26 it says:

The NVIDIA Ampere architecture introduces new support for TF32, enabling AI training to use tensor cores by default with no effort on the user’s part.

So does that mean that if I have FP32 GEMM operations in my code (MAGMA_sgetrf), they should automatically be done in TF32? Is that true?

If so, I should see tensor core GEMM kernels in the nsys profile, but I am seeing normal FP32 GEMM kernels.

No, not true. A GEMM operation is not the same as “AI training”. The automatic conversion of FP32 ops to TF32 ops takes place within the confines of a framework like TensorFlow or PyTorch, and then only for certain operations (“AI training”).


Thanks Robert,

I see that the A100 has TF32 tensor cores, so if the conversion is not automatic, I should be able to replace the SGEMM with an equivalent TF32 GEMM from cuBLAS (maybe cublasMath_t = CUBLAS_TF32_TENSOR_OP_MATH). Is that correct? But maybe I cannot convert an FP32 matrix to TF32 as the input of a TF32 GEMM?!
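To make that concrete, this is roughly what I have in mind (just a sketch; the handle, device pointers, and matrix sizes are placeholders on my side):

```cpp
#include <cublas_v2.h>

// Sketch only: dA, dB, dC are FP32 device pointers, column-major,
// A is m x k, B is k x n, C is m x n.
void sgemm_tf32(cublasHandle_t handle, int m, int n, int k,
                const float* dA, const float* dB, float* dC) {
    const float alpha = 1.0f, beta = 0.0f;
    // Opt this handle into TF32 tensor-op math (Ampere or newer).
    cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);
    // Plain FP32 GEMM call; cuBLAS may now run it on TF32 tensor cores.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n, k, &alpha, dA, m, dB, k, &beta, dC, m);
}
```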


cublasGemmEx() (for example) has paths to select TF32 computation:

computeType: CUBLAS_COMPUTE_32F_FAST_TF32

Atype / Btype: CUDA_R_32F

Ctype: CUDA_R_32F

Note that the inputs are already expected to be FP32. There is no storage difference between TF32 and FP32. The only difference is in interpretation of mantissa bits.
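For illustration, a minimal sketch of such a call might look like this (the matrix sizes and error handling are placeholders, not taken from your code):

```cpp
// Sketch: FP32 GEMM routed through TF32 tensor cores via cublasGemmEx.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main() {
    const int m = 1024, n = 1024, k = 1024;
    const float alpha = 1.0f, beta = 0.0f;

    // Host data stays ordinary FP32; no conversion to a "TF32 type" is needed.
    std::vector<float> hA(m * k, 1.0f), hB(k * n, 1.0f), hC(m * n, 0.0f);

    float *dA, *dB, *dC;
    cudaMalloc((void**)&dA, hA.size() * sizeof(float));
    cudaMalloc((void**)&dB, hB.size() * sizeof(float));
    cudaMalloc((void**)&dC, hC.size() * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // A/B/C are described as FP32 (CUDA_R_32F); the compute type
    // CUBLAS_COMPUTE_32F_FAST_TF32 lets cuBLAS down-convert internally
    // and use TF32 tensor cores on Ampere.
    cublasStatus_t stat = cublasGemmEx(handle,
                                       CUBLAS_OP_N, CUBLAS_OP_N,
                                       m, n, k,
                                       &alpha,
                                       dA, CUDA_R_32F, m,
                                       dB, CUDA_R_32F, k,
                                       &beta,
                                       dC, CUDA_R_32F, m,
                                       CUBLAS_COMPUTE_32F_FAST_TF32,
                                       CUBLAS_GEMM_DEFAULT);
    if (stat != CUBLAS_STATUS_SUCCESS) {
        printf("cublasGemmEx failed: %d\n", stat);
    }

    cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C[0] = %f\n", hC[0]);  // expect k (= 1024) for all-ones inputs

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

If the TF32 path is taken, the nsys profile should then show tensor-core GEMM kernels rather than the regular FP32 SGEMM kernels.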