No, that is not true. A GEMM operation is not the same thing as “AI training”. The automatic conversion of FP32 ops to TF32 ops happens only within a framework such as TensorFlow or PyTorch, and even then only for certain operations (“AI training”).
I see that the A100 has TF32 Tensor Cores, so if the conversion is not automatic, I should be able to replace the SGEMM with an equivalent TF32 GEMM from cuBLAS (maybe by setting the cublasMath_t mode to CUBLAS_TF32_TENSOR_OP_MATH). Is that correct? But perhaps I cannot pass an FP32 matrix as the input to a TF32 GEMM?
Note that the inputs are already expected to be FP32. There is no storage difference between TF32 and FP32; the only difference is in the interpretation of the mantissa bits (TF32 uses only 10 of FP32's 23 mantissa bits during the Tensor Core computation).