Adapt FP32 operation with TF32

uniadam · October 6, 2021, 4:52pm

I have three matrices A, B, and C. A and B are in half precision (FP16) and C is in single precision (FP32). The operation is:
C = C + A×B.
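For reference, cublasGemmEx computes C = α·op(A)·op(B) + β·C; with no transposes and α = β = 1 it reduces to C = C + A·B. Below is a minimal CPU sketch of that update, useful for checking GPU results. It assumes column-major storage (the cuBLAS convention), float inputs, and float accumulation, as with an FP32 compute type. `gemm_ref` is a hypothetical helper, not a cuBLAS function.

```c
#include <stddef.h>

/* Hypothetical CPU reference for C = alpha * A * B + beta * C
   (no transposes), column-major storage:
   A is m x k with leading dimension lda,
   B is k x n with leading dimension ldb,
   C is m x n with leading dimension ldc.
   Products are accumulated in float, mirroring an FP32 compute type. */
void gemm_ref(int m, int n, int k, float alpha,
              const float *A, int lda, const float *B, int ldb,
              float beta, float *C, int ldc) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            C[i + (size_t)j * ldc] = alpha * acc + beta * C[i + (size_t)j * ldc];
        }
    }
}
```

When comparing GPU output against this reference, a relative tolerance is safer than exact equality, because tensor-core paths may accumulate in a different order.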

Currently I use “cublasGemmEx” for this operation, but I would like to perform it in half precision and TF32. Can I do this without converting the data from FP32 to TF32? Is there a replacement for “cublasGemmEx” that accepts FP32 data and performs the GEMM in TF32?

I am not sure whether “cublasGemmEx” already does this.
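In cuBLAS, TF32 is selected as a compute mode rather than a data type. A TF32 operand is an ordinary 32-bit float of which only the sign, the 8 exponent bits, and the top 10 mantissa bits are used, so FP32 buffers need no separate conversion pass. In cuBLAS (CUDA 11 and later) this corresponds to calling cublasGemmEx with CUDA_R_32F for A, B, and C and computeType CUBLAS_COMPUTE_32F_FAST_TF32. Alternatively, you can set cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH) before calling cublasSgemm.

The host-side sketch below only emulates the resulting precision loss. The name `to_tf32` is hypothetical, and its round-to-nearest, ties-away-from-zero behaviour is an assumption that may not match what the library or the tensor cores do internally.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round an FP32 value to TF32 precision
   (1 sign bit, 8 exponent bits, 10 mantissa bits) and return it in
   ordinary FP32 storage. Rounding is to nearest with ties away from
   zero (an assumption). Infinities and NaNs are returned unchanged. */
float to_tf32(float x) {
    if (!isfinite(x)) return x;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret without aliasing UB */
    bits += 0x1000u;                  /* add half of the 13 dropped bits */
    bits &= 0xFFFFE000u;              /* clear the 13 low mantissa bits */
    float r;
    memcpy(&r, &bits, sizeof r);
    return r;
}
```

Adding 0x1000 before masking rounds at bit 12. If the carry runs out of the mantissa, it increments the exponent, which is the correct result for rounding up into the next binade. For example, `to_tf32(1.0f + 0x1p-12f)` gives 1.0f, because 2^-12 is less than half a TF32 unit in the last place at 1.0.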

Regards,
N.S.

mnicely · October 6, 2021, 8:40pm

If I understand your question, you want to compute in FP16 or TF32? According to the documentation, for TF32 compute A, B, and C must be FP32.

uniadam · October 7, 2021, 5:57pm

I want to do a GEMM in TF32; my data is in FP32.

Robert_Crovella · October 7, 2021, 8:33pm

?

uniadam · October 7, 2021, 10:21pm

It was my mistake, sorry.

Originally, A, B, and C are FP32, but to make the GEMM faster I convert A and B to half and store the final result in FP32 with cublasGemmEx.

But I am thinking about making this GEMM a little faster still, maybe with TF32 for all matrices or some other combination.
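Compared with TF32, converting A and B to FP16 costs range rather than precision. Both formats keep 10 explicit mantissa bits. However, FP16 has only a 5-bit exponent (largest finite value 65504, smallest normal 2^-14), while TF32 keeps the 8-bit FP32 exponent.

Below is a hypothetical pre-check for the FP32 to FP16 conversion step. It assumes round-to-nearest-even conversion, and the names `fp16_range_report` and `check_fp16_range` are illustrative only.

```c
#include <math.h>
#include <stddef.h>

/* Counts of FP32 values that an FP16 conversion would damage. */
typedef struct {
    size_t overflow;   /* |x| >= 65520: becomes +/-inf under round-to-nearest-even */
    size_t subnormal;  /* 0 < |x| < 2^-14: FP16 subnormal or rounded to zero */
} fp16_range_report;

/* 65520 is the midpoint between 65504 (largest finite FP16) and 2^16;
   the tie rounds to the even neighbour, 2^16, which overflows. */
fp16_range_report check_fp16_range(const float *x, size_t n) {
    fp16_range_report r = {0, 0};
    for (size_t i = 0; i < n; ++i) {
        float a = fabsf(x[i]);
        if (!isfinite(a)) continue;        /* inf/NaN stay inf/NaN anyway */
        if (a >= 65520.0f) ++r.overflow;
        else if (a > 0.0f && a < 0x1p-14f) ++r.subnormal;
    }
    return r;
}
```

With TF32 compute on FP32 buffers, neither count applies, because the exponent range is unchanged.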
