Is cublasHgemm pure half multiplication?

nokanaran · January 10, 2023, 10:09pm

I am trying to find more information about cublasHgemm(), but it seems that it is no longer available. What is the equivalent of cublasHgemm() in terms of cublasGemmEx()?

Robert_Crovella · January 10, 2023, 10:43pm

cublasHgemm() still exists, see here.
cublasSgemmEx() can also handle half-precision data (see here): you would select CUDA_R_16F for the matrix types, but the computation is still done in float.
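As an illustration of the cublasSgemmEx() route, here is a minimal sketch that stores A and B as CUDA_R_16F, passes float alpha/beta, and writes a CUDA_R_32F result. The helper name `sgemmex_ones` and the test values are hypothetical choices for illustration; it assumes a CUDA toolkit with cuBLAS, a GPU that supports FP16 inputs, and a build along the lines of `nvcc sgemmex.cu -lcublas`. Multiplying all-ones matrices gives k in every entry; with k = 3001 that value cannot be represented in FP16 (above 2048 the FP16 spacing is 2), so an exact 3001 reflects float computation and float output.

```cuda
#include <cstdio>
#include <vector>
#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical helper: C = A * B where A (m x k) and B (k x n) are all ones,
// stored as FP16. cublasSgemmEx computes in float; C is written as FP32.
// Returns the common value of C, or -1 on a cuBLAS error or mismatch.
float sgemmex_ones(int m, int n, int k) {
    std::vector<__half> hA(m * k, __float2half(1.0f));
    std::vector<__half> hB(k * n, __float2half(1.0f));
    std::vector<float> hC(m * n, 0.0f);

    __half *dA = nullptr, *dB = nullptr;
    float *dC = nullptr;
    cudaMalloc(&dA, hA.size() * sizeof(__half));
    cudaMalloc(&dB, hB.size() * sizeof(__half));
    cudaMalloc(&dC, hC.size() * sizeof(float));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(__half), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(__half), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;  // float scalars: SgemmEx computes in float
    cublasStatus_t st = cublasSgemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                                      &alpha, dA, CUDA_R_16F, m,
                                      dB, CUDA_R_16F, k,
                                      &beta, dC, CUDA_R_32F, m);
    cudaMemcpy(hC.data(), dC, hC.size() * sizeof(float), cudaMemcpyDeviceToHost);

    cublasDestroy(handle);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);

    if (st != CUBLAS_STATUS_SUCCESS) return -1.0f;
    const float first = hC[0];
    for (float v : hC)
        if (v != first) return -1.0f;  // every entry should be identical
    return first;
}

int main() {
    float c = sgemmex_ones(4, 4, 3001);
    std::printf("C[0] = %.1f (expected 3001.0)\n", c);
    return c == 3001.0f ? 0 : 1;
}
```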
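To emulate cublasHgemm() with cublasGemmEx() (see here), you would use CUBLAS_COMPUTE_16F for the compute type, and CUDA_R_16F for the scale type (cublasGemmEx() derives it from the compute type, so alpha and beta are passed as __half), Atype, Btype, and Ctype.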
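A minimal sketch of that correspondence, assuming CUDA 11 or later (where computeType is a cublasComputeType_t) and a GPU with FP16 arithmetic support; the helper name `hgemm_matches_gemmex` is hypothetical. It runs the same column-major product once through cublasHgemm() and once through cublasGemmEx() configured as described, then compares both against a host reference. The inputs are small integers (A in {0,1,2}, B in {0,1}), so every FP16 product and partial sum stays exactly representable and both paths should reproduce the reference exactly regardless of summation order.

```cuda
#include <cstdio>
#include <vector>
#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical helper: C = A * B (column-major; A is m x k, B is k x n),
// computed with cublasHgemm and with the equivalent cublasGemmEx call.
bool hgemm_matches_gemmex(int m, int n, int k) {
    std::vector<__half> hA(m * k), hB(k * n), hC1(m * n), hC2(m * n);
    std::vector<float> ref(m * n, 0.0f);
    for (int i = 0; i < m * k; ++i) hA[i] = __float2half(float(i % 3));
    for (int i = 0; i < k * n; ++i) hB[i] = __float2half(float(i % 2));
    for (int j = 0; j < n; ++j)
        for (int p = 0; p < k; ++p)
            for (int i = 0; i < m; ++i)
                ref[i + j * m] += __half2float(hA[i + p * m]) * __half2float(hB[p + j * k]);

    __half *dA = nullptr, *dB = nullptr, *dC1 = nullptr, *dC2 = nullptr;
    cudaMalloc(&dA, hA.size() * sizeof(__half));
    cudaMalloc(&dB, hB.size() * sizeof(__half));
    cudaMalloc(&dC1, hC1.size() * sizeof(__half));
    cudaMalloc(&dC2, hC2.size() * sizeof(__half));
    cudaMemcpy(dA, hA.data(), hA.size() * sizeof(__half), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), hB.size() * sizeof(__half), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const __half alpha = __float2half(1.0f), beta = __float2half(0.0f);

    // Path 1: the dedicated half-precision GEMM.
    cublasStatus_t s1 = cublasHgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                                    &alpha, dA, m, dB, k, &beta, dC1, m);
    // Path 2: cublasGemmEx with FP16 compute; the implied scale type is
    // CUDA_R_16F, so alpha and beta are the same __half values.
    cublasStatus_t s2 = cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                                     &alpha, dA, CUDA_R_16F, m,
                                     dB, CUDA_R_16F, k,
                                     &beta, dC2, CUDA_R_16F, m,
                                     CUBLAS_COMPUTE_16F, CUBLAS_GEMM_DEFAULT);

    cudaMemcpy(hC1.data(), dC1, hC1.size() * sizeof(__half), cudaMemcpyDeviceToHost);
    cudaMemcpy(hC2.data(), dC2, hC2.size() * sizeof(__half), cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC1);
    cudaFree(dC2);

    bool ok = (s1 == CUBLAS_STATUS_SUCCESS && s2 == CUBLAS_STATUS_SUCCESS);
    for (int i = 0; ok && i < m * n; ++i)
        ok = __half2float(hC1[i]) == ref[i] && __half2float(hC2[i]) == ref[i];
    return ok;
}

int main() {
    bool ok = hgemm_matches_gemmex(64, 64, 64);
    std::printf("cublasHgemm vs cublasGemmEx(CUBLAS_COMPUTE_16F): %s\n",
                ok ? "match" : "differ");
    return ok ? 0 : 1;
}
```

The exact-match check only holds because of the integer inputs; with general FP16 data the two calls may select different kernels, so a tolerance-based comparison would be the safer choice.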

nokanaran · January 10, 2023, 10:56pm

Thanks.
Is there a difference in their performance? I expect them to be the same, at least for multiplication and accumulation in half precision.

Robert_Crovella · January 10, 2023, 11:00pm

I wouldn’t expect a significant difference between cublasHgemm() and cublasGemmEx() using CUDA_R_16F, but I haven’t tested it.
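One way to check this on a given GPU is to time both calls directly. The sketch below is a hypothetical micro-benchmark (the names `Timings` and `time_both`, the matrix size, and the iteration count are all illustrative assumptions, not a tested methodology). It uses CUDA events on the default stream, does one warm-up call per path, and reports the average time per call. It assumes CUDA 11 or later and C++14 (for the generic lambda), built with something like `nvcc bench.cu -lcublas`. Results depend on the GPU, the cuBLAS version, and the matrix shape.

```cuda
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

struct Timings { float hgemm_ms; float gemmex_ms; };

// Average milliseconds per call for cublasHgemm and for cublasGemmEx
// (CUBLAS_COMPUTE_16F, all matrix types CUDA_R_16F) on n x n matrices.
Timings time_both(int n, int iters) {
    const size_t bytes = size_t(n) * n * sizeof(__half);
    __half *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemset(dA, 0, bytes);  // contents do not matter for timing
    cudaMemset(dB, 0, bytes);

    cublasHandle_t h;
    cublasCreate(&h);
    const __half alpha = __float2half(1.0f), beta = __float2half(0.0f);

    auto hgemm = [&] {
        cublasHgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, dA, n, dB, n, &beta, dC, n);
    };
    auto gemmex = [&] {
        cublasGemmEx(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                     &alpha, dA, CUDA_R_16F, n, dB, CUDA_R_16F, n,
                     &beta, dC, CUDA_R_16F, n,
                     CUBLAS_COMPUTE_16F, CUBLAS_GEMM_DEFAULT);
    };

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    // Warm up once, then time `iters` back-to-back calls on the default stream.
    auto time_it = [&](auto fn) {
        fn();
        cudaEventRecord(start);
        for (int i = 0; i < iters; ++i) fn();
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        return ms / iters;
    };
    Timings t{time_it(hgemm), time_it(gemmex)};

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cublasDestroy(h);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return t;
}

int main() {
    Timings t = time_both(4096, 20);
    std::printf("cublasHgemm : %.3f ms/call\n", t.hgemm_ms);
    std::printf("cublasGemmEx: %.3f ms/call\n", t.gemmex_ms);
    return 0;
}
```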

system · January 24, 2023, 11:01pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
