cublasHgemm() still exists, see here.
cublasSgemmEx() can also handle half-precision multiplication (see here): you would select CUDA_R_16F for the matrix types, but the computation is still carried out in float.
To emulate cublasHgemm() with cublasGemmEx() (see here), you would use CUBLAS_COMPUTE_16F for the compute type, and CUDA_R_16F for the scale type, Atype, Btype, and Ctype.
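A minimal sketch of that cublasGemmEx() call, assuming a CUDA 11+ toolkit where the compute type is a cublasComputeType_t (on older toolkits you would pass CUDA_R_16F as the computeType argument instead). The handle, device pointers, and dimensions are placeholders, and error checking is omitted:

```cpp
// Sketch: emulating cublasHgemm() via cublasGemmEx().
// Assumes a valid cublasHandle_t and device pointers dA, dB, dC of __half,
// laid out column-major for an (m x k) * (k x n) = (m x n) GEMM.
#include <cublas_v2.h>
#include <cuda_fp16.h>

cublasStatus_t half_gemm(cublasHandle_t handle,
                         int m, int n, int k,
                         const __half *dA, const __half *dB, __half *dC)
{
    // With CUBLAS_COMPUTE_16F the scale type is CUDA_R_16F,
    // so alpha and beta must be __half values, not floats.
    const __half alpha = __float2half(1.0f);
    const __half beta  = __float2half(0.0f);

    return cublasGemmEx(handle,
                        CUBLAS_OP_N, CUBLAS_OP_N,
                        m, n, k,
                        &alpha,
                        dA, CUDA_R_16F, m,   // Atype, lda
                        dB, CUDA_R_16F, k,   // Btype, ldb
                        &beta,
                        dC, CUDA_R_16F, m,   // Ctype, ldc
                        CUBLAS_COMPUTE_16F,  // all arithmetic in FP16
                        CUBLAS_GEMM_DEFAULT);
}
```

Note that with cublasSgemmEx() (or CUBLAS_COMPUTE_32F here) alpha and beta would instead be plain floats, since the scale type follows the compute type.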