I am trying to find more information about cublasHgemm, but it seems that we do not have it anymore. What is the equivalent to cublasHgemm w.r.t cublasGemmEx?
cublasHgemm() still exists, see here.
cublasSgemmEx() can also handle half-precision inputs, see here. You would select CUDA_R_16F for the matrix types, but the calculation is still done in single precision (FP32).
With cublasGemmEx() (see here) you would use CUBLAS_COMPUTE_16F (CUDA_R_16F in toolkits before CUDA 11.0) for the compute type, and CUDA_R_16F for the scale type, Atype, Btype, and Ctype.
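Putting those parameters together, an all-FP16 call might look like the sketch below. This is a minimal illustration, not tested here; the matrix dimensions and the choice of CUBLAS_GEMM_DEFAULT are arbitrary, and error checking is reduced to a single status print.

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int m = 4, n = 4, k = 4;

    // Device buffers for A (m x k), B (k x n), C (m x n), all in half precision.
    __half *dA, *dB, *dC;
    cudaMalloc(&dA, m * k * sizeof(__half));
    cudaMalloc(&dB, k * n * sizeof(__half));
    cudaMalloc(&dC, m * n * sizeof(__half));
    // ... fill dA and dB with data ...

    cublasHandle_t handle;
    cublasCreate(&handle);

    // With CUBLAS_COMPUTE_16F the scale type is CUDA_R_16F,
    // so alpha and beta must be __half as well.
    const __half alpha = __float2half(1.0f);
    const __half beta  = __float2half(0.0f);

    cublasStatus_t st = cublasGemmEx(
        handle, CUBLAS_OP_N, CUBLAS_OP_N,
        m, n, k,
        &alpha,
        dA, CUDA_R_16F, m,      // Atype, lda
        dB, CUDA_R_16F, k,      // Btype, ldb
        &beta,
        dC, CUDA_R_16F, m,      // Ctype, ldc
        CUBLAS_COMPUTE_16F,     // multiply AND accumulate in half
        CUBLAS_GEMM_DEFAULT);

    printf("cublasGemmEx status: %d\n", (int)st);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

With these arguments the call is functionally equivalent to cublasHgemm(); switching the compute type to CUBLAS_COMPUTE_32F would instead accumulate in FP32 while keeping half-precision storage.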
Is there a difference in their performance? I expect them to be the same, at least when both multiplication and accumulation are done in half.
I wouldn’t expect a significant performance difference between the two when everything is CUDA_R_16F, but I haven’t tested it.