Does CUBLAS SGEMM work with tensor cores yet?

LukeCuda · February 26, 2020, 3:12am

Older documentation mentioned that if you want to use tensor cores for matrix multiplication, you need to use CUTLASS. It's been 3 years since V100, so I'm just wondering whether NVIDIA has updated CUBLAS SGEMM/HGEMM to support tensor cores?

Robert_Crovella · February 26, 2020, 10:05am

If you want to use tensor cores, there are various functions within CUBLAS that can use them. HGEMM is one of them. SGEMM is not. More information is available in the CUBLAS documentation.

LukeCuda · February 26, 2020, 10:55am

On a 32k matrix:

V100 HGEMM 2 seconds
P100 HGEMM 4 seconds

V100 SGEMM 4 seconds
P100 SGEMM 8 seconds

Are these the expected relative times when tensor cores are working? I thought they would make more of a difference here.
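
One way to sanity-check timings like these is to convert them to achieved throughput. The sketch below assumes "32k" means square 32768 x 32768 matrices and that the reported seconds cover only the GEMM call; neither is stated in the post. The function names are hypothetical helpers.

```cpp
#include <cassert>

// Floating-point operations in C = A * B for square n x n matrices:
// n * n output elements, each needing n multiplies and n adds.
double gemm_flops(double n) {
    return 2.0 * n * n * n;
}

// Achieved throughput in TFLOP/s for one GEMM that took `seconds`.
double achieved_tflops(double n, double seconds) {
    assert(seconds > 0.0);
    return gemm_flops(n) / seconds / 1e12;
}
```

Under those assumptions, `achieved_tflops(32768, 2.0)` comes out at roughly 35 TFLOP/s for the V100 HGEMM run and `achieved_tflops(32768, 4.0)` at roughly 17.6 TFLOP/s for the V100 SGEMM and P100 HGEMM runs, which can then be compared against the peak figures NVIDIA quotes for each card and precision.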

Robert_Crovella · February 26, 2020, 4:23pm

No, they are not the expected ratios for V100 (they are the expected ratios for P100). Did you set the math mode?

https://docs.nvidia.com/cuda/cublas/index.html#cublassetmathmode