Does CUBLAS SGEMM work with tensor cores yet?

Older documentation said that to use tensor cores for matrix multiplication you need to use CUTLASS. It's been 3 years since V100, so I'm just wondering whether NVIDIA has updated CUBLAS SGEMM/HGEMM to support tensor cores?

If you want to use tensor cores, there are various functions within CUBLAS that can use them. HGEMM is one of them. SGEMM is not. More information is available in the CUBLAS documentation.

On a 32k matrix:

V100 HGEMM 2 seconds
P100 HGEMM 4 seconds

V100 SGEMM 4 seconds
P100 SGEMM 8 seconds

Are these the expected relative times when tensor cores are working? I thought they would make more of a difference here.

No, they are not, at least not for V100 (they are the ratios I would expect on P100). Did you set the math mode?

https://docs.nvidia.com/cuda/cublas/index.html#cublassetmathmode
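
For reference, here is a minimal sketch of what setting the math mode looks like around an HGEMM call. It times cublasHgemm once with CUBLAS_DEFAULT_MATH and once with CUBLAS_TENSOR_OP_MATH so the two can be compared on the same GPU. The matrix size, the all-ones fill, the CHECK_* macros and the time_hgemm helper are illustrative assumptions of mine, not anything from the original poster's code; only the cuBLAS and CUDA runtime calls themselves are library API. It assumes a toolkit recent enough that __float2half is callable from host code.

```cuda
// Sketch only: compare HGEMM timing with and without the tensor op math mode.
// Assumed build line: nvcc hgemm_mathmode.cu -lcublas -o hgemm_mathmode
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical error-checking macros (not part of CUDA or cuBLAS).
#define CHECK_CUDA(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s (line %d)\n", cudaGetErrorString(e_), __LINE__); \
    exit(EXIT_FAILURE); } } while (0)
#define CHECK_CUBLAS(call) do { cublasStatus_t s_ = (call); if (s_ != CUBLAS_STATUS_SUCCESS) { \
    fprintf(stderr, "cuBLAS error: %d (line %d)\n", (int)s_, __LINE__); \
    exit(EXIT_FAILURE); } } while (0)

// Hypothetical helper: set the math mode, run one warm-up HGEMM, then time one HGEMM.
static float time_hgemm(cublasHandle_t handle, cublasMath_t mode, int n,
                        const __half* dA, const __half* dB, __half* dC) {
    const __half alpha = __float2half(1.0f);
    const __half beta  = __float2half(0.0f);
    CHECK_CUBLAS(cublasSetMathMode(handle, mode));

    cudaEvent_t start, stop;
    CHECK_CUDA(cudaEventCreate(&start));
    CHECK_CUDA(cudaEventCreate(&stop));

    // Warm-up call so one-time setup cost is not included in the measurement.
    CHECK_CUBLAS(cublasHgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                             &alpha, dA, n, dB, n, &beta, dC, n));
    CHECK_CUDA(cudaEventRecord(start));
    CHECK_CUBLAS(cublasHgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                             &alpha, dA, n, dB, n, &beta, dC, n));
    CHECK_CUDA(cudaEventRecord(stop));
    CHECK_CUDA(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK_CUDA(cudaEventElapsedTime(&ms, start, stop));
    CHECK_CUDA(cudaEventDestroy(start));
    CHECK_CUDA(cudaEventDestroy(stop));
    return ms;
}

int main() {
    const int n = 8192;  // illustrative size (a multiple of 8), not the 32k case above
    const size_t count = (size_t)n * n;

    // Fill A and B with ones on the host; the values do not matter for timing.
    std::vector<__half> host(count, __float2half(1.0f));

    __half *dA = nullptr, *dB = nullptr, *dC = nullptr;
    CHECK_CUDA(cudaMalloc((void**)&dA, count * sizeof(__half)));
    CHECK_CUDA(cudaMalloc((void**)&dB, count * sizeof(__half)));
    CHECK_CUDA(cudaMalloc((void**)&dC, count * sizeof(__half)));
    CHECK_CUDA(cudaMemcpy(dA, host.data(), count * sizeof(__half), cudaMemcpyHostToDevice));
    CHECK_CUDA(cudaMemcpy(dB, host.data(), count * sizeof(__half), cudaMemcpyHostToDevice));

    cublasHandle_t handle;
    CHECK_CUBLAS(cublasCreate(&handle));

    float ms_default = time_hgemm(handle, CUBLAS_DEFAULT_MATH, n, dA, dB, dC);
    float ms_tensor  = time_hgemm(handle, CUBLAS_TENSOR_OP_MATH, n, dA, dB, dC);
    printf("HGEMM n=%d  default math: %.2f ms  tensor op math: %.2f ms\n",
           n, ms_default, ms_tensor);

    CHECK_CUBLAS(cublasDestroy(handle));
    CHECK_CUDA(cudaFree(dA));
    CHECK_CUDA(cudaFree(dB));
    CHECK_CUDA(cudaFree(dC));
    return 0;
}
```

If the two timings come out roughly equal on a V100, check which cuBLAS version you are linking against: as far as I know, newer releases (CUDA 11 and later) already allow tensor cores under CUBLAS_DEFAULT_MATH and mark CUBLAS_TENSOR_OP_MATH as deprecated, so the explicit opt-in mostly matters on the CUDA 9/10 era libraries.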