Why do some fp16 GEMM kernels not use Tensor Cores?

I trained my model in fp16, intending to speed it up by using Tensor Cores.
I profiled the training to check Tensor Core utilization; here is the result:

[profiler screenshot]

I'm wondering why kernels like volta_fp16_sgemm_fp16_128x64_tn cannot use Tensor Cores, yet volta_fp16_s884gemm_fp16_128x256_ldg8_f2f_tn can. And what does s884 stand for?
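For reference, here is a minimal sketch of the kind of check I ran (assuming PyTorch; the matrix sizes are illustrative, not my actual model's). It profiles two fp16 matmuls and prints the recorded kernel names, so the 884 and non-884 kernels show up side by side:

```python
import torch
from torch.profiler import profile, ProfilerActivity

device = "cuda"

# Sizes that are multiples of 8: per NVIDIA's Tensor Core guidelines,
# fp16 GEMMs on Volta generally need M, N, K divisible by 8 to be
# eligible for the s884 (Tensor Core) kernels.
a = torch.randn(1024, 1024, device=device, dtype=torch.float16)
b = torch.randn(1024, 1024, device=device, dtype=torch.float16)

# Deliberately misaligned sizes, to (presumably) force a fallback to a
# non-Tensor-Core kernel such as volta_fp16_sgemm_fp16_*.
c = torch.randn(1022, 1022, device=device, dtype=torch.float16)
d = torch.randn(1022, 1022, device=device, dtype=torch.float16)

with profile(activities=[ProfilerActivity.CUDA]) as prof:
    torch.matmul(a, b)
    torch.matmul(c, d)
    torch.cuda.synchronize()

# Kernel names containing "884" use Volta's HMMA (Tensor Core)
# instructions; plain *_sgemm_* names run on the regular FP cores.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```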

Hi @user64956,
Apologies for the delay.
Please allow me some time to check on this.
Thank you for your patience.