Ensuring that a GEMM is executed on Tensor Cores

Hi,

I want to be sure that my GEMM operation is executed on Tensor Cores. When profiling, I see this kernel.

Looking at the kernel name, I do not see anything about Tensor Cores. As I remember, for the double version it was visible.

I have doubts because I expected better performance.

I am using a V100S GPU with CUDA 11, and I have set:

cublasGemmAlgo_t ALGO = CUBLAS_GEMM_DEFAULT;

That is a Tensor Core kernel.

You can also use a profiler (e.g. Nsight Compute) to verify Tensor Core activity.
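
For the name-based check discussed above, here is a minimal sketch of a helper that scans a kernel name for substrings that often show up in Tensor Core GEMM kernels. The function name `looks_like_tensor_core_kernel` and the hint list are assumptions made for illustration, based on commonly observed naming (for example "884" in Volta kernels such as h884gemm or s884gemm). This is not an official cuBLAS or profiler API, and the naming is only a convention.

```cpp
#include <string>

// Heuristic only: returns true if the kernel name contains a substring that
// commonly appears in Tensor Core GEMM kernel names. The hint list below is
// an assumption for illustration and is not exhaustive.
bool looks_like_tensor_core_kernel(const std::string& kernel_name) {
    static const char* const hints[] = {
        "884",      // often seen on Volta, e.g. "h884gemm" or "s884gemm"
        "1688",     // often seen on Turing
        "16816",    // often seen on Ampere
        "tensorop"  // often seen in CUTLASS-style kernel names
    };
    for (const char* hint : hints) {
        if (kernel_name.find(hint) != std::string::npos) {
            return true;  // name carries a Tensor Core hint
        }
    }
    return false;  // no hint found; this does not prove Tensor Cores are unused
}
```

For a hypothetical name such as "volta_h884gemm_128x128_ldg8_nn" the helper returns true, and for "volta_sgemm_128x64_nn" it returns false. Since a name can match or miss by accident, the profiler remains the reliable way to confirm Tensor Core activity.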