Ensuring that a GEMM is executed on Tensor Cores


I want to be sure that my GEMM operation is executed on Tensor Cores. When profiling, I see this kernel:

Looking at the kernel name, I do not see anything about Tensor Cores. As I remember, for the double version it was visible.

I have doubts because I expected better performance.

I am using a V100S GPU with CUDA 11, and I have set:


That is a Tensor Core kernel.

You can also use a profiler (e.g. Nsight Compute) to verify Tensor Core activity.
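As a rough illustration of the name-based check, here is a minimal sketch of a host-side helper that looks for substrings that commonly show up in Tensor Core GEMM kernel names. The function name `looks_like_tensor_core_kernel` and the marker list are assumptions made for this example, not an official or exhaustive list, and a name match is only a hint. The profiler remains the reliable way to confirm Tensor Core activity.

```cpp
#include <string>

// Heuristic only: returns true if the kernel name contains a substring that
// commonly appears in Tensor Core GEMM kernel names. The marker list is an
// assumption for illustration and is neither official nor exhaustive.
bool looks_like_tensor_core_kernel(const std::string& kernel_name) {
    static const char* const markers[] = {
        "884",      // MMA shape 8x8x4, seen in Volta names such as "h884gemm"
        "1688",     // MMA shape 16x8x8, seen on newer architectures
        "16816",    // MMA shape 16x8x16, seen on newer architectures
        "tensorop"  // CUTLASS-style naming
    };
    for (const char* marker : markers) {
        if (kernel_name.find(marker) != std::string::npos) {
            return true;
        }
    }
    return false;
}
```

For example, calling it on a name such as `volta_h884gemm_128x128_ldg8_nn` returns true, while `volta_sgemm_128x64_nn` returns false (both names are used here only as illustrations). To confirm with Nsight Compute, a Tensor pipe instruction metric such as `sm__inst_executed_pipe_tensor.sum` is commonly collected on Volta. Metric names vary by architecture and tool version, so it is worth checking the output of `ncu --query-metrics` on your system.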