Hi,
I want to make sure that my GEMM operation is executed on Tensor Cores. When profiling, I see this kernel.
Looking at the kernel name, I do not see anything that indicates Tensor Cores. As far as I remember, it was visible for the double-precision version.
I have doubts because I expected better performance.
I am using a V100S GPU with CUDA 11, and I have set:
cublasGemmAlgo_t ALGO = CUBLAS_GEMM_DEFAULT;
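
For the kernel-name check described above, here is a rough sketch of how it could be automated over profiler output. It assumes the commonly reported naming convention in which Tensor Core GEMM kernels carry the MMA tile shape in their name ("884" on Volta, "1688" on Turing, "16816" on Ampere) or the word "tensorop". The function name looks_like_tensor_core_kernel is hypothetical and the check is only a heuristic, not a guarantee.

```cpp
#include <string>

// Heuristic only (assumption): substrings that commonly appear in GEMM
// kernel names when the kernel uses Tensor Core (MMA) instructions.
//   "884"      -> Volta   (e.g. h884gemm, s884gemm)
//   "1688"     -> Turing
//   "16816"    -> Ampere
//   "tensorop" -> CUTLASS-style kernel names
// Hypothetical helper, not part of cuBLAS or any profiler API.
inline bool looks_like_tensor_core_kernel(const std::string& kernel_name) {
    const char* markers[] = {"884", "1688", "16816", "tensorop"};
    for (const char* marker : markers) {
        if (kernel_name.find(marker) != std::string::npos) {
            return true;  // name contains a known Tensor Core marker
        }
    }
    return false;  // no marker found; kernel is probably not a Tensor Core kernel
}
```

With this sketch, an example name such as volta_h884gemm_128x128_ldg8_nn would be flagged, while volta_sgemm_128x64_nn would not. A more reliable confirmation would be the profiler's Tensor Core utilization metrics rather than the name alone.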