Hi, I used cublasGemmEx to test INT8 performance with the CUBLAS_GEMM_DEFAULT_TENSOR_OP algorithm. I also called cublasSetMathMode with CUBLAS_TENSOR_OP_MATH. The GEMM parameters are:
1. Matrix A: (8192 x 8192), CUBLAS_OP_N, CUDA_R_8I.
2. Matrix B: (8192 x 8192), CUBLAS_OP_T, CUDA_R_8I.
3. Matrix C: (8192 x 8192), CUDA_R_32I.
4. Compute type: CUDA_R_32I.
5. alpha, beta: (1.0, 0.0).
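One detail worth checking in this configuration: the cuBLAS documentation for cublasGemmEx ties the type of alpha and beta to the compute type, and for CUDA_R_32I that type is a 32-bit integer. So alpha and beta should point to int32_t values 1 and 0, not to floats 1.0 and 0.0. The plain C sketch below (no cuBLAS needed; the helper name float_bits_as_int32 is hypothetical) shows what an integer read of a float would return. This is a correctness concern for the result values rather than a likely cause of the low throughput.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The bit reinterpretation below assumes float is 32 bits wide,
   which holds on platforms with IEEE 754 single-precision floats. */
static_assert(sizeof(float) == sizeof(int32_t), "float must be 32 bits");

/* Hypothetical helper: returns the int32_t value a library would see if it
   read the storage of float f as a 32-bit integer, e.g. alpha/beta passed
   as float while computeType is CUDA_R_32I. */
int32_t float_bits_as_int32(float f)
{
    int32_t bits;
    memcpy(&bits, &f, sizeof bits); /* raw bit copy, no numeric conversion */
    return bits;
}
```

For example, float_bits_as_int32(1.0f) returns 1065353216 (0x3F800000), while float_bits_as_int32(0.0f) returns 0, so a float beta of 0.0 happens to be harmless but a float alpha of 1.0 is not.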
But the program only reached 5.4 TOPS, not the 22 TOPS given in the NVIDIA Xavier specs.
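To make the measured number easy to reproduce, here is a small C sketch of how a TOPS figure is usually derived for a GEMM. It assumes the common convention of counting one multiply-add as 2 operations, so an M x N x K GEMM performs 2·M·N·K operations, and it assumes the elapsed time covers only the GEMM calls. The function names are hypothetical.

```c
#include <assert.h>

/* A GEMM with A: m x k and B: k x n performs m*n*k multiply-adds,
   conventionally counted as 2*m*n*k operations. */
double gemm_op_count(long m, long n, long k)
{
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Achieved throughput in TOPS (1e12 operations per second). */
double achieved_tops(double ops, double seconds)
{
    assert(seconds > 0.0);
    return ops / seconds / 1e12;
}

/* Time per GEMM implied by a given throughput in TOPS, in seconds. */
double implied_seconds(double ops, double tops)
{
    assert(tops > 0.0);
    return ops / (tops * 1e12);
}
```

With M = N = K = 8192, gemm_op_count gives 2^40 (about 1.1e12) operations, so 5.4 TOPS corresponds to roughly 0.204 s per GEMM, whereas 22 TOPS would need about 0.050 s.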
I suspect that cublasGemmEx (INT8) does not use the Tensor Cores. The measured INT8 throughput of 5.4 TOPS is roughly 4 times the INT32 throughput (about 1.3 to 1.5 TOPS according to the specs; 1.377 GHz × 512 cores × 2 ops ≈ 1.4 TOPS).
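The INT32 peak estimate above can be written as a one-line C function, which also makes the "about 4x" comparison explicit. As in the estimate, it assumes each core retires one multiply-add (2 operations) per clock; the name peak_tops is hypothetical.

```c
#include <assert.h>

/* Peak throughput in TOPS from the clock rate, the number of cores and the
   operations per core per clock (2 when one multiply-add counts as 2 ops). */
double peak_tops(double clock_hz, int cores, int ops_per_clock)
{
    assert(clock_hz > 0.0 && cores > 0 && ops_per_clock > 0);
    return clock_hz * cores * ops_per_clock / 1e12;
}
```

peak_tops(1.377e9, 512, 2) evaluates to about 1.41 TOPS, and 5.4 / 1.41 is about 3.83, which is what suggests the INT8 GEMM is running at roughly 4x the INT32 CUDA-core rate rather than at the Tensor Core rate.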
So how can I test INT8 performance on the Tensor Cores?