From what I've read about Tensor Cores and cuBLAS, they only work with matrices whose dimensions are multiples of 8. So if I have two matrices, A[18393][663] and B[663][40], and I want to compute the matrix-matrix product A×B, what is the best way to optimize this? Should I zero-pad the matrices up to multiples of 8 and use cublasGemmEx(), or should I use WMMA and do it tile by tile?
Also, cublasXtGemmEx() does not support Tensor Cores, right?