https://docs.nvidia.com/cuda/cublas/index.html#cublas-level-3-function-reference
The documentation says that sgemm(m, n, k, A, B, C) computes A[m, k] * B[k, n] = C[m, n].
But when the A, B and C buffers are indexed as row-major C arrays, what it actually computes is:
B[n][k] * A[k][m] = C[n][m]
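A small pure-Python sketch can check that both readings describe the same computation. It assumes the non-transposed case with alpha = 1 and beta = 0, and it uses a hypothetical naive helper, sgemm_colmajor, in place of the real cublasSgemm. The helper indexes A, B and C column-major, so element (i, j) of a matrix with leading dimension ld sits at flat index i + j * ld. Its output is then reinterpreted as row-major arrays.

```python
# Hypothetical naive stand-in for column-major SGEMM (no transposes,
# alpha = 1, beta = 0): C (m x n) = A (m x k) * B (k x n), all flat lists.
def sgemm_colmajor(m, n, k, A, B, lda, ldb, ldc):
    C = [0.0] * (ldc * n)
    for j in range(n):          # column of C
        for i in range(m):      # row of C
            acc = 0.0
            for p in range(k):
                # column-major: element (r, c) of X is X[r + c * ldX]
                acc += A[i + p * lda] * B[p + j * ldb]
            C[i + j * ldc] = acc
    return C

# View a flat buffer as a row-major C array X[rows][cols].
def to_rows(buf, rows, cols):
    return [buf[r * cols:(r + 1) * cols] for r in range(rows)]

# Plain row-major matrix product X * Y.
def matmul_rowmajor(X, Y):
    return [[sum(X[r][p] * Y[p][c] for p in range(len(Y)))
             for c in range(len(Y[0]))]
            for r in range(len(X))]

m, n, k = 2, 3, 4
A = [float(x) for x in range(1, m * k + 1)]   # m x k column-major
B = [float(x) for x in range(1, k * n + 1)]   # k x n column-major
C = sgemm_colmajor(m, n, k, A, B, lda=m, ldb=k, ldc=m)

# The same buffers read as row-major arrays: A[k][m], B[n][k], C[n][m].
A_rows = to_rows(A, k, m)
B_rows = to_rows(B, n, k)
C_rows = to_rows(C, n, m)

assert matmul_rowmajor(B_rows, A_rows) == C_rows   # B[n][k] * A[k][m] = C[n][m]
print(C_rows[0])  # [50.0, 60.0]
```

The same identity is behind a commonly used workaround for row-major data: calling sgemm(n, m, k, B, A, C), with the dimensions and operands swapped, makes the column-major result land in C as a row-major C[m][n] = A[m][k] * B[k][n].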
So it is a little confusing. I think the documentation should be fixed.