I’ve tried calling cublas_zgemm and compared its result against zgemm (BLAS) using the Frobenius-norm difference. Strangely, if I don’t transpose the first input matrix, the F-norm difference is nearly zero. But if I transpose it (conjugate transpose, i.e. the first parameter of the routine is ‘C’), there is some difference, though very small (for a 2000*2000 matrix, the difference is 1e-9). Can anyone explain this? Many thanks.

More information: this phenomenon doesn’t occur for all dimensions. For example, dimension 2000 works well, but dimension 2418 shows the difference.

Silly me… the relative error is 1e-16, which is at the level of double-precision rounding, so the result is perfect…
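
For anyone hitting the same thing, here is a minimal sketch of the absolute vs. relative check, written in NumPy rather than against cuBLAS itself. The helper name `fro_errors`, the matrix size, and the random test data are my own assumptions for illustration. Splitting the inner dimension in two only stands in for a library that sums in a different order; it is not how cuBLAS actually computes the product.

```python
import numpy as np

def fro_errors(reference, candidate):
    """Return (absolute, relative) Frobenius-norm difference of two matrices."""
    abs_err = np.linalg.norm(reference - candidate)   # ||ref - cand||_F
    rel_err = abs_err / np.linalg.norm(reference)     # scaled by ||ref||_F
    return abs_err, rel_err

# Hypothetical test data: random complex double-precision matrices.
rng = np.random.default_rng(0)
n = 300
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
b = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Reference: C = A^H * B, the same operation as zgemm with transa = 'C'.
c_ref = a.conj().T @ b

# Same product, but accumulated in two halves of the inner dimension.
# This changes the floating-point summation order, so the rounding differs.
k = n // 2
c_alt = a[:k].conj().T @ b[:k] + a[k:].conj().T @ b[k:]

abs_err, rel_err = fro_errors(c_ref, c_alt)
print(f"absolute F-norm difference: {abs_err:.3e}")
print(f"relative F-norm difference: {rel_err:.3e}")
```

The absolute difference grows with the size and magnitude of the matrices, so it should be divided by the norm of the reference result before being judged. A relative value near double-precision epsilon means the two results agree up to rounding.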