I'm using CUDA 1.1 and trying the tutorial code for simple matrix multiplication to test its performance on my 8600M.

When I compare the results, the simple algorithm gives matrixC = matrixA*matrixB, but cublasSgemm gives matrixC = matrixB*matrixA! How is that possible? (If I swap A and B in the simple algorithm, I get the same result as CUBLAS.)

The code is the same as in the official programming guide.

The cublasSgemm call is:

cublasSgemm('n', 'n', m, n, k, 1.0f, dA, lda, dB, ldb, 0.0f, dC, ldc);

Thank you.
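
A possible explanation, offered as a sketch rather than a confirmed answer: CUBLAS follows the Fortran BLAS convention and reads matrices in column-major order, while the tutorial kernel presumably indexes plain row-major C arrays (that part is an assumption about the tutorial code). A row-major buffer read as column-major is the transpose, so passing (A, B) computes A^T * B^T = (B*A)^T, which looks like B*A when read back as row-major. The CPU-only code below reproduces this with small square matrices. `sgemm_colmajor`, `matmul_rowmajor`, `same` and `check_operand_order` are hypothetical helpers written for this illustration; `sgemm_colmajor` only stands in for cublasSgemm('n', 'n', ...) with alpha = 1 and beta = 0 and is not the library routine.

```c
#include <string.h>

#define N 2  /* square matrices keep the leading dimensions trivial */

/* CPU stand-in for a column-major SGEMM with no transposes,
   alpha = 1 and beta = 0. Element (i, j) of a matrix M with leading
   dimension ld lives at M[i + j * ld], as BLAS expects. */
void sgemm_colmajor(int m, int n, int k,
                    const float *A, int lda,
                    const float *B, int ldb,
                    float *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i + p * lda] * B[p + j * ldb];
            C[i + j * ldc] = acc;
        }
    }
}

/* Naive row-major product C = A * B (assumed tutorial-style layout):
   element (i, j) lives at M[i * N + j]. */
void matmul_rowmajor(const float *A, const float *B, float *C)
{
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < N; ++p)
                acc += A[i * N + p] * B[p * N + j];
            C[i * N + j] = acc;
        }
    }
}

/* Bitwise comparison is fine here: all values are small exact integers. */
int same(const float *X, const float *Y)
{
    return memcmp(X, Y, sizeof(float) * N * N) == 0;
}

/* Returns 1 when both observations hold:
   - passing (A, B) to the column-major routine yields row-major B * A
   - passing (B, A) to the column-major routine yields row-major A * B */
int check_operand_order(void)
{
    const float A[N * N] = {1, 2, 3, 4};
    const float B[N * N] = {5, 6, 7, 8};
    float AB[N * N], BA[N * N], gemmAB[N * N], gemmBA[N * N];

    matmul_rowmajor(A, B, AB);
    matmul_rowmajor(B, A, BA);
    sgemm_colmajor(N, N, N, A, N, B, N, gemmAB, N);
    sgemm_colmajor(N, N, N, B, N, A, N, gemmBA, N);

    return same(gemmAB, BA) && same(gemmBA, AB);
}
```

If that is what is happening, calling `check_operand_order()` from a `main` returns 1, and the usual workaround for row-major data would be to swap the operands (and the m and n arguments for non-square matrices), for example cublasSgemm('n', 'n', n, m, k, 1.0f, dB, n, dA, k, 0.0f, dC, n), instead of changing the kernel.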