I'm using CUDA 1.1 and trying the tutorial code for simple matrix multiplication to test its performance on my 8600M.
When I compare the results, the simple algorithm gives matrixC = matrixA * matrixB, but cublasSgemm gives matrixC = matrixB * matrixA! How is that possible? (If I swap A and B in the simple algorithm, I get the same result as CUBLAS.)
The code is the same as in the official programming guide.
The cublasSgemm call is:
cublasSgemm('n', 'n', m, n, k, 1.0f, dA, lda, dB, ldb, 0.0f, dC, ldc);
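
A likely explanation is storage order: CUBLAS follows the Fortran BLAS convention and reads matrices as column-major, while the tutorial kernel indexes them row-major. A row-major buffer read as column-major is the transpose, so the call above effectively computes A^T * B^T = (B * A)^T, which looks like B * A when read back row-major. The sketch below reproduces this on the CPU in plain C so it runs without a GPU. The names sgemm_colmajor_nn and matmul_rowmajor are hypothetical helpers written for this illustration, not CUBLAS functions, and the sketch assumes densely packed matrices.

```c
#include <stddef.h>

/* Hypothetical CPU stand-in for a column-major SGEMM, 'n','n' case only:
 * C = alpha * A * B + beta * C, with A (m x k), B (k x n), C (m x n),
 * and element (i, j) of X stored at X[i + j * ldx]. */
static void sgemm_colmajor_nn(int m, int n, int k, float alpha,
                              const float *A, int lda,
                              const float *B, int ldb,
                              float beta, float *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i + p * lda] * B[p + j * ldb];
            /* As in BLAS, C is not read when beta is zero. */
            C[i + j * ldc] = (beta == 0.0f)
                ? alpha * acc
                : alpha * acc + beta * C[i + j * ldc];
        }
    }
}

/* Row-major C = A * B (A: m x k, B: k x n) through the column-major routine.
 * Swapping the operands and the m/n arguments computes C^T = B^T * A^T,
 * whose column-major layout is exactly the row-major layout of A * B. */
static void matmul_rowmajor(int m, int n, int k,
                            const float *A, const float *B, float *C)
{
    sgemm_colmajor_nn(n, m, k, 1.0f, B, n, A, k, 0.0f, C, n);
}
```

Under the same assumptions (row-major, densely packed device arrays), the matching CUBLAS call would swap the operands in the same way: cublasSgemm('n', 'n', n, m, k, 1.0f, dB, n, dA, k, 0.0f, dC, n). The alternative is to keep the original argument order and store the matrices column-major on the host.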