How to use cublasSgemm?

Hi, I have made some modifications to simpleCublas.c from the NVIDIA SDK examples.

I need to compare the results of cublasSgemm with a serial version of the matrix-matrix product.

The example provided in the SDK accepts only square matrices as input, while I need to handle matrices of any shape.

Here are my input matrices and the output from cublasSgemm and from the CPU version of the product:

L=6 M=3 N=3

A(LxM) B(MxN) C(LxN)

The host results are correct, so maybe my cublasSgemm call is wrong:

cublasSgemm('n', 'n', L, N, M, alpha, d_A,L , d_B, M, beta, d_C, L);

In this topic I read that cublasSgemm uses column-major order. Is that important?

cuBLAS uses column-major order, and your data is clearly not stored that way. You could either store your data in column-major order, or try the gemm call with the transpose option, if you are certain the transposed result will be the same as the column-major version of the same data.
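To make the first option concrete, here is a minimal sketch in plain C of copying a row-major matrix into column-major storage before uploading it to the device. The names IDX2C_SKETCH and row_to_col_major are made up for this illustration and are not part of cuBLAS; the sketch assumes the destination leading dimension equals the number of rows.

```c
#include <stddef.h>

/* Hypothetical helper: column-major offset of element (row, col)
   in a matrix whose leading dimension (stride between columns) is ld. */
#define IDX2C_SKETCH(row, col, ld) ((size_t)(col) * (size_t)(ld) + (size_t)(row))

/* Hypothetical helper: copy a rows x cols row-major matrix (src)
   into column-major storage (dst). src and dst must not overlap,
   and both must hold rows * cols floats. */
void row_to_col_major(const float *src, float *dst, int rows, int cols)
{
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            /* Row-major offset is r * cols + c; column-major is c * rows + r. */
            dst[IDX2C_SKETCH(r, c, rows)] = src[(size_t)r * (size_t)cols + (size_t)c];
        }
    }
}
```

With L=6 and M=3, you would call row_to_col_major(h_A_rowmajor, h_A, 6, 3) before copying h_A to d_A, and then lda = L in the cublasSgemm call matches the storage. The buffer names here are only placeholders.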

I changed my CPU matrix product to use column-major order and now everything works fine.
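For anyone who wants a reference to compare against, this is a rough sketch of what a column-major CPU product can look like. It is not the code from the original post; sgemm_ref_colmajor is a made-up name, and the argument order simply mirrors the cublasSgemm('n', 'n', ...) call above so the two are easy to compare. It assumes no transposes and lda >= l, ldb >= m, ldc >= l.

```c
#include <stddef.h>

/* Hypothetical CPU reference: C = alpha * A * B + beta * C.
   All matrices are column-major and not transposed.
   A is l x m, B is m x n, C is l x n.
   Note: unlike reference BLAS, this sketch reads C even when beta is 0,
   so C should be initialized before the call. */
void sgemm_ref_colmajor(int l, int n, int m, float alpha,
                        const float *A, int lda,
                        const float *B, int ldb,
                        float beta, float *C, int ldc)
{
    for (int j = 0; j < n; ++j) {          /* column of C and B */
        for (int i = 0; i < l; ++i) {      /* row of C and A */
            float acc = 0.0f;
            for (int k = 0; k < m; ++k) {  /* inner dimension */
                /* A(i,k) lives at k*lda + i, B(k,j) at j*ldb + k. */
                acc += A[(size_t)k * lda + i] * B[(size_t)j * ldb + k];
            }
            float *c = &C[(size_t)j * ldc + i];
            *c = alpha * acc + beta * (*c);
        }
    }
}
```

Calling sgemm_ref_colmajor(L, N, M, alpha, h_A, L, h_B, M, beta, h_C, L) on the same host buffers you upload to the GPU should then give results that match cublasSgemm up to floating-point rounding, since the summation order may differ.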

Thank you :)

Can you send me the CPU version of the code?