Hi, I have made some modifications to simpleCublas.c from the NVIDIA SDK examples.
I need to compare the results of cublasSgemm with a serial version of the matrix-matrix product.
The example provided in the SDK accepts only square matrices as input, while I need it to work with matrices of any shape.
Below are my input matrices and the output from cublasSgemm and from the CPU version of the product:
L=6 M=3 N=3
A(LxM) B(MxN) C(LxN)
matrix A
0.000000 1.000000 2.000000
3.000000 4.000000 5.000000
6.000000 7.000000 8.000000
9.000000 10.000000 11.000000
12.000000 13.000000 14.000000
15.000000 16.000000 17.000000
matrix B
0.000000 1.000000 2.000000
3.000000 4.000000 5.000000
6.000000 7.000000 8.000000
host result
15.000000 18.000000 21.000000
42.000000 54.000000 66.000000
69.000000 90.000000 111.000000
96.000000 126.000000 156.000000
123.000000 162.000000 201.000000
150.000000 198.000000 246.000000
device result
30.000000 33.000000 36.000000
39.000000 42.000000 45.000000
84.000000 96.000000 108.000000
120.000000 132.000000 144.000000
138.000000 159.000000 180.000000
201.000000 222.000000 243.000000
The host results are correct, so maybe my cublasSgemm call is wrong:
cublasSgemm('n', 'n', L, N, M, alpha, d_A, L, d_B, M, beta, d_C, L);
In this topic I read that cublasSgemm uses column-major order. Does that matter?
CUBLAS uses column-major order. Your data is clearly not in column-major order. You could either store the data in column-major order, or call gemm with the transpose option, provided you are certain that the transposed row-major data matches the column-major layout of the same data.
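To illustrate the layout issue without needing a GPU, here is a minimal CPU sketch. The function sgemm_colmajor_nn is a hypothetical stand-in for a column-major, no-transpose SGEMM (it is not part of CUBLAS), and rowmajor_product_via_colmajor is likewise an illustrative name. It relies on the identity (A*B)^T = B^T * A^T: a row-major buffer read as column-major is the transpose, so swapping the two operands and the L/N dimensions yields the row-major product with no data rearrangement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU stand-in for a column-major, no-transpose SGEMM:
   C = alpha*A*B + beta*C, with A (m x k), B (k x n), C (m x n).
   Element (i,j) of a matrix X is stored at X[i + j*ldx]. */
static void sgemm_colmajor_nn(int m, int n, int k, float alpha,
                              const float *A, int lda,
                              const float *B, int ldb,
                              float beta, float *C, int ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p)
                sum += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            size_t c = i + (size_t)j * ldc;
            /* Do not read C when beta is zero, so C may be uninitialized. */
            C[c] = (beta == 0.0f) ? alpha * sum : alpha * sum + beta * C[c];
        }
    }
}

/* Row-major C(LxN) = A(LxM) * B(MxN), computed by a column-major routine.
   Seen as column-major, the buffers hold A^T, B^T and C^T, so we compute
   C^T (N x L) = B^T (N x M) * A^T (M x L): operands and L/N are swapped. */
static void rowmajor_product_via_colmajor(int L, int M, int N,
                                          const float *A, const float *B,
                                          float *C)
{
    sgemm_colmajor_nn(N, L, M, 1.0f, B, N, A, M, 0.0f, C, N);
}
```

With the matrices from the question (L=6, M=3, N=3), calling sgemm_colmajor_nn with the original argument order (L, N, M, A with leading dimension L, B with leading dimension M) reproduces the "device result" values, while the swapped call reproduces the "host result". Assuming the real routine follows the same column-major convention, the analogous call would be cublasSgemm('n', 'n', N, L, M, alpha, d_B, N, d_A, M, beta, d_C, N).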
I made my CPU matrix product column-major and now everything works fine.
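The poster's actual code is not shown in the thread, so the following is only a sketch of what a column-major CPU product might look like. The names IDX2C, rowmajor_to_colmajor and cpu_matmul_colmajor are assumptions chosen for illustration, as is the choice to convert existing row-major data rather than fill the arrays column-major from the start.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i,j) in a column-major matrix with leading dimension ld. */
#define IDX2C(i, j, ld) ((size_t)(j) * (size_t)(ld) + (size_t)(i))

/* Copy a rows x cols row-major matrix into column-major storage. */
static void rowmajor_to_colmajor(int rows, int cols,
                                 const float *src, float *dst)
{
    assert(rows > 0 && cols > 0);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            dst[IDX2C(i, j, rows)] = src[(size_t)i * cols + j];
}

/* Serial reference product C(LxN) = A(LxM) * B(MxN), all column-major,
   with leading dimensions L, M and L respectively. */
static void cpu_matmul_colmajor(int L, int M, int N,
                                const float *A, const float *B, float *C)
{
    assert(L > 0 && M > 0 && N > 0);
    for (int j = 0; j < N; ++j) {
        for (int i = 0; i < L; ++i) {
            float sum = 0.0f;
            for (int p = 0; p < M; ++p)
                sum += A[IDX2C(i, p, L)] * B[IDX2C(p, j, M)];
            C[IDX2C(i, j, L)] = sum;
        }
    }
}
```

Once both the host and device buffers use this layout, the leading dimensions L, M and L in the original cublasSgemm call describe the data correctly, and the two results can be compared element by element. When printing, C(i,j) should be read back with IDX2C(i, j, L) rather than i*N + j.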
Thank you :)
Can you send me the CPU version of the code?