CUBLAS Sgemm Wrong results

I am trying to use cublasSgemm for matrix multiplication, but it gives me the wrong result.
I tried it two ways. The first gives wrong results, while the second gives the correct result but transposed. Please tell me the right way to call cublasSgemm to get the correct (non-transposed) result for matrix multiplication.
I am using it as follows:

cublasSgemm( 'N', 'N', N, N, N, alpha, d_A, N, d_B, N, beta, d_C, N );

Output:
Matrix A
0.00 3.00 1.00 3.00 1.00 4.00 3.00 0.00
4.00 4.00 4.00 1.00 4.00 3.00 3.00 3.00
0.00 3.00 3.00 3.00 3.00 1.00 2.00 1.00
2.00 1.00 3.00 2.00 3.00 4.00 3.00 4.00
2.00 4.00 2.00 4.00 0.00 0.00 1.00 4.00
1.00 1.00 2.00 2.00 1.00 1.00 0.00 3.00
4.00 0.00 1.00 4.00 1.00 0.00 0.00 0.00
2.00 0.00 4.00 2.00 4.00 2.00 1.00 3.00
Matrix B
1.00 0.00 4.00 1.00 2.00 1.00 0.00 0.00
2.00 3.00 2.00 0.00 1.00 4.00 3.00 2.00
1.00 2.00 1.00 2.00 4.00 3.00 4.00 1.00
3.00 3.00 1.00 4.00 0.00 2.00 2.00 3.00
0.00 4.00 4.00 2.00 0.00 2.00 3.00 4.00
2.00 2.00 4.00 0.00 4.00 4.00 2.00 0.00
1.00 0.00 0.00 1.00 3.00 4.00 4.00 3.00
0.00 0.00 2.00 2.00 0.00 1.00 1.00 0.00
CUBLAS Version = 5000
CUBLAS SGEMM Elapsed Time: 0.000044 s, GFLOPS = 0.023256
Matrix Cublas C
7.00 25.00 22.00 27.00 17.00 13.00 16.00 19.00
34.00 32.00 41.00 43.00 35.00 27.00 22.00 33.00
41.00 35.00 40.00 52.00 29.00 24.00 22.00 43.00
36.00 30.00 48.00 41.00 46.00 46.00 35.00 41.00
42.00 32.00 57.00 44.00 55.00 34.00 30.00 42.00
28.00 46.00 40.00 52.00 28.00 22.00 24.00 38.00
34.00 20.00 34.00 47.00 24.00 18.00 12.00 37.00
9.00 9.00 15.00 16.00 14.00 11.00 10.00 13.00
Matrix Host C
27.00 32.00 30.00 19.00 32.00 51.00 42.00 29.00
28.00 45.00 63.00 33.00 49.00 69.00 63.00 40.00
22.00 38.00 30.00 28.00 25.00 46.00 47.00 36.00
24.00 35.00 51.00 33.00 42.00 57.00 52.00 32.00
25.00 28.00 30.00 31.00 19.00 40.00 36.00 25.00
13.00 19.00 24.00 21.00 15.00 24.00 23.00 14.00
17.00 18.00 25.00 24.00 12.00 17.00 15.00 17.00
17.00 34.00 44.00 33.00 31.00 41.00 43.00 29.00

If I use it as follows, it gives the correct result, but transposed.

cublasSgemm( 'T', 'T', N, N, N, alpha, d_A, N, d_B, N, beta, d_C, N );

Output:
Using Matrix Sizes: (8 x 8)

Matrix A
0.00 3.00 1.00 3.00 1.00 4.00 3.00 0.00
4.00 4.00 4.00 1.00 4.00 3.00 3.00 3.00
0.00 3.00 3.00 3.00 3.00 1.00 2.00 1.00
2.00 1.00 3.00 2.00 3.00 4.00 3.00 4.00
2.00 4.00 2.00 4.00 0.00 0.00 1.00 4.00
1.00 1.00 2.00 2.00 1.00 1.00 0.00 3.00
4.00 0.00 1.00 4.00 1.00 0.00 0.00 0.00
2.00 0.00 4.00 2.00 4.00 2.00 1.00 3.00
Matrix B
1.00 0.00 4.00 1.00 2.00 1.00 0.00 0.00
2.00 3.00 2.00 0.00 1.00 4.00 3.00 2.00
1.00 2.00 1.00 2.00 4.00 3.00 4.00 1.00
3.00 3.00 1.00 4.00 0.00 2.00 2.00 3.00
0.00 4.00 4.00 2.00 0.00 2.00 3.00 4.00
2.00 2.00 4.00 0.00 4.00 4.00 2.00 0.00
1.00 0.00 0.00 1.00 3.00 4.00 4.00 3.00
0.00 0.00 2.00 2.00 0.00 1.00 1.00 0.00
CUBLAS Version = 5000
CUBLAS SGEMM Elapsed Time: 0.000035 s, GFLOPS = 0.029331
Matrix Cublas C
27.00 28.00 22.00 24.00 25.00 13.00 17.00 17.00
32.00 45.00 38.00 35.00 28.00 19.00 18.00 34.00
30.00 63.00 30.00 51.00 30.00 24.00 25.00 44.00
19.00 33.00 28.00 33.00 31.00 21.00 24.00 33.00
32.00 49.00 25.00 42.00 19.00 15.00 12.00 31.00
51.00 69.00 46.00 57.00 40.00 24.00 17.00 41.00
42.00 63.00 47.00 52.00 36.00 23.00 15.00 43.00
29.00 40.00 36.00 32.00 25.00 14.00 17.00 29.00
Matrix Host C
27.00 32.00 30.00 19.00 32.00 51.00 42.00 29.00
28.00 45.00 63.00 33.00 49.00 69.00 63.00 40.00
22.00 38.00 30.00 28.00 25.00 46.00 47.00 36.00
24.00 35.00 51.00 33.00 42.00 57.00 52.00 32.00
25.00 28.00 30.00 31.00 19.00 40.00 36.00 25.00
13.00 19.00 24.00 21.00 15.00 24.00 23.00 14.00
17.00 18.00 25.00 24.00 12.00 17.00 15.00 17.00
17.00 34.00 44.00 33.00 31.00 41.00 43.00 29.00

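Again this question… cuBLAS, like the Fortran BLAS it is modeled on, uses COLUMN-MAJOR storage! The results are not wrong; rather, the parameters you pass to the Sgemm call are wrong. Your row-major buffers are read as their transposes, so the 'N', 'N' call actually gives you B*A when printed row-major (the 7.00 in the corner is row 0 of B times column 0 of A), and the 'T', 'T' call computes A*B but stores it column-major, which prints as the transpose.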
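To make the layout point concrete, here is a minimal CPU-only sketch in plain C (no cuBLAS involved). The helper names at_row_major, at_col_major and reads_as_transpose are made up for this illustration. It only shows that a buffer filled row by row reads back as the transposed matrix when indexed the column-major way, which is what a column-major library does with it.

```c
#include <assert.h>

/* Element (row, col) of a matrix stored row-major with `cols` columns per row. */
static float at_row_major(const float *buf, int cols, int row, int col)
{
    assert(row >= 0 && col >= 0 && col < cols);
    return buf[row * cols + col];
}

/* Element (row, col) of a matrix stored column-major with leading
   dimension `ld` (the number of rows in the allocation). */
static float at_col_major(const float *buf, int ld, int row, int col)
{
    assert(row >= 0 && col >= 0 && row < ld);
    return buf[row + col * ld];
}

/* Returns 1 if the buffer, written as a rows x cols row-major matrix,
   reads back as the cols x rows transpose under column-major indexing. */
static int reads_as_transpose(const float *buf, int rows, int cols)
{
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            if (at_row_major(buf, cols, i, j) != at_col_major(buf, cols, j, i))
                return 0;
    return 1;
}
```

For example, with the 2 x 3 buffer {1, 2, 3, 4, 5, 6}, row-major element (0, 1) and column-major element (1, 0) of the 3 x 2 view are both 2, and reads_as_transpose(buf, 2, 3) returns 1.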

Read the cuBLAS documentation, and see this thread:

https://devtalk.nvidia.com/default/topic/534473/gpu-accelerated-libraries/cublassgemm_v2-returns-incorrect-matrix-multiplication-results/

Thanks for the reply.
OK, I was using it incorrectly. It now works fine if I swap the A and B arguments so that they are handled as column-major.

cublasSgemm( 'N', 'N', N, N, N, alpha, d_B, N, d_A, N, beta, d_C, N );

Sorry for any inconvenience.
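For anyone who finds this later, here is a rough CPU-only sketch of why the swap works. It is not cuBLAS code: ref_sgemm_nn is a hypothetical stand-in that follows the column-major, no-transpose SGEMM convention (C = alpha*A*B + beta*C), and row_major_matmul is a hypothetical wrapper. The wrapper also handles non-square matrices, which is an assumption beyond the square N x N case in this thread: for row-major A (M x K) and B (K x N), pass B first, A second, and swap the m and n arguments.

```c
#include <assert.h>

/* Hypothetical CPU stand-in for a column-major, no-transpose SGEMM:
   C (m x n) = alpha * A (m x k) * B (k x n) + beta * C,
   with every matrix stored column-major. Not a cuBLAS function. */
static void ref_sgemm_nn(int m, int n, int k, float alpha,
                         const float *A, int lda,
                         const float *B, int ldb,
                         float beta, float *C, int ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i + p * lda] * B[p + j * ldb];
            /* As in BLAS, C is not read when beta is zero. */
            C[i + j * ldc] = (beta == 0.0f)
                ? alpha * acc
                : alpha * acc + beta * C[i + j * ldc];
        }
    }
}

/* Row-major C (M x N) = A (M x K) * B (K x N), computed with the
   column-major routine above by swapping the operands. */
static void row_major_matmul(int M, int N, int K,
                             const float *A, const float *B, float *C)
{
    /* Seen column-major, the B buffer is B^T (N x K, ld = N) and the
       A buffer is A^T (K x M, ld = K). Their product B^T * A^T is
       (A*B)^T (N x M, ld = N), which is exactly A*B in row-major memory. */
    ref_sgemm_nn(N, M, K, 1.0f, B, N, A, K, 0.0f, C, N);
}
```

With square N x N matrices, the call inside row_major_matmul has the same argument order as the corrected cublasSgemm call above. Calling ref_sgemm_nn with the A buffer first and the B buffer second instead leaves B*A in row-major memory, which matches the first set of "wrong" results in the question.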

Not a problem; just always check the library's documentation. Coming from a comp sci background, you were probably expecting row-major.
Getting used to these issues is worth the effort given the speed of *gemm operations.