cublasSgemm_v2 returns incorrect Matrix Multiplication results

Hi, I’m new to CUDA 4.2 programming and I’m trying a simple matrix multiplication using cuBLAS.

I have two matrices, A[HA x WA] and B[HB x WB], and I store the result in C[HC x WC]. I run the cuBLAS call below and verify the result against a regular matrix multiplication on the host.

cublasSgemm_v2(handle,CUBLAS_OP_T, CUBLAS_OP_T, HA, WB, WA, &alpha, d_A, HA, d_B, HB, &beta, d_C, WB);

Case 1: Square matrix A[N x N] x B[N x N]
The output is transposed, but otherwise the result is correct. How do I get the output without the transpose?

Case 2: Rectangular matrix A[N x M] x B[M x N]
Error: ** On entry to SGEMM parameter number 8 had an illegal value
cublasSgemm returned error code 7, line(377)
Why does this error occur?

Case 3: Square matrix A[N x N] x B[N x N] with CUBLAS_OP_N
The results are incorrect.
Why can’t I use the CUBLAS_OP_N option?

Thanks in advance.

cuBLAS, like most other linear algebra libraries, uses column-major storage, so that is one thing you have to be careful about, especially if you come from C/C++, where 2D arrays are laid out row-major.
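To make the layout difference concrete, here is a minimal host-only sketch (no GPU needed). The helper names at_row_major, at_col_major and reads_as_transpose are made up for illustration and are not part of cuBLAS. It shows that a row-major buffer of size rows x cols, when read with column-major indexing and a leading dimension of cols, is exactly the transpose (cols x rows). That is effectively what cuBLAS sees when you hand it an ordinary C array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element (row, col) of a row-major matrix with `cols` columns
// (the usual C/C++ convention).
inline float at_row_major(const std::vector<float>& m, std::size_t cols,
                          std::size_t row, std::size_t col) {
    assert(col < cols);
    return m[row * cols + col];
}

// Element (row, col) of a column-major matrix with leading dimension `ld`
// (the BLAS/cuBLAS convention; ld must be at least the number of rows).
inline float at_col_major(const std::vector<float>& m, std::size_t ld,
                          std::size_t row, std::size_t col) {
    assert(row < ld);
    return m[col * ld + row];
}

// Checks that a row-major rows x cols buffer, read as a column-major
// cols x rows matrix with leading dimension cols, is the transpose:
// row-major (r, c) and column-major (c, r) address the same element.
inline bool reads_as_transpose(const std::vector<float>& m,
                               std::size_t rows, std::size_t cols) {
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            if (at_row_major(m, cols, r, c) != at_col_major(m, cols, c, r))
                return false;
    return true;
}
```

So, assuming your host arrays are row-major, every matrix you pass is interpreted as its transpose unless you account for it in the operation flags, dimensions and leading dimensions.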

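When you get an illegal parameter error, that usually means the adjustment for column-major went wrong somewhere, for example a dimension or leading dimension that does not match the layout. See the cuBLAS documentation for details:

http://docs.nvidia.com/cuda/cublas/index.html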
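Applied to the three cases in the question, and assuming the host arrays are row-major, the reasoning would go roughly like this. In Case 1, cuBLAS sees A^T and B^T, the two CUBLAS_OP_T flags turn them back into A and B, and the product A*B is written column-major, which reads as (A*B)^T from C. In Case 2, parameter 8 in the classic BLAS numbering (which does not count the handle) is lda. With CUBLAS_OP_T the stored A is taken to be k x m, so lda must be at least k = WA, but the call passes HA. In Case 3, CUBLAS_OP_N without any other change computes A^T * B^T = (B*A)^T, which reads back as B*A rather than A*B.

A common way around all of this is to ask for C^T = B^T * A^T by swapping the operands, with no transposes at all. The sketch below checks that idea on the host. gemm_col_major_nn is a hypothetical stand-in for a column-major GEMM (it is not a cuBLAS function), and the cuBLAS call in the comment is the corresponding call under the same assumptions, not something I have run against your code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a column-major GEMM with both operands
// untransposed: C (m x n) = A (m x k) * B (k x n).
// Returns false if a leading dimension is smaller than the row count of its
// matrix, similar in spirit to the argument checks in BLAS-style routines.
bool gemm_col_major_nn(int m, int n, int k,
                       const float* A, int lda,
                       const float* B, int ldb,
                       float* C, int ldc) {
    if (lda < m || ldb < k || ldc < m) return false;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p)
                sum += A[p * lda + i] * B[j * ldb + p];
            C[j * ldc + i] = sum;
        }
    }
    return true;
}

// Row-major C (HA x WB) = A (HA x WA) * B (WA x WB), computed by asking the
// column-major routine for C^T = B^T * A^T. No data is moved or transposed;
// only the operand order, dimensions and leading dimensions change.
bool matmul_row_major(int HA, int WA, int WB,
                      const std::vector<float>& A,
                      const std::vector<float>& B,
                      std::vector<float>& C) {
    assert(A.size() == static_cast<std::size_t>(HA) * WA);
    assert(B.size() == static_cast<std::size_t>(WA) * WB);
    C.assign(static_cast<std::size_t>(HA) * WB, 0.0f);
    // Corresponding cuBLAS call (d_A, d_B, d_C are device copies of A, B, C):
    // cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, WB, HA, WA,
    //             &alpha, d_B, WB, d_A, WA, &beta, d_C, WB);
    return gemm_col_major_nn(WB, HA, WA,
                             B.data(), WB,
                             A.data(), WA,
                             C.data(), WB);
}
```

With this ordering the leading dimensions are simply the row lengths of the row-major arrays (WB, WA, WB), so the same call should work for both square and rectangular matrices.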