Hi, I’m new to CUDA programming (CUDA 4.2) and I’m trying a simple matrix multiplication using cuBLAS.

I have two matrices, A[HA x WA] and B[HB x WB], and I store the result in C[HC x WC]. I run the cuBLAS routine and verify the result against a regular matrix multiplication on the host.
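For reference, the host-side check described above can be sketched roughly as follows. This is only a minimal sketch, assuming row-major storage and HB == WA; the function name host_matmul is hypothetical and is not taken from the original code.

```c
#include <stddef.h>

/* Reference host multiply: C[HA x WB] = A[HA x WA] * B[WA x WB].
   Assumption: all three buffers are row-major, the usual layout for C arrays.
   host_matmul is a hypothetical name used only for this illustration. */
void host_matmul(const float *A, const float *B, float *C,
                 int HA, int WA, int WB)
{
    for (int i = 0; i < HA; ++i) {
        for (int j = 0; j < WB; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < WA; ++k) {
                /* A[i][k] * B[k][j] with row-major indexing */
                sum += A[(size_t)i * WA + k] * B[(size_t)k * WB + j];
            }
            C[(size_t)i * WB + j] = sum;
        }
    }
}
```

The GPU result can then be compared element by element against this output, usually with a small tolerance since float rounding may differ between host and device.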

cublasSgemm_v2(handle, CUBLAS_OP_T, CUBLAS_OP_T, HA, WB, WA, &alpha, d_A, HA, d_B, HB, &beta, d_C, WB);

Case 1: Square matrices, A[N x N] x B[N x N]

The output is transposed, but otherwise the result is correct. How do I get the output without the transpose?

Case 2: Rectangular matrices, A[N x M] x B[M x N]

Error: ** On entry to SGEMM parameter number 8 had an illegal value

cublasSgemm returned error code 7, line(377)

Why does this error occur?
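One way to read the "parameter number 8" message: in the classic SGEMM argument order (transa, transb, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc), the eighth argument is lda. The documented rule is that lda must be at least m when A is not transposed and at least k when it is, with similar rules for ldb and ldc. The sketch below encodes that rule in a hypothetical helper, check_gemm_ld, which is not part of cuBLAS and is only meant for checking arguments on the host.

```c
/* Hypothetical helper (not a cuBLAS function): checks the leading
   dimensions of a GEMM call against the documented BLAS rules.
   transa / transb are 0 for "no transpose" and nonzero for "transpose".
   Returns 0 if all leading dimensions are acceptable, otherwise the
   1-based position of the first bad one in the classic argument list
   (transa, transb, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc). */
static int at_least_one(int x) { return x > 1 ? x : 1; }

int check_gemm_ld(int transa, int transb, int m, int n, int k,
                  int lda, int ldb, int ldc)
{
    /* Rows of A and B as stored in memory (column-major convention). */
    int rows_a = transa ? k : m;
    int rows_b = transb ? n : k;

    if (lda < at_least_one(rows_a)) return 8;   /* lda */
    if (ldb < at_least_one(rows_b)) return 10;  /* ldb */
    if (ldc < at_least_one(m))      return 13;  /* ldc */
    return 0;
}
```

For the call shown earlier (m = HA, n = WB, k = WA, lda = HA, ldb = HB, ldc = WB, both operands transposed) and A of size 2 x 3, check_gemm_ld(1, 1, 2, 2, 3, 2, 3, 2) flags argument 8, because lda = HA = 2 is smaller than k = WA = 3. For square matrices all dimensions are equal, so the check passes.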

Case 3: Square matrices, A[N x N] x B[N x N], with CUBLAS_OP_N

The results are incorrect.

Why can’t I use the CUBLAS_OP_N option?
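A likely explanation for Cases 1 and 3 is the storage convention: cuBLAS assumes column-major matrices, while C arrays are usually row-major, and a row-major buffer read as column-major is the transpose of the intended matrix. A common workaround is to keep CUBLAS_OP_N and swap the operands, so the routine computes B^T x A^T = (A x B)^T in column-major, which is exactly A x B in row-major. The sketch below demonstrates the idea on the host with a hypothetical column-major stand-in, colmajor_gemm_nn, instead of cuBLAS itself; the cuBLAS call in the comment is an untested assumption based on the same argument order.

```c
#include <stddef.h>

/* Hypothetical CPU stand-in for a column-major GEMM with no transposes:
   C(m x n) = A(m x k) * B(k x n), element (i, j) stored at [i + j * ld]. */
void colmajor_gemm_nn(int m, int n, int k,
                      const float *A, int lda,
                      const float *B, int ldb,
                      float *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p) {
                sum += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            }
            C[i + (size_t)j * ldc] = sum;
        }
    }
}

/* Row-major C[HA x WB] = A[HA x WA] * B[WA x WB], computed with the
   column-major routine by swapping the operands and the m / n sizes.
   Assumed cuBLAS equivalent (not verified here):
     cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, WB, HA, WA,
                 &alpha, d_B, WB, d_A, WA, &beta, d_C, WB);          */
void rowmajor_matmul_via_colmajor(const float *A, const float *B, float *C,
                                  int HA, int WA, int WB)
{
    colmajor_gemm_nn(WB, HA, WA, B, WB, A, WA, C, WB);
}
```

With this ordering every leading dimension is the row length of the corresponding row-major matrix, so the same call should work for rectangular inputs as well as square ones.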

Thanks in advance.