CUBLAS: simple matrix product

I am trying to compute a simple matrix product, which should normally be easy with the CUBLAS library.

But I have a small problem: I cannot find the correspondence between the documentation and the implementation.

To compute the matrix product, I use the following function:

void cublasSgemm (char transa, char transb, int m, int n, int k, float alpha,
                  const float *A, int lda, const float *B, int ldb, float beta,
                  float *C, int ldc)

C = alpha * op(A) * op(B) + beta * C,

with alpha = 1.0 and beta = 0.0 …
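To make the parameters concrete, here is a minimal CPU sketch of what the formula above means, assuming the column-major storage that BLAS-style libraries use. `ref_sgemm` and `cm_at` are hypothetical helpers written for illustration only, not CUBLAS functions; they mirror the documented formula so that `m`, `n`, `k` and the leading dimensions can be checked on small inputs.

```c
#include <assert.h>

/* Element (i, j) of a column-major matrix with leading dimension ld. */
static float cm_at(const float *M, int ld, int i, int j) {
    return M[i + j * ld];
}

/* Hypothetical reference routine (NOT a CUBLAS function): computes
 * C = alpha * op(A) * op(B) + beta * C on column-major data, where
 * op(X) = X for 'n'/'N' and op(X) = transpose(X) for 't'/'T'.
 * op(A) is m x k, op(B) is k x n, C is m x n. */
void ref_sgemm(char transa, char transb, int m, int n, int k, float alpha,
               const float *A, int lda, const float *B, int ldb, float beta,
               float *C, int ldc) {
    int ta = (transa == 't' || transa == 'T');
    int tb = (transb == 't' || transb == 'T');
    /* A leading dimension is the row count of the matrix as stored. */
    assert(lda >= (ta ? k : m));
    assert(ldb >= (tb ? n : k));
    assert(ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                float a = ta ? cm_at(A, lda, p, i) : cm_at(A, lda, i, p);
                float b = tb ? cm_at(B, ldb, j, p) : cm_at(B, ldb, p, j);
                acc += a * b;
            }
            /* With beta == 0 the old contents of C are ignored. */
            float prev = (beta == 0.0f) ? 0.0f : beta * C[i + j * ldc];
            C[i + j * ldc] = alpha * acc + prev;
        }
    }
}
```

In this sketch a 2 x 3 matrix stored by columns has `lda = 2`, not 3; mixing up the row count and the column count here is an easy way to get wrong results for non-square matrices.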

But I have two problems:

First, in the case of square matrices, I need to swap the order of the matrices to obtain the correct result. I use this line

cublasSgemm('n','n',tx,ty,tx,1.0,d_B,tx,d_A,tx,0.0,d_C,tx);

instead of

cublasSgemm('n','n',tx,ty,tx,1.0,d_A,tx,d_B,tx,0.0,d_C,tx);

Second, I can't obtain the correct result when I try to compute the product of non-square matrices.

I am using the CUDA 2.0 beta on a Windows XP system.

What is wrong?

Thanks

Beleys


status = cublasAlloc(m1X*m1Y, sizeof(float), (void**)&d_A);
status = cublasAlloc(m2X*m2Y, sizeof(float), (void**)&d_B);
status = cublasAlloc(m2Y*m1X, sizeof(float), (void**)&d_C);

cublasSetMatrix (m1Y, m1X, sizeof(float), fat1, m1X, d_A, m1X);
cublasSetMatrix (m2X, m2Y, sizeof(float), fat2, m2X, d_B, m2X);

cublasSgemm('N', 'N', m1Y, m2X, m1X, 1, d_B, m1X, d_A, m2X, 0.0, d_C, m2X);
status = cublasGetError();

status = cublasGetMatrix (m1Y, m2X, sizeof(float), d_C, m1Y, fat3, m1Y);

CUBLAS uses Fortran (column-major) ordering. If you are calling it from C, your matrices are stored in row-major order.

Yep. Instead of passing 'N', 'N' as the first two parameters, use 't', 't':

cublasSgemm('t', 't', m1Y, m2X, m1X, 1, d_B, m1X, d_A, m2X, 0.0, d_C, m2X);

It should look like that. Also, are you sure the square-matrix case is working? It should be returning the transpose of the result you want, not the actual result.
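The row-major versus column-major point can be checked on the CPU with a small sketch. The helper names below are hypothetical and nothing here calls CUBLAS; the code only shows that a buffer written by C code as a row-major `rows x cols` matrix, when read with column-major indexing, looks like the `cols x rows` transpose.

```c
#include <assert.h>

/* Row-major read: element (i, j) of a matrix that has `cols` columns. */
float at_row_major(const float *buf, int cols, int i, int j) {
    return buf[i * cols + j];
}

/* Column-major read: element (i, j) of a matrix that has `rows` rows. */
float at_col_major(const float *buf, int rows, int i, int j) {
    return buf[i + j * rows];
}

/* Returns 1 when the buffer, read as a row-major rows x cols matrix,
 * equals the transpose of the same buffer read as a column-major
 * cols x rows matrix (leading dimension = cols). */
int reads_as_transpose(const float *buf, int rows, int cols) {
    assert(rows > 0 && cols > 0);
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            if (at_row_major(buf, cols, i, j) != at_col_major(buf, cols, j, i)) {
                return 0;
            }
        }
    }
    return 1;
}
```

The check holds for any shape, because both reads resolve to the same offset `i * cols + j`. So a C array copied to the device unchanged is seen by a column-major routine as the transposed matrix.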

Thanks for your answer, and sorry for my question.

In fact, I computed
C = transpose(B) * transpose(A)
instead of C = A * B. That is why I obtained the correct result for square matrices.

Thanks a lot

++ Beleys