cgemm operation returns wrong result Error in C Code?

b-makes-cuda · August 14, 2009, 3:45pm

Hi!

I just started with CUDA and I am very excited about all the stuff I can do with it. But right now I am stuck, probably because I am not a C pro. Perhaps one of you can help me:

I wanted to implement a complex matrix multiplication, and although every cublasStatus return value is fine, I get a totally wrong result. I don't know what cublasCgemm is actually calculating there. The results make no sense to me and still confuse me.

thanks in advance,

bjoern

[codebox]
int main(int argc, char** argv)
{
    cuComplex* h_A;
    cuComplex* h_B;
    cuComplex* h_C;

    cuComplex* d_A = 0;
    cuComplex* d_B = 0;
    cuComplex* d_C = 0;

    cuComplex alpha = { 1, 0 };
    cuComplex beta = { 0, 0 };

    /* Initialize CUBLAS */
    cublasInit();

    int pRowsA = 6;
    int pColsA = 3;
    int pRowsB = 3;
    int pColsB = 2;
    int pRowsC = pRowsA;
    int pColsC = pColsB;

    /* Allocate host memory for the matrices */
    h_A = (cuComplex*)malloc(pRowsA * pColsA * sizeof(h_A[0]));
    h_B = (cuComplex*)malloc(pRowsB * pColsB * sizeof(h_B[0]));
    h_C = (cuComplex*)malloc(pRowsC * pColsC * sizeof(h_C[0]));

    /* setting values */
    […]

    int mA_lda = pRowsA;
    int mA_ldb = pRowsA;
    int mB_lda = pRowsB;
    int mB_ldb = pRowsB;
    int mC_lda = pRowsC;
    int mC_ldb = pRowsC;

    /* Allocate device memory for the matrices */
    cublasAlloc(pRowsA * pColsA, sizeof(h_A[0]), (void**)&d_A);
    cublasAlloc(pRowsB * pColsB, sizeof(h_B[0]), (void**)&d_B);
    cublasAlloc(pRowsC * pColsC, sizeof(h_C[0]), (void**)&d_C);

    /* Initialize the device matrices with the host matrices */
    cublasStatus cpA = cublasSetMatrix(pRowsA, pColsA, sizeof(h_A[0]), h_A, mA_lda, d_A, mA_ldb);
    cublasStatus cpB = cublasSetMatrix(pRowsB, pColsB, sizeof(h_B[0]), h_B, mB_lda, d_B, mB_ldb);
    cublasStatus cpC = cublasSetMatrix(pRowsC, pColsC, sizeof(h_C[0]), h_C, mC_lda, d_C, mC_ldb);

    cublasCgemm('n', 'n', pRowsA, pColsA, pColsB, alpha, d_A, pRowsA, d_B, pColsB, beta, d_C, pRowsA);
    cublasStatus cgRes = cublasGetError();

    /* Copy the result back to the host and print it */
    cublasStatus cgm = cublasGetMatrix(pRowsC, pColsC, sizeof(h_C[0]), d_C, mC_lda, h_C, mC_lda);

    for(int i = 0; i < 12; i++) {
        cuComplex t = h_C[i];
        printf("\n re: %f im: %f ", t.x, t.y );
    }

    /* Memory clean up */
    free(h_A);
    free(h_B);
    free(h_C);

    cublasFree(d_A);
    cublasFree(d_B);
    cublasFree(d_C);

    /* Shutdown */
    cublasShutdown();

    return EXIT_SUCCESS;
}
[/codebox]

JeremiahPalmer · August 14, 2009, 4:03pm

It looks like your arguments to cublasCgemm are incorrect. You have

cublasCgemm('n', 'n', pRowsA, pColsA, pColsB, alpha, d_A, pRowsA, d_B, pColsB, beta, d_C, pRowsA);

And it should be

cublasCgemm('n', 'n', pRowsA, pColsB, pRowsB, alpha, d_A, pRowsA, d_B, pColsB, beta, d_C, pRowsA);

Here's a handy way to remember what m, n, and k are: the first two integers are the dimensions of C (rows, then columns). The third integer is the inner dimension shared by op(A) and op(B). So I'd write the statement this way:

int inner = pRowsB;
cublasCgemm('n', 'n', pRowsC, pColsC, inner, alpha, d_A, pRowsA, d_B, pColsB, beta, d_C, pRowsA);

-JP
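As a rough illustration of that rule, here is a minimal sketch in plain C that derives m, n, and k from the stored shapes of A and B. The `GemmDims` struct and the `gemm_dims` function are hypothetical names made up for this example; they are not part of CUBLAS, and the sketch only treats 'n'/'N' as "no transpose".

```c
#include <assert.h>

/* Hypothetical helper (not a CUBLAS function): dimensions for
   C = alpha * op(A) * op(B) + beta * C, given the stored shapes. */
typedef struct { int m, n, k; } GemmDims;

static GemmDims gemm_dims(char transa, char transb,
                          int rowsA, int colsA, int rowsB, int colsB)
{
    /* Anything other than 'n'/'N' is treated as a transposed operand. */
    int ta = (transa != 'n' && transa != 'N');
    int tb = (transb != 'n' && transb != 'N');

    GemmDims d;
    d.m = ta ? colsA : rowsA;   /* rows of op(A) = rows of C       */
    d.n = tb ? rowsB : colsB;   /* columns of op(B) = columns of C */
    d.k = ta ? rowsA : colsA;   /* columns of op(A) = inner size   */

    /* The inner dimension must match the rows of op(B). */
    int rows_opB = tb ? colsB : rowsB;
    assert(d.k == rows_opB);
    return d;
}
```

For the 6x3 and 3x2 matrices in this thread, `gemm_dims('n', 'n', 6, 3, 3, 2)` gives m = 6, n = 2, k = 3.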

YDD · August 14, 2009, 4:05pm

In what way is the result wrong? Does it not match the result from CGEMM on BLAS?

b-makes-cuda · August 14, 2009, 9:38pm

Thanks a lot - I will take another look on Monday. Right now I have no access to my CUDA machine, and unfortunately the emulation mode quits with a segmentation fault.

I took example 3 from http://www.ncsa.illinois.edu/UserInfo/Reso…ml/essl125.html, but without adding beta*C. Unfortunately, the result is totally different from what I expected. I also wrote a 'manual' matrix multiplication (the old-fashioned way I learned in school years ago) using cuCmul(), which worked perfectly, so the input data seems to be in the correct format. Hopefully it was only the small error JeremiahPalmer mentioned.
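A 'manual' reference multiplication of this kind could look roughly like the sketch below. It is not the code from this post: it assumes column-major storage (the layout CUBLAS uses), no transposes, and C99 `float complex` in place of `cuComplex` so that it compiles without CUDA. The name `ref_cgemm_nn` is made up for this example.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Reference C = A * B for column-major complex matrices, no transposes.
   A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
   Element (i, j) of a column-major matrix X lives at X[i + j * ldx]. */
static void ref_cgemm_nn(int m, int n, int k,
                         const float complex *A, int lda,
                         const float complex *B, int ldb,
                         float complex *C, int ldc)
{
    /* Each leading dimension must cover the rows of its matrix. */
    assert(lda >= m && ldb >= k && ldc >= m);

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float complex acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            C[i + (size_t)j * ldc] = acc;
        }
    }
}
```

Comparing the output of such a loop element by element with what cublasCgemm returns is one way to narrow down whether the input data or the call arguments are at fault.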

b-makes-cuda · August 17, 2009, 7:57am

Almost.

cublasCgemm('n', 'n', pRowsC, pColsC, pRowsB, alpha, d_A, pRowsA, d_B, pRowsB, beta, d_C, pRowsC);

did it. But your hint was the clue! I should have asked much earlier. I still wonder why I didn't figure out that simple error on my own. Anyway - thanks a lot!

bjoern
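The remaining difference in the corrected call is the leading dimension of B: with column-major storage and no transpose, it is the number of rows of B (pRowsB), not the number of columns. A minimal sketch of the usual BLAS constraints for the 'n', 'n' case follows; `gemm_nn_lds_ok` is a hypothetical helper written for this example, not a CUBLAS function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not a CUBLAS function): checks the column-major
   leading-dimension constraints for C = A * B with no transposes:
   lda >= max(1, m), ldb >= max(1, k), ldc >= max(1, m). */
static bool gemm_nn_lds_ok(int m, int n, int k, int lda, int ldb, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    return lda >= (m > 1 ? m : 1)
        && ldb >= (k > 1 ? k : 1)
        && ldc >= (m > 1 ? m : 1);
}
```

With m = 6, n = 2, k = 3, the call using ldb = pColsB = 2 fails this check, while ldb = pRowsB = 3 passes. The check is necessary but not sufficient: it says nothing about whether m, n, and k themselves were chosen correctly.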

JeremiahPalmer · August 17, 2009, 10:23pm

Great!

MMB · August 24, 2009, 11:20pm

Hello b-makes-cuda. Did you ever get CUBLAS Cgemm to work correctly? If so, would you please post what you did?

Thanks

Malcolm

b-makes-cuda · August 25, 2009, 2:45pm

Yes, I did. I had only mixed up the parameters. After correcting them (see post #5), CUBLAS Cgemm worked fine. I can send you the code if you like.

bjoern

MMB · August 25, 2009, 3:56pm

Thanks for the offer. My e-mail address is mbibby@gullwings.com.

Thanks again.

Malcolm
