Bug in CUDA SDK Sample 'Matrix Multiplication (CUBLAS)'?

Hey guys,

is it just me or is there a bug in the CUDA SDK Sample ‘Matrix Multiplication (CUBLAS)’ of CUDA v6.0 (http://docs.nvidia.com/cuda/cuda-samples/index.html#matrix-multiplication--cublas-)?

There we want to compute A*B = C, where the dimensions of A, B, and C are defined as multiples of
A: 4x2, B: 4x2, C: 4x2 (see lines 206-211). Obviously the dimensions don't match at all, and I was wondering why this even works.

From my understanding, you have to change these lines to multiples of
A: 4x2, B: 2x4, C: 4x4 (or some other constants, which match the dimension constraints).

In addition, we have to change the last argument of the function cublasSgemm (the leading dimension of C) to matrix_size.uiWC. With these changes, everything works fine (also with other constants).

What do you think? Is there an error in my reasoning? Sorry if this topic has already been discussed elsewhere; I didn't find any mention of this problem on the web.

It’s not a bug. The sample is confusing.

If you parse out the parameters that are being passed to cublasSgemm on line 285, and study the formula here:

http://docs.nvidia.com/cuda/cublas/index.html#cublas-level-3-function-reference

you’ll understand what is going on. Note the parameters passed and default values:

m: matrix_size.uiWB = 320
n: matrix_size.uiHA = 640
k: matrix_size.uiWA = 320

also note that the matrix labelled d_A is being passed as the “B” matrix in the formula, and the matrix labelled d_B is being passed as the “A” matrix in the formula. Combining these we have:

Formula: C = α * op(A) * op(B) + β * C, where op(A) is m x k, op(B) is k x n, and C is m x n (op() = no transpose here)

Sgemm: C = α * d_B(m x k) * d_A(k x n) + β * C

So, in effect, only “half” of d_B is being used, since it’s allocated at 320x640 but only 320x320 is being used. Note that something similar is happening in matrixMulCPU, based on the size parameters passed.

As a proof-point, change this line of code:

matrix_size.uiHB = 4 * block_size * iSizeMultiple;

to this:

matrix_size.uiHB = 2 * block_size * iSizeMultiple;

and you will see that the program continues to function normally (with a d_B matrix that is 320x320).

Thanks for the reply. After a long time, I finally found time to deal with this problem again. I agree with your explanation.

I think the only real reason this works is that uiHB is not actually being used, and 4 is more than is necessary to represent the output C. Proof of this is that changing uiHB and uiWA to anything else will cause it to fail: if you change, say, both matrix_size.uiHB and matrix_size.uiWA to 7, it fails, because the stride (leading dimension) for matrix C should actually be matrix_size.uiWB (or, alternatively and probably more clearly, matrix_size.uiWC).

So, basically, change lines 272 and 285 from

checkCudaErrors(cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                            matrix_size.uiWB, matrix_size.uiHA, matrix_size.uiWA,
                            &alpha,
                            d_B, matrix_size.uiWB,
                            d_A, matrix_size.uiWA,
                            &beta,
                            d_C, matrix_size.uiWA));  // ldc = uiWA

to

checkCudaErrors(cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                            matrix_size.uiWB, matrix_size.uiHA, matrix_size.uiWA,
                            &alpha,
                            d_B, matrix_size.uiWB,
                            d_A, matrix_size.uiWA,
                            &beta,
                            d_C, matrix_size.uiWB));  // ldc = uiWB