Matrix Multiplication

silentlearner · September 13, 2009, 8:21am

I am trying to run matrix multiplication in Matlab 2009a using CUDA. The .cu file I wrote compiles, but it gives the wrong answer. I also noticed that the answer changes whenever I change the block size. Have I made a mistake in my code? Thanks in advance.
matmul.cu.txt (1.69 KB)

kdahm666 · September 13, 2009, 8:30am

The error comes up in this part of your code:

dim3 dimBlock(BLOCK_SIZE,BLOCK_SIZE);

dim3 dimGrid(B.width/dimBlock.x,A.height/dimBlock.y);

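With this, your matrixMul will only work if the dimensions of the matrix are multiples of BLOCK_SIZE, because the integer division truncates.

For all other cases, the grid is too small (or empty), so part or all of the result is never computed.

The following should handle the remaining cases (N, M = dimensions of the matrix):

dim3 dimBlock(BLOCK_SIZE,BLOCK_SIZE);

dim3 dimGrid((N + dimBlock.x - 1)/dimBlock.x,(M + dimBlock.y - 1)/dimBlock.y);

Because the grid is rounded up, the kernel also has to skip threads whose row or column lies outside the matrix.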

I didn’t test it. Give it a try.
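A rough host-side sketch of the rounding-up idea, written in plain C++ so it builds without the CUDA toolkit. `Dim2`, `blocksFor`, `gridFor` and `inBounds` are hypothetical names used only for illustration; in real code the values would go into `dim3` and the bounds test would sit at the top of the kernel.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3 (x and y only), so this sketch
// compiles as ordinary C++.
struct Dim2 {
    unsigned x;
    unsigned y;
};

// Number of blocks needed to cover `extent` elements with `block` threads
// per block. Rounds up, so a partial block at the edge is still launched.
unsigned blocksFor(unsigned extent, unsigned block) {
    assert(block > 0);
    return (extent + block - 1) / block;
}

// Grid for a result matrix with `width` columns and `height` rows.
Dim2 gridFor(unsigned width, unsigned height, Dim2 block) {
    return Dim2{blocksFor(width, block.x), blocksFor(height, block.y)};
}

// The guard each thread needs once the grid is rounded up: only threads
// whose (row, col) lies inside the result matrix may read or write.
bool inBounds(unsigned row, unsigned col, unsigned width, unsigned height) {
    return row < height && col < width;
}
```

For a 100 x 50 result with 16 x 16 blocks, plain division would launch 6 x 3 blocks and leave the last 4 columns and 2 rows untouched, while the rounded-up version launches 7 x 4 blocks and relies on the guard to ignore the surplus threads.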

silentlearner · September 13, 2009, 8:49am

Thanks for your prompt reply. There are three matrices here. I presume that N and M refer to the dimensions of the resultant matrix? (Say C = A*B; are N and M the dimensions of C?)

EDIT: I tried multiplying 16*16 matrices (all three matrices have the same dimensions), but it still gives an incorrect answer.
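When the result looks wrong even for a 16*16 case, one way to narrow it down is to compare the GPU output against a plain CPU product. The sketch below is an illustration only: the field names `width`, `height` and `elements` follow the snippets in this thread, but the `HostMatrix` type, the helper names, and the row-major layout are assumptions (MATLAB itself stores arrays column-major, so the layout has to match whatever the kernel expects).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side matrix. Row-major storage is an assumption:
// element (row, col) lives at elements[row * width + col].
struct HostMatrix {
    int width;
    int height;
    std::vector<float> elements;
};

// Straightforward triple-loop product C = A * B, used as a reference.
HostMatrix matMulReference(const HostMatrix& A, const HostMatrix& B) {
    assert(A.width == B.height);  // inner dimensions must agree
    HostMatrix C{B.width, A.height,
                 std::vector<float>(static_cast<std::size_t>(B.width) * A.height, 0.0f)};
    for (int row = 0; row < A.height; ++row) {
        for (int col = 0; col < B.width; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < A.width; ++k) {
                sum += A.elements[row * A.width + k] * B.elements[k * B.width + col];
            }
            C.elements[row * C.width + col] = sum;
        }
    }
    return C;
}

// True if both matrices have the same shape and every element differs
// by at most `tol`.
bool nearlyEqual(const HostMatrix& X, const HostMatrix& Y, float tol) {
    if (X.width != Y.width || X.height != Y.height) return false;
    for (std::size_t i = 0; i < X.elements.size(); ++i) {
        if (std::fabs(X.elements[i] - Y.elements[i]) > tol) return false;
    }
    return true;
}
```

Comparing with a small tolerance rather than exact equality allows for the different summation order on the GPU.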

kdahm666 · September 13, 2009, 10:51am

You should check for errors after kernel invocation and/or cudaMemcpy:

...

// Allocate device memory for A and copy it over, checking each call.
size_t size = A.width * A.height * sizeof(float);
cutilSafeCall(cudaMalloc((void**)&dA.elements, size));
cutilSafeCall(cudaMemcpy(dA.elements, A.elements, size, cudaMemcpyHostToDevice));

// Same for B.
size = B.width * B.height * sizeof(float);
cutilSafeCall(cudaMalloc((void**)&dB.elements, size));
cutilSafeCall(cudaMemcpy(dB.elements, B.elements, size, cudaMemcpyHostToDevice));

// Allocate device memory for the result C.
size = C.width * C.height * sizeof(float);
cutilSafeCall(cudaMalloc((void**)&dC.elements, size));

// Launch the kernel, then check whether the launch or execution failed.
dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE);
dim3 dimGrid(B.width / dimBlock.x, A.height / dimBlock.y);
MatMulKernel<<<dimGrid, dimBlock>>>(dA, dB, dC);
cutilCheckMsg("Kernel execution failed");

// Copy the result back to the host (size still holds the size of C).
cutilSafeCall(cudaMemcpy(C.elements, dC.elements, size, cudaMemcpyDeviceToHost));

...

silentlearner · September 13, 2009, 12:29pm

Thanks for your help. I finally realized what was wrong after some testing. When calling the function, I used matmul(A,B) instead of matmul(single(A),single(B)). It looks like the source code can only handle single precision. I am wondering why this happens, though. I thought CUDA 2.3 supports double precision? Or is it necessary to make changes in the source code to enable double precision support?

EDIT: I googled this problem. Apparently -arch sm_13 needs to be added at the end of the COMFLAGS line in nvmexopts.bat to enable double precision. But after I added it, Matlab did not recognize it…
EDIT 2: Solution found: just add the line to the mexopts.bat file as well.
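For anyone wondering why passing doubles produced garbage rather than an error: the attachment is not available here, so the following is only a guess at the mechanism. It assumes the code treats the input buffer as float* while Matlab hands over its default double data. The C++ sketch below (helper names are hypothetical, IEEE 754 floats assumed) contrasts that byte-level misread with a real conversion, which is what single(A) does on the Matlab side.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == 2 * sizeof(float),
              "sketch assumes 8-byte double and 4-byte float");

// Reinterpret the raw bytes of a double buffer as floats. This mimics code
// written for float* being handed double data: each double is seen as two
// unrelated floats.
std::vector<float> misreadAsFloat(const std::vector<double>& values) {
    std::vector<float> out(values.size() * 2);
    assert(out.size() * sizeof(float) == values.size() * sizeof(double));
    std::memcpy(out.data(), values.data(), values.size() * sizeof(double));
    return out;
}

// Proper element-wise conversion from double to float.
std::vector<float> convertToFloat(const std::vector<double>& values) {
    return std::vector<float>(values.begin(), values.end());
}
```

With this sketch, the double 1.0 is misread as the float pair 0 and 1.875 (in an order that depends on endianness), while the converted value stays 1.0.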

e.ping · September 17, 2009, 10:57am

I am not able to download the attachment.

I am running Linux.
