# matrix multiplication with some modification

Dear all:

I modified the matrix multiplication example from the “CUDA Fortran Programming Guide and Reference”, as shown below, so that it works for arbitrary dimensions.
Something seems to be wrong: when I tested it with two matrices Adev(568,568) and Bdev(568,2902),
there were always errors larger than 1.E-3. :(
The grid and block dimensions were

``````
dimGrid = dim3( (568-1)/16+1, (2902-1)/16+1, 1 )
dimBlock = dim3( 16, 16, 1 )
``````

How should I modify my code?

Feng

``````
attributes(global) subroutine gpu_cal_coef( Adev, Bdev, Cdev, NB, M, L)
  implicit none
  integer, value :: NB, M, L
  real*8, device :: Adev(NB,M), Bdev(M,L), Cdev(NB,L)
  integer, device :: i, j, kb, k, tx, ty
  real*8, shared :: Asub(16,16), Bsub(16,16)
  real*8, device :: Cij

  ! Start execution, first get my thread indices
  tx = threadidx%x
  ty = threadidx%y

  ! This thread computes C(i,j) = sum(A(i,:) * B(:,j))
  i = (blockidx%x-1)*16 + tx
  j = (blockidx%y-1)*16 + ty

  Cij = 0.d0

  do kb = 1, M, 16
    ! Load one 16x16 tile of A and B into shared memory,
    ! padding with zeros past the matrix edges
    if (i<=NB .and. kb+ty-1<=M) then        !<--modification
      Asub(tx,ty) = Adev(i,kb+ty-1)
    else
      Asub(tx,ty) = 0.d0                    !<--modification
    end if

    if (kb+tx-1<=M .and. j<=L) then         !<--modification
      Bsub(tx,ty) = Bdev(kb+tx-1,j)
    else
      Bsub(tx,ty) = 0.d0                    !<--modification
    end if

    ! Wait until every thread has filled its tile element
    call syncthreads()

    do k = 1,16
      Cij = Cij + Asub(tx,k)*Bsub(k,ty)
    enddo

    ! Wait until all threads are done with this tile before it is overwritten
    call syncthreads()
  enddo
  Cdev(i,j) = Cij

end subroutine gpu_cal_coef
``````
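The zero-padded tiling used in the kernel can be tried out on the CPU. The following is only a sketch in plain host Fortran, not the guide's code: the program name, the small `TILE = 4`, and the test sizes are assumptions chosen for illustration, and the loops over `bx`, `by`, `tx`, `ty` stand in for blocks and threads. Because a CPU array cannot be written out of bounds safely, the final store is bounds-checked here.

```fortran
program tiled_matmul_cpu
  implicit none
  integer, parameter :: TILE = 4                ! assumption: small tile for the demo (the post uses 16)
  integer, parameter :: NB = 5, M = 6, L = 7    ! deliberately not multiples of TILE
  real(8) :: A(NB,M), B(M,L), C(NB,L)
  real(8) :: Asub(TILE,TILE), Bsub(TILE,TILE), Csub(TILE,TILE)
  integer :: bx, by, kb, tx, ty, i, j

  call random_number(A)
  call random_number(B)
  C = 0.d0

  do by = 1, (L + TILE - 1) / TILE              ! emulated blockidx%y
    do bx = 1, (NB + TILE - 1) / TILE           ! emulated blockidx%x
      Csub = 0.d0
      do kb = 1, M, TILE
        ! Load phase: every emulated thread fills one element of each tile,
        ! writing zero where the tile hangs over the matrix edge
        do ty = 1, TILE
          do tx = 1, TILE
            i = (bx-1)*TILE + tx
            j = (by-1)*TILE + ty
            if (i <= NB .and. kb+ty-1 <= M) then
              Asub(tx,ty) = A(i, kb+ty-1)
            else
              Asub(tx,ty) = 0.d0
            end if
            if (kb+tx-1 <= M .and. j <= L) then
              Bsub(tx,ty) = B(kb+tx-1, j)
            else
              Bsub(tx,ty) = 0.d0
            end if
          end do
        end do
        ! Multiply phase: Csub(tx,ty) accumulates sum over k of Asub(tx,k)*Bsub(k,ty)
        Csub = Csub + matmul(Asub, Bsub)
      end do
      ! Store phase: only elements that lie inside C are written
      do ty = 1, TILE
        do tx = 1, TILE
          i = (bx-1)*TILE + tx
          j = (by-1)*TILE + ty
          if (i <= NB .and. j <= L) C(i,j) = Csub(tx,ty)
        end do
      end do
    end do
  end do

  print *, 'max abs error vs matmul:', maxval(abs(C - matmul(A, B)))
end program tiled_matmul_cpu
```

The padded zeros contribute nothing to the sums, so the printed difference should be at the level of floating-point rounding.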

Hi Feng,

567/17=33 blocks. 33 blocks times 16 threads per block is only 528 elements.

Since this is integer division, if the number of elements is not evenly divisible by the number of threads in a block, you need to round up.

``````
dimGrid = dim3( (568+15)/16, (2902+15)/16, 1 )
``````

You then need to make sure you have guards which skip the excess threads (which it looks like you have).

Mat

Hi, Mat

Thank you for the reminder.
The number of blocks is (567/16)+1 = 36, which covers 36*16 = 576 elements, more than the 568 needed.
I have no idea what went wrong. :(

Feng
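The block counts discussed above can be checked with a few lines of host Fortran. This is a small sketch (the program and variable names are made up for illustration) that evaluates both round-up formulas with integer division for the two dimensions in the thread:

```fortran
program grid_size_check
  implicit none
  integer, parameter :: TILE = 16          ! threads per block in each direction
  integer :: n(2), idx

  n = [568, 2902]                          ! the two extents used in the post
  do idx = 1, 2
    print '(a,i5,a,i4,a,i4,a,i5)', 'n =', n(idx), &
      '  (n-1)/16+1 =', (n(idx) - 1) / TILE + 1, &
      '  (n+15)/16 =', (n(idx) + TILE - 1) / TILE, &
      '  threads =', ((n(idx) + TILE - 1) / TILE) * TILE
  end do
end program grid_size_check
```

Both formulas round up to the same values: 36 blocks (576 threads) for 568 and 182 blocks (2912 threads) for 2902. The grid therefore covers the whole matrix in either form, with 8 and 10 surplus threads per direction that the guards have to handle.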

Hi Mat,

I found that the store Cdev(i,j)=Cij should be guarded too. That is:

``````
if( i<=NB .and. j<=L )then
  Cdev(i,j)=Cij
end if
``````

Now all the errors are less than 1.E-6.

Feng
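A plausible explanation for why the unguarded store produced wrong values rather than a crash: Fortran arrays are stored column by column, so with no bounds checking a write to row `i > NB` of column `j` lands at the top of column `j+1`. The sketch below only does the index arithmetic on the host (no out-of-bounds access is performed); the program name and the choice of `j = 1` are assumptions for illustration.

```fortran
program column_major_alias
  implicit none
  integer, parameter :: NB = 568           ! leading dimension of Cdev in the post
  integer, parameter :: TILE = 16
  integer :: i, j, offset, i_hit, j_hit

  j = 1                                    ! any column handled by a surplus thread
  ! Rows NB+1 .. 576 belong to the surplus threads of the last block in x
  do i = NB + 1, ((NB + TILE - 1) / TILE) * TILE
    offset = (j - 1) * NB + i              ! 1-based position in column-major storage
    j_hit  = (offset - 1) / NB + 1         ! column that actually owns this position
    i_hit  = offset - (j_hit - 1) * NB     ! row that actually owns this position
    print *, 'write to C(', i, ',', j, ') lands on C(', i_hit, ',', j_hit, ')'
  end do
end program column_major_alias
```

Under that reading, the surplus threads with `i = 569..576` carry `Cij = 0` (their rows of `Asub` are zero-filled) and can overwrite rows 1 to 8 of the neighbouring column, depending on the order in which threads finish. For the last column the writes fall outside the array altogether. Formally the behaviour of such a store is undefined, which is why the guard is needed.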