matrix multiplication with shared memory (randomly sized matrices)

Hi.

I am a beginner who started learning CUDA recently, and I am stuck on the matrix multiplication problem. I’ve been searching posts about matrix multiplication, but I wasn’t able to find the right help. (maybe I should search more, though… :unsure:)

In the SDK example of matrix multiplication with shared memory, a matrix size is fixed like the following source code.

[codebox]// Thread block size

#define BLOCK_SIZE 16

// Matrix dimensions

// (chosen as multiples of the thread block size for simplicity)

#define WA (3 * BLOCK_SIZE) // Matrix A width

#define HA (5 * BLOCK_SIZE) // Matrix A height

#define WB (8 * BLOCK_SIZE) // Matrix B width

#define HB WA // Matrix B height

#define WC WB // Matrix C width

#define HC HA // Matrix C height[/codebox]

The thing is how I should handle randomly sized matrices. There are two cases I can come up with. First, say I choose a 16x16 block size for shared memory, but the randomly sized matrices turn out to be 5x3 and 3x8. In this case, BLOCK_SIZE is larger than the matrices themselves. Second, there is the case where the matrix dimensions are not multiples of BLOCK_SIZE. Then some threads might end up accessing the wrong addresses. (right? :unsure: )
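To make my question concrete, here is my (untested) attempt at adding bounds checks to the SDK-style kernel. The kernel name, the pointer names, and the dimension parameters (`hA`, `wA`, `wB`) are my own, not from the SDK; I assume row-major storage:

```cuda
#define BLOCK_SIZE 16

// Tiled matrix multiply C = A * B for arbitrary sizes.
// A is hA x wA, B is wA x wB, C is hA x wB, all row-major.
__global__ void matMulBounded(float *C, const float *A, const float *B,
                              int hA, int wA, int wB)
{
    int row = blockIdx.y * BLOCK_SIZE + threadIdx.y;
    int col = blockIdx.x * BLOCK_SIZE + threadIdx.x;

    __shared__ float As[BLOCK_SIZE][BLOCK_SIZE];
    __shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];

    float sum = 0.0f;
    int numTiles = (wA + BLOCK_SIZE - 1) / BLOCK_SIZE;  // ceiling division

    for (int t = 0; t < numTiles; ++t) {
        int aCol = t * BLOCK_SIZE + threadIdx.x;
        int bRow = t * BLOCK_SIZE + threadIdx.y;

        // Guard every global load; pad out-of-range elements with 0
        // so they contribute nothing to the dot product.
        As[threadIdx.y][threadIdx.x] =
            (row < hA && aCol < wA) ? A[row * wA + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (bRow < wA && col < wB) ? B[bRow * wB + col] : 0.0f;

        __syncthreads();  // tile fully loaded before anyone reads it

        for (int k = 0; k < BLOCK_SIZE; ++k)
            sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];

        __syncthreads();  // done with this tile before it is overwritten
    }

    // Guard the store too: threads that fall outside C write nothing.
    if (row < hA && col < wB)
        C[row * wB + col] = sum;
}
```

Is zero-padding the out-of-range shared-memory slots like this the usual way to do it? Note that every thread in the block still reaches both `__syncthreads()` calls, since only the loads and the final store are guarded, not the loop itself.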

In sum,

  1. How do I choose the right BLOCK_SIZE at run time? (the matrices are randomly sized, so they could be rectangular, not square)

  2. How do I handle the case where the matrix dimensions are not multiples of BLOCK_SIZE?
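For question 2, my current guess is to keep the block shape fixed and round the grid size up with a ceiling division, so that partial tiles at the edges still get a (partially idle) block. For the 5x3 times 3x8 case, that would give a single 16x16 block with most threads masked off inside the kernel. A sketch of the launch code I have in mind (`myKernel` and the variable names are placeholders of mine):

```cuda
#define BLOCK_SIZE 16

// hC rows and wC columns of the result matrix, known only at run time.
dim3 threads(BLOCK_SIZE, BLOCK_SIZE);            // block shape stays fixed
dim3 grid((wC + BLOCK_SIZE - 1) / BLOCK_SIZE,    // ceil(wC / BLOCK_SIZE)
          (hC + BLOCK_SIZE - 1) / BLOCK_SIZE);   // ceil(hC / BLOCK_SIZE)

// e.g. wC = 8, hC = 5  -> grid is 1 x 1 (one mostly-idle block)
//      wC = 100        -> 7 blocks across
myKernel<<<grid, threads>>>(d_C, d_A, d_B, hA, wA, wB);
```

Is this the right direction, or is it better to pick BLOCK_SIZE itself at run time?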

I really appreciate your help and time in advance!