Hi

I am having trouble handling 2D matrices whose dimensions are not a multiple of the block size.

My setup code looks like this:

```
int wBlocks = P.width/BLOCK_SIZE + (P.width%BLOCK_SIZE ? 1 : 0);
int hBlocks = P.height/BLOCK_SIZE + (P.height%BLOCK_SIZE ? 1 : 0);

dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
dim3 grid(wBlocks, hBlocks);

MatrixMulKernel<<<grid, threads >>>(Md, Nd, Pd);
```
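For reference, the block-count arithmetic in the setup code is a ceiling division, and it can be checked on the host without a GPU. Here is a minimal sketch in plain C++; `ceilDiv` and `blocksLongForm` are names I made up for illustration, not CUDA functions:

```cpp
#include <cassert>

// Hypothetical helper (not part of CUDA): number of blocks of size
// blockSize needed to cover n elements, i.e. n / blockSize rounded up.
constexpr int ceilDiv(int n, int blockSize) {
    return (n + blockSize - 1) / blockSize;
}

// The same arithmetic written the way the setup code writes it:
// the quotient, plus one extra block if there is a remainder.
constexpr int blocksLongForm(int n, int blockSize) {
    return n / blockSize + (n % blockSize ? 1 : 0);
}
```

Both forms should agree for any positive size, so as far as I can tell the grid dimensions themselves are not the problem.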

I have tried all kinds of things in my kernel to keep the extra threads from computing or doing anything bad, like this:

```
int idx = blockIdx.x * blockDim.x + threadIdx.x;
int idy = blockIdx.y * blockDim.y + threadIdx.y;

if (idx < P.width && idy < P.height)
{
    // Compute stuff
}
```
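To check that the guard selects the right threads, the same index math can be replayed on the CPU. This is only a rough sketch: `countActiveThreads` is a name I made up, and it models nothing but the indexing (no shared memory, no synchronization inside the kernel).

```cpp
#include <cassert>

// Hypothetical host-side emulation of the padded launch: walks every
// (block, thread) pair the grid would contain and counts how many of
// them pass the bounds guard used in the kernel.
int countActiveThreads(int width, int height, int blockSize) {
    int wBlocks = (width + blockSize - 1) / blockSize;   // blocks along x
    int hBlocks = (height + blockSize - 1) / blockSize;  // blocks along y
    int active = 0;
    for (int by = 0; by < hBlocks; ++by) {
        for (int bx = 0; bx < wBlocks; ++bx) {
            for (int ty = 0; ty < blockSize; ++ty) {
                for (int tx = 0; tx < blockSize; ++tx) {
                    int idx = bx * blockSize + tx;  // blockIdx.x * blockDim.x + threadIdx.x
                    int idy = by * blockSize + ty;  // blockIdx.y * blockDim.y + threadIdx.y
                    if (idx < width && idy < height) {
                        ++active;  // this thread would compute one output element
                    }
                }
            }
        }
    }
    return active;
}
```

The count should equal width * height, one thread per output element, whether or not the dimensions are a multiple of the block size.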

I'm not sure what to do at this point. All the documentation I've found either glosses over multiplying matrices whose dimensions are not a multiple of the block size, or doesn't address it at all.