Hi everyone, this is the first time I've posted here, but I'm having problems with matrix multiplication on non-square matrices.

The matrix multiplication example from the CUDA SDK works for square matrices.

For example, with a BLOCK_SIZE of 16 and two 3200x3200 matrices, the results are correct.

However, when I use the same BLOCK_SIZE with matrixA = 3200x1600 and matrixB = 1600x3200, I get incorrect results.
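For reference, these are the dimension rules the non-square case has to respect. Assuming row-major storage, A is M x K, B is K x N, and the result C is M x N, so here C is 3200x3200 and each entry sums 1600 products, not 3200. Note that 3200 and 1600 are both multiples of 16, so partial tiles alone should not be the cause; a mix-up between M, K, and N in the strides or grid size seems the more likely suspect.

```latex
A \in \mathbb{R}^{M \times K},\quad B \in \mathbb{R}^{K \times N},\quad C = AB \in \mathbb{R}^{M \times N}
C_{ij} = \sum_{k=0}^{K-1} A_{ik}\, B_{kj}
\text{row-major: } A_{ik} = \texttt{A[i*K + k]},\quad B_{kj} = \texttt{B[k*N + j]},\quad C_{ij} = \texttt{C[i*N + j]}
M = 3200,\quad K = 1600,\quad N = 3200
```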

Does anyone know why the CUDA example doesn't work in this case? And if possible, could someone give an example of a correct matrix multiplication? I can't get my head around it!
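A minimal sketch of a shared-memory tiled kernel that does not assume square inputs. It assumes row-major float matrices, A as M x K and B as K x N, and zero-pads tiles at the edges so any sizes work. The names `matMulTiled`, `matMulGPU`, `matMulCPU`, and `TILE` are hypothetical and are not the SDK's code; `TILE` plays the role of BLOCK_SIZE.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define TILE 16  // hypothetical name; same role as BLOCK_SIZE in the SDK sample

// C (M x N) = A (M x K) * B (K x N), all row-major.
// Bounds checks allow M, K, N that are not multiples of TILE.
__global__ void matMulTiled(const float* A, const float* B, float* C,
                            int M, int K, int N)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // row of C (and of A)
    int col = blockIdx.x * TILE + threadIdx.x;  // column of C (and of B)
    float sum = 0.0f;

    // Walk along the shared dimension K one tile at a time.
    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Row stride of A is K, row stride of B is N; pad with zeros off the edge.
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    // Row stride of C is N.
    if (row < M && col < N)
        C[row * N + col] = sum;
}

// Host wrapper: copy inputs to the device, launch, copy C back (no error checking).
void matMulGPU(const float* A, const float* B, float* C, int M, int K, int N)
{
    assert(M > 0 && K > 0 && N > 0);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, sizeof(float) * M * K);
    cudaMalloc(&dB, sizeof(float) * K * N);
    cudaMalloc(&dC, sizeof(float) * M * N);
    cudaMemcpy(dA, A, sizeof(float) * M * K, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B, sizeof(float) * K * N, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    // grid.x covers the columns of C (N), grid.y covers the rows of C (M).
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    matMulTiled<<<grid, block>>>(dA, dB, dC, M, K, N);

    cudaMemcpy(C, dC, sizeof(float) * M * N, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}

// Plain CPU reference, useful for checking the GPU result on small sizes.
void matMulCPU(const float* A, const float* B, float* C, int M, int K, int N)
{
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float s = 0.0f;
            for (int k = 0; k < K; ++k)
                s += A[i * K + k] * B[k * N + j];
            C[i * N + j] = s;
        }
}
```

For the case in the question, the call would be `matMulGPU(A, B, C, 3200, 1600, 3200)`, with C allocated as 3200x3200. Comparing against `matMulCPU` on a few small, odd sizes first is a cheap way to catch a swapped dimension before trying the large case.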

Thanks in advance,

David Lisin