Hi everyone, this is my first post here. I'm having problems with matrix multiplication on non-square matrices.
If I use the matrix multiplication code from the CUDA SDK examples, it gives valid results for square matrices.
For example, using a BLOCK_SIZE of 16 and two matrices of 3200x3200 elements, the results are correct.
However, when using the same BLOCK_SIZE with matrixA=3200x1600 and matrixB=1600x3200, I get incorrect results.
Does anyone know why the CUDA example doesn't work in this case? If possible, could someone post an example of a correct matrix multiplication for non-square matrices? I can't get my head around this!
Thanks in advance,
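
For the shapes described above (A is 3200x1600, B is 1600x3200, so C is 3200x3200), a minimal sketch of a kernel that keeps the three dimensions separate might look like the following. This is not the SDK code: the names `matMulNaive` and `matMulHost` are made up for illustration, the matrices are assumed to be stored row-major, and no shared-memory tiling is used. It cannot tell you what is wrong in your own setup, but one thing worth checking is whether the grid size or any index stride is derived from a single width that only happens to be right when the matrices are square.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 16

// C (M x N) = A (M x K) * B (K x N). Row-major storage is an assumption.
__global__ void matMulNaive(const float* A, const float* B, float* C,
                            int M, int K, int N)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // row of C, valid in [0, M)
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // column of C, valid in [0, N)
    if (row >= M || col >= N) return;                 // guard for partial edge blocks

    float sum = 0.0f;
    for (int k = 0; k < K; ++k)
        sum += A[row * K + k] * B[k * N + col];       // A's row stride is K, B's is N
    C[row * N + col] = sum;                           // C's row stride is N
}

// Hypothetical host wrapper: copies inputs to the device, launches the kernel,
// copies the result back, and returns the first CUDA error encountered.
cudaError_t matMulHost(const float* hA, const float* hB, float* hC,
                       int M, int K, int N)
{
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    size_t bytesA = (size_t)M * K * sizeof(float);
    size_t bytesB = (size_t)K * N * sizeof(float);
    size_t bytesC = (size_t)M * N * sizeof(float);

    cudaError_t err = cudaMalloc(&dA, bytesA);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytesB);
    if (err == cudaSuccess) err = cudaMalloc(&dC, bytesC);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytesA, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytesB, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(BLOCK_SIZE, BLOCK_SIZE);
        // x covers the columns of C (N), y covers the rows of C (M); round up.
        dim3 grid((N + BLOCK_SIZE - 1) / BLOCK_SIZE,
                  (M + BLOCK_SIZE - 1) / BLOCK_SIZE);
        matMulNaive<<<grid, block>>>(dA, dB, dC, M, K, N);
        err = cudaGetLastError();                     // catches launch errors
    }
    // The device-to-host copy waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytesC, cudaMemcpyDeviceToHost);

    cudaFree(dA);  // cudaFree on a null pointer is a no-op
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

With the sizes from the post, the call would be `matMulHost(hA, hB, hC, 3200, 1600, 3200)`, which launches a grid of 200x200 blocks of 16x16 threads. The inner loop runs over K = 1600, the shared dimension, and that is the only place where the width of A and the height of B appear.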