Problem with simple Matrix Multiplication using shared memory

Hey guys,

I'm working on a matrix multiplication this week for a project at my uni. The kernel code is from a book (Programming Massively Parallel Processors: A Hands-on Approach), but there are some problems that I can't fix.

  1. Only WIDTH / TILE_WIDTH rows of matrix P are calculated. (I think so, because only these rows are printed.)
  2. With values for width above ~512, the elapsed time is 0?!
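
To make the two symptoms easier to compare against, here is a minimal sketch of how I understand the setup should look. It is not the attached code: TILE_WIDTH = 16 and the helper names `launchAndTime` and `selfCheck` are my own assumptions for illustration, and width is assumed to be a multiple of TILE_WIDTH. The kernel follows the tiling scheme described in the book; the host part checks the launch for errors and times it with CUDA events.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Assumed tile size. TILE_WIDTH * TILE_WIDTH threads per block must not
// exceed the device's maximum number of threads per block.
#define TILE_WIDTH 16

// Tiled P = M * N for square width x width matrices (row-major).
// Each thread computes one element of P.
__global__ void MatrixMulKernel(const float* M, const float* N, float* P, int width)
{
    __shared__ float Ms[TILE_WIDTH][TILE_WIDTH];
    __shared__ float Ns[TILE_WIDTH][TILE_WIDTH];

    int tx = threadIdx.x, ty = threadIdx.y;
    int row = blockIdx.y * TILE_WIDTH + ty;
    int col = blockIdx.x * TILE_WIDTH + tx;

    float acc = 0.0f;
    for (int t = 0; t < width / TILE_WIDTH; ++t) {
        // Load one tile of M and one tile of N into shared memory.
        Ms[ty][tx] = M[row * width + t * TILE_WIDTH + tx];
        Ns[ty][tx] = N[(t * TILE_WIDTH + ty) * width + col];
        __syncthreads();
        for (int k = 0; k < TILE_WIDTH; ++k)
            acc += Ms[ty][k] * Ns[k][tx];
        __syncthreads();
    }
    P[row * width + col] = acc;
}

// Hypothetical helper: launches the kernel on device pointers, reports the
// elapsed time in milliseconds and returns the first CUDA error seen.
cudaError_t launchAndTime(const float* dM, const float* dN, float* dP, int width, float* ms)
{
    *ms = 0.0f;
    if (width <= 0 || width % TILE_WIDTH != 0) return cudaErrorInvalidValue;

    dim3 block(TILE_WIDTH, TILE_WIDTH);                 // one thread per tile element
    dim3 grid(width / TILE_WIDTH, width / TILE_WIDTH);  // tiles in x AND in y

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    MatrixMulKernel<<<grid, block>>>(dM, dN, dP, width);
    cudaError_t err = cudaGetLastError();               // invalid launch configuration shows up here
    cudaEventRecord(stop);
    cudaError_t syncErr = cudaEventSynchronize(stop);   // waits for the kernel to finish
    if (err == cudaSuccess) err = syncErr;
    if (err == cudaSuccess) cudaEventElapsedTime(ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return err;
}

// Hypothetical self-check: multiplies two all-ones matrices, so every
// element of P must equal width.
bool selfCheck(int width)
{
    size_t count = (size_t)width * width;
    size_t bytes = count * sizeof(float);
    float* h = (float*)malloc(bytes);
    for (size_t i = 0; i < count; ++i) h[i] = 1.0f;

    float *dM, *dN, *dP;
    cudaMalloc(&dM, bytes);
    cudaMalloc(&dN, bytes);
    cudaMalloc(&dP, bytes);
    cudaMemcpy(dM, h, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dN, h, bytes, cudaMemcpyHostToDevice);
    cudaMemset(dP, 0, bytes);

    float ms = 0.0f;
    cudaError_t err = launchAndTime(dM, dN, dP, width, &ms);
    bool ok = (err == cudaSuccess);
    if (!ok) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
    } else {
        cudaMemcpy(h, dP, bytes, cudaMemcpyDeviceToHost);
        for (size_t i = 0; i < count && ok; ++i) ok = (h[i] == (float)width);
        printf("width=%d  time=%.3f ms  result %s\n", width, ms, ok ? "ok" : "WRONG");
    }

    cudaFree(dM); cudaFree(dN); cudaFree(dP);
    free(h);
    return ok;
}
```

Calling `selfCheck(1024)` from `main` would show whether every row of P gets written and whether the launch itself reports an error, which is what I would want to rule out before trusting an elapsed time of 0.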

I hope you can help me.
Thanks in advance for your help.

Greetz
matrix.cu (3.6 KB)