Matrix multiply: can anyone help me?

Hey there,

I am trying to write my own matrix multiply code. As I understand it, I can execute my kernel on many threads that are organized in blocks, so I am now trying to multiply 8x8 matrices using 1 grid that will have 64 threads.

As I understand it, the threads will be organized automatically in x,y coordinates.

So if I write this code, it will do the multiplication:

int tx = threadIdx.x;

int ty = threadIdx.y;


Assume that I have calculated C correctly.

But that code didn't work! What is the problem with it?
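
For reference, here is a minimal sketch of the setup I am describing, not my exact code. It assumes row-major float matrices and a single block of 8 x 8 threads; the names matMul8x8, multiplyOnDevice, A, B, C and N are made up for illustration. One thing it makes explicit: threads only get both x and y coordinates if the block is launched with a two-dimensional dim3. If the kernel is launched with a plain 64 as the block size, the block is 64 x 1 and threadIdx.y is always 0.

```cuda
#include <cuda_runtime.h>

#define N 8  // matrix width, matching the 8x8 case in this post

// Each thread computes one element of C = A * B (row-major, N x N).
__global__ void matMul8x8(const float *A, const float *B, float *C)
{
    int tx = threadIdx.x;  // column index, 0..N-1
    int ty = threadIdx.y;  // row index, 0..N-1 (needs a 2D block)

    float sum = 0.0f;
    for (int k = 0; k < N; ++k)
        sum += A[ty * N + k] * B[k * N + tx];
    C[ty * N + tx] = sum;
}

// Hypothetical host helper: copies the inputs to the device, launches
// one block of N x N threads, and copies the result back into hC.
// Returns cudaSuccess if the launch and the copy back both succeeded.
cudaError_t multiplyOnDevice(const float *hA, const float *hB, float *hC)
{
    const size_t bytes = N * N * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;

    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 block(N, N);                     // 8 x 8 = 64 threads with x and y indices
    matMul8x8<<<1, block>>>(dA, dB, dC);  // grid of one block
    cudaError_t err = cudaGetLastError(); // reports launch errors

    // This copy waits for the kernel to finish before reading dC.
    cudaError_t copyErr = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) err = copyErr;

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

Calling multiplyOnDevice(a, b, c) with three host arrays of 64 floats should fill c with the product, and checking the returned error code is a quick way to see whether the launch itself failed.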