I am trying to write my own matrix multiplication code. As I understand it, I can
execute my kernel on many threads that are organized in blocks, so I am now trying to multiply 8x8 matrices using a grid with a single block of 64 threads.
My understanding is that the threads will be arranged automatically by x and y coordinates,
so if I write the following code, it should do the multiplication:
int tx = threadIdx.x;
int ty = threadIdx.y;
result[tx*8+ty] = C;
Assume that I have calculated C correctly.
But that code didn't work! What is the problem with it?
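
For reference, here is a minimal, self-contained sketch of the setup described above. It is only an illustration under some assumptions: the kernel name matMul8x8, the helper runAndVerify, the row-major layout, the test values, and the dim3 block(8, 8) launch are all made up for this example, since the launch configuration is not shown in the post. One thing worth noting is that threadIdx.y is nonzero only when the block is launched with a two-dimensional dim3; with a launch such as <<<1, 64>>>, ty is 0 for every thread and tx runs from 0 to 63.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 8  // matrix dimension from the post (8x8)

// Hypothetical kernel: each thread computes one element of result = a * b.
__global__ void matMul8x8(const float *a, const float *b, float *result)
{
    int tx = threadIdx.x;  // 0..7 only if the block is launched as dim3(8, 8)
    int ty = threadIdx.y;  // always 0 if the block is launched as <<<1, 64>>>

    float C = 0.0f;
    for (int k = 0; k < N; ++k)
        C += a[tx * N + k] * b[k * N + ty];  // row tx of a times column ty of b

    result[tx * N + ty] = C;  // same indexing as in the post
}

// Hypothetical host helper: runs the kernel and compares against a CPU result.
bool runAndVerify()
{
    const size_t bytes = N * N * sizeof(float);
    float hA[N * N], hB[N * N], hR[N * N];

    // Small integer values so the float arithmetic is exact.
    for (int i = 0; i < N * N; ++i) {
        hA[i] = (float)(i % 5);
        hB[i] = (float)(i % 3);
    }

    float *dA = nullptr, *dB = nullptr, *dR = nullptr;
    if (cudaMalloc(&dA, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dB, bytes) != cudaSuccess) { cudaFree(dA); return false; }
    if (cudaMalloc(&dR, bytes) != cudaSuccess) { cudaFree(dA); cudaFree(dB); return false; }

    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Assumption: one block of 8x8 = 64 threads, declared as two-dimensional.
    dim3 block(N, N);
    matMul8x8<<<1, block>>>(dA, dB, dR);

    bool ok = (cudaGetLastError() == cudaSuccess) &&
              (cudaDeviceSynchronize() == cudaSuccess) &&
              (cudaMemcpy(hR, dR, bytes, cudaMemcpyDeviceToHost) == cudaSuccess);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dR);
    if (!ok) return false;

    // CPU reference multiplication for comparison.
    for (int row = 0; row < N; ++row)
        for (int col = 0; col < N; ++col) {
            float ref = 0.0f;
            for (int k = 0; k < N; ++k)
                ref += hA[row * N + k] * hB[k * N + col];
            if (hR[row * N + col] != ref) return false;
        }
    return true;
}

int main()
{
    printf("%s\n", runAndVerify() ? "GPU result matches CPU" : "mismatch or CUDA error");
    return 0;
}
```

If the launch in the real code is <<<1, 64>>> instead of a dim3(8, 8) block, the same kernel body would write only result[0], result[8], and so on up to tx = 63, which also runs past the end of the 64-element array.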