I am trying to multiply a 5120*5120 matrix by a 10*5120 matrix - the latter holds 10 vectors, each of size 5120.
The obvious point of confusion here is which blocks to step through, since the multiplication is NOT between a row and a column. It is instead between a row and a row, because that is how the CPU-side operation was done. So, to make the results match, I have to do a row,row matrix-vector multiplication.
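To make the row,row part concrete, here is a minimal sketch of the kind of CPU-side reference I mean. The function name `row_row_multiply`, the row-major storage and the output layout (one output row per vector) are assumptions for illustration only, not the exact code from the attachment. It is plain C, so it should also compile as host code in a .cu file.

```c
#include <stddef.h>

/* Row-by-row product (hypothetical helper, layout assumed row-major):
 *   out[v * n + r] = sum over k of mat[r * n + k] * vecs[v * n + k]
 * mat  : n * n matrix
 * vecs : num_vecs vectors of length n, stored one after another
 * out  : num_vecs * n results, one row of n values per vector
 */
void row_row_multiply(const float *mat, const float *vecs, float *out,
                      size_t n, size_t num_vecs)
{
    for (size_t v = 0; v < num_vecs; ++v) {
        for (size_t r = 0; r < n; ++r) {
            float sum = 0.0f;
            /* Walk along row r of the matrix and row v of the vector set. */
            for (size_t k = 0; k < n; ++k) {
                sum += mat[r * n + k] * vecs[v * n + k];
            }
            out[v * n + r] = sum;
        }
    }
}
```

With n = 5120 and num_vecs = 10 this produces 10*5120 output values, each one a dot product of two rows of length 5120.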
I get a ‘launch timed out’ error.
My grid dimensions are 512*512 and my block dimensions are 10*10.
Please tell me what I am doing wrong here. I have attached my kernel.
kernel.cu (6.84 KB)