Matrix Multiplication row,row-wise

Hi All
I am trying to multiply a 5120*5120 matrix by a 10*5120 matrix. The latter holds 10 vectors, each of size 5120.
The obvious confusion here is which blocks to step through, since the multiplication is NOT between a row and a column. It is instead between a row and a row, because the CPU-side operation was done this way. So to make the results match, I have to do a row,row matrix-vector multiplication.
I get a ‘launch timed out’ error.
My grid dimensions are 512,512 and my block dimensions are 10*10.
Please tell me what I am doing wrong here. I have attached my kernel.

Thanks

Basically, you need “if” statements for the corner cases. In other words, you launch the code as if both matrices were of the right size, and let the “if” statements deal with the threads that fall outside the real data.
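To make the "if" idea concrete, here is a minimal sketch of a guarded row,row kernel. It is not the attached kernel: the names `rowRowMulGuarded` and `rowRowMulGuardedHost`, the 16*16 block size, and the row-major layout are my assumptions for illustration, and error checking is left out to keep it short. The grid is rounded up to cover the output, and any thread that lands outside it simply does nothing.

```cuda
#include <cuda_runtime.h>
#include <vector>

// C[row][col] = sum over k of A[row][k] * B[col][k], i.e. row of A times row of B.
// A is M*K, B is N*K, C is M*N, all row-major (layout is an assumption).
__global__ void rowRowMulGuarded(const float* A, const float* B, float* C,
                                 int M, int N, int K)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // row of A and of C
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // row of B, column of C

    // Corner case: threads past the edge of C are junk and never write.
    if (row < M && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < K; ++k)
            sum += A[row * K + k] * B[col * K + k];
        C[row * N + col] = sum;
    }
}

// Hypothetical host helper: copies inputs over, launches, copies the result back.
// Return codes are ignored here for brevity; real code should check them.
std::vector<float> rowRowMulGuardedHost(const std::vector<float>& A,
                                        const std::vector<float>& B,
                                        int M, int N, int K)
{
    std::vector<float> C(static_cast<size_t>(M) * N);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, C.size() * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(16, 16);                              // assumed block size
    dim3 grid((N + block.x - 1) / block.x,           // round up so the grid
              (M + block.y - 1) / block.y);          // covers all of C
    rowRowMulGuarded<<<grid, block>>>(dA, dB, dC, M, N, K);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

With the sizes from the question (M = 5120, N = 10, K = 5120) and 16*16 blocks, the rounding gives a grid of 1 by 320 blocks, so the grid is sized from the output matrix and not from the input.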

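You just never write the results of the junk threads back to memory. Keep in mind that when threads in the same warp take different branches, the branches are executed one after the other, and the threads that are not on the current branch sit idle. The ones that follow the same branching path are all executed together.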

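That is why you need a call to __syncthreads(): it makes sure all threads in the block have reached the same point before you go on. Keep the call outside the “if” statements, so that every thread in the block reaches it.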
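Here is a rough sketch of how the "if" statements and the sync calls fit together in a tiled, shared-memory version. Again, this is not the attached kernel or the SDK code itself, just something in the same spirit: `TILE`, `rowRowMulTiled`, `rowRowMulTiledHost`, and the row-major layout are assumptions, and error checking is omitted. Out-of-range loads are replaced with zeros, so the junk threads still take part in loading and syncing but add nothing to the sums.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TILE 16  // assumed tile width; the block must be TILE*TILE threads

// C[row][col] = sum over k of A[row][k] * B[col][k], with A M*K, B N*K, C M*N.
__global__ void rowRowMulTiled(const float* A, const float* B, float* C,
                               int M, int N, int K)
{
    // The extra column of padding avoids shared-memory bank conflicts.
    __shared__ float As[TILE][TILE + 1];
    __shared__ float Bs[TILE][TILE + 1];

    int row  = blockIdx.y * TILE + threadIdx.y;  // row of A and of C
    int col  = blockIdx.x * TILE + threadIdx.x;  // row of B, column of C
    int bRow = blockIdx.x * TILE + threadIdx.y;  // row of B this thread loads
    float sum = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int k = t * TILE + threadIdx.x;          // position along the rows

        // Corner cases: anything outside the matrices is loaded as zero.
        As[threadIdx.y][threadIdx.x] = (row  < M && k < K) ? A[row  * K + k] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < N && k < K) ? B[bRow * K + k] : 0.0f;

        __syncthreads();  // outside any "if": the whole block waits for the tile

        for (int i = 0; i < TILE; ++i)
            sum += As[threadIdx.y][i] * Bs[threadIdx.x][i];

        __syncthreads();  // do not overwrite the tile while others still read it
    }

    // Junk threads computed something, but it is never committed to memory.
    if (row < M && col < N)
        C[row * N + col] = sum;
}

// Hypothetical host helper; return codes are ignored here for brevity.
std::vector<float> rowRowMulTiledHost(const std::vector<float>& A,
                                      const std::vector<float>& B,
                                      int M, int N, int K)
{
    std::vector<float> C(static_cast<size_t>(M) * N);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, C.size() * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    rowRowMulTiled<<<grid, block>>>(dA, dB, dC, M, N, K);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

The thing to notice is where the checks sit: the loads and the final store are guarded, while both __syncthreads() calls are reached by every thread in the block on every pass through the loop.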

I’ve attached my kernel that handles the issue you are having. It is based on the SDK code, but modified to handle the issue you mentioned.

The code has been tested. Just pay attention to what my if statements do.