(this is likely a simple question, but I’ve been banging my head against it for a while now)
I have 2 matrices with the same dimensions and layout, and I want to compute A * B^T.
Now, in the matrix multiplication example:
AS(ty, tx) = A[a + wA * ty + tx];
BS(ty, tx) = B[b + wB * ty + tx];
...
Csub += AS(ty, k) * BS(k, tx);
I really can’t see why anything more would be needed than swapping the indices used to store B in shared memory (apart from the begin offset and step for A and B):
int offset = a + wA * ty + tx;
AS(ty, tx) = A[offset];
BS(tx, ty) = B[offset];
...
Csub += AS(ty, k) * BS(k, tx); // still the same
(no, transposing one of the matrices beforehand is not an option, because in fact I have lots of matrices and they’re in texture memory)
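To make the indexing concrete, here is a minimal host-side sketch in plain C++ that emulates the tiled loop for C = A * B^T and can be compared against a naive triple loop. It is not the SDK kernel: `BLOCK`, `tiledABt` and `naiveABt` are names made up for illustration, the loops over `by`/`bx`/`ty`/`tx` stand in for block and thread indices, and it assumes row-major storage with both dimensions being multiples of `BLOCK`. In this sketch the transposed store into the B tile is the only change to the load/compute lines, but B's begin offset is taken from `bx` as a row block, and both A and B step by `BLOCK` along a row.

```cpp
#include <cassert>
#include <vector>

constexpr int BLOCK = 2;  // tile width; assumed value, plays the role of the block size

// Naive reference: C[i][j] = sum_k A[i][k] * B[j][k], i.e. C = A * B^T.
// A and B are both hA x wA, row-major; C is hA x hA.
std::vector<float> naiveABt(const std::vector<float>& A,
                            const std::vector<float>& B, int hA, int wA) {
    std::vector<float> C(hA * hA, 0.0f);
    for (int i = 0; i < hA; ++i)
        for (int j = 0; j < hA; ++j)
            for (int k = 0; k < wA; ++k)
                C[i * hA + j] += A[i * wA + k] * B[j * wA + k];
    return C;
}

// Host emulation of the tiled scheme. The As/Bs arrays stand in for the
// shared-memory tiles; the load and compute phases are separate loops,
// which is where the barriers would sit on the GPU.
std::vector<float> tiledABt(const std::vector<float>& A,
                            const std::vector<float>& B, int hA, int wA) {
    assert(hA % BLOCK == 0 && wA % BLOCK == 0);  // assumption of this sketch
    std::vector<float> C(hA * hA, 0.0f);
    for (int by = 0; by < hA / BLOCK; ++by) {
        for (int bx = 0; bx < hA / BLOCK; ++bx) {
            float Csub[BLOCK][BLOCK] = {};
            int aBegin = wA * BLOCK * by;  // A: row block selected by by
            int bBegin = wA * BLOCK * bx;  // B: row block selected by bx
            // Both matrices advance by BLOCK along their rows.
            for (int t = 0; t < wA; t += BLOCK) {
                float As[BLOCK][BLOCK], Bs[BLOCK][BLOCK];
                for (int ty = 0; ty < BLOCK; ++ty)
                    for (int tx = 0; tx < BLOCK; ++tx) {
                        As[ty][tx] = A[aBegin + t + wA * ty + tx];
                        Bs[tx][ty] = B[bBegin + t + wA * ty + tx];  // transposed store
                    }
                for (int ty = 0; ty < BLOCK; ++ty)
                    for (int tx = 0; tx < BLOCK; ++tx)
                        for (int k = 0; k < BLOCK; ++k)
                            Csub[ty][tx] += As[ty][k] * Bs[k][tx];  // unchanged
            }
            for (int ty = 0; ty < BLOCK; ++ty)
                for (int tx = 0; tx < BLOCK; ++tx)
                    C[(by * BLOCK + ty) * hA + bx * BLOCK + tx] = Csub[ty][tx];
        }
    }
    return C;
}
```

Calling both functions on the same small integer-valued inputs and comparing the results element by element is a quick way to check the offsets before moving the indexing back into the kernel. Note that using A's begin offset for B as well would only match on the diagonal blocks, where `bx == by`.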