(This is likely a simple question, but I’ve been banging my head against it for a while now.)
I have two matrices with the same dimensions and layout, and I want to compute A * B^T.
Now, in the matrix multiplication example:
AS(ty, tx) = A[a + wA * ty + tx];
BS(ty, tx) = B[b + wB * ty + tx];
...
Csub += AS(ty, k) * BS(k, tx);
I really can’t see why anything more would be needed than changing the index used for the shared memory store (apart from the offset and step for A and B):
int offset = a + wA * ty + tx;
AS(ty, tx) = A[offset];
BS(tx, ty) = B[offset];
...
Csub += AS(ty, k) * BS(k, tx); // still the same
(No, transposing one matrix beforehand is not an option, because in fact I have lots of matrices and they’re in texture memory.)
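
For checking the indexing outside the kernel, here is a minimal CPU-side sketch in plain C that emulates the tiled scheme and compares it with a naive A * B^T. The names `abt_reference`, `abt_tiled` and `TILE` are made up for this illustration (`TILE` stands in for `BLOCK_SIZE`), the matrices are assumed to be row-major in ordinary memory rather than texture memory, and the dimensions are assumed to be multiples of the tile size. One thing it makes visible: the tile of A is selected by the block row `by`, while the tile of B is selected by `bx`, so the two base offsets only coincide for blocks where `bx == by`.

```c
#include <assert.h>

#define TILE 2  /* hypothetical stand-in for BLOCK_SIZE */

/* Reference: C[i][j] = sum_k A[i][k] * B[j][k], i.e. C = A * B^T.
   A and B are both hA x wA, row-major; C is hA x hA. */
void abt_reference(const float *A, const float *B, float *C, int hA, int wA)
{
    for (int i = 0; i < hA; ++i)
        for (int j = 0; j < hA; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < wA; ++k)
                sum += A[i * wA + k] * B[j * wA + k];
            C[i * hA + j] = sum;
        }
}

/* CPU emulation of the tiled kernel: the by/bx loops play the role of
   blockIdx.y/blockIdx.x, the ty/tx loops play the role of threadIdx. */
void abt_tiled(const float *A, const float *B, float *C, int hA, int wA)
{
    assert(hA % TILE == 0 && wA % TILE == 0);  /* assumption of this sketch */

    for (int by = 0; by < hA / TILE; ++by)
        for (int bx = 0; bx < hA / TILE; ++bx) {
            float Csub[TILE][TILE] = {{0}};
            int a = wA * TILE * by;  /* A walks along block row by */
            int b = wA * TILE * bx;  /* B walks along block row bx */

            for (int step = 0; step < wA / TILE; ++step, a += TILE, b += TILE) {
                float As[TILE][TILE], Bs[TILE][TILE];

                /* Load phase: same global access pattern for both matrices,
                   but the tile of B is stored transposed. */
                for (int ty = 0; ty < TILE; ++ty)
                    for (int tx = 0; tx < TILE; ++tx) {
                        As[ty][tx] = A[a + wA * ty + tx];
                        Bs[tx][ty] = B[b + wA * ty + tx];
                    }

                /* Compute phase: unchanged from the plain A * B version. */
                for (int ty = 0; ty < TILE; ++ty)
                    for (int tx = 0; tx < TILE; ++tx)
                        for (int k = 0; k < TILE; ++k)
                            Csub[ty][tx] += As[ty][k] * Bs[k][tx];
            }

            /* Write the finished tile of C. */
            for (int ty = 0; ty < TILE; ++ty)
                for (int tx = 0; tx < TILE; ++tx)
                    C[(by * TILE + ty) * hA + bx * TILE + tx] = Csub[ty][tx];
        }
}
```

Filling two small matrices with different values, running both functions and comparing the results element by element is enough to tell whether a given choice of offsets is right, since a shared base offset for A and B would only agree with the reference on the diagonal blocks.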