Hi all,
I have a question about data tiling in the matrix multiplication sample from the SDK.
The code that tiles data from device memory into shared memory is as follows:
//code from SDK
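for (int a = aBegin, b = bBegin; a <= aEnd; a += aStep, b += bStep) {
    // Shared-memory tiles for the current sub-matrices of A and B
    __shared__ float As[BLOCK_SIZE][BLOCK_SIZE]; // BLOCK_SIZE = 3
    __shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];
    // Each thread loads one element of each tile (AS/BS are the SDK's access macros)
    AS(ty, tx) = A[a + wA * ty + tx];
    BS(ty, tx) = B[b + wB * ty + tx];
    // Wait until both tiles are completely loaded
    __syncthreads();
    ....
}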
In the example, tiles of the same size are copied from both matrices, A and B. Now I want to copy, for example, a 3x3 block from A but a 5x5 block from B in each iteration. Both tiles have the same centre.
How should I launch and index the threads in that case? I'm somewhat confused by the thread IDs…
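To make the question concrete, here is a minimal sketch of one mapping that could work. It is not from the SDK. It assumes the block is launched with 5x5 threads (the size of the larger tile), that both matrices have the same width and height, and that both tile sizes are odd so they can share a centre. The names loadTiles, runDemo, TILE_A, TILE_B, HALO, cRow and cCol are made up for illustration. Every thread loads one element of the 5x5 tile, and only the inner 3x3 threads also load the small tile:

```cuda
#include <vector>
#include <cuda_runtime.h>

#define TILE_A 3                        // edge of the small tile (taken from A)
#define TILE_B 5                        // edge of the large tile (taken from B)
#define HALO   ((TILE_B - TILE_A) / 2)  // extra ring of B elements around the A tile

// One block of TILE_B x TILE_B threads; both tiles are centred on (cRow, cCol).
// Assumption: A and B are both width x height and stored row-major.
__global__ void loadTiles(const float *A, const float *B, int width, int height,
                          int cRow, int cCol, float *outA, float *outB)
{
    __shared__ float As[TILE_A][TILE_A];
    __shared__ float Bs[TILE_B][TILE_B];

    int tx = threadIdx.x;
    int ty = threadIdx.y;

    // Global coordinates covered by this thread. Because the centres coincide,
    // an inner thread reads the same (row, col) from A as it does from B.
    int row = cRow - TILE_B / 2 + ty;
    int col = cCol - TILE_B / 2 + tx;
    bool inside = row >= 0 && row < height && col >= 0 && col < width;

    // Every thread loads one element of the large tile (zero outside the matrix).
    Bs[ty][tx] = inside ? B[row * width + col] : 0.0f;

    // Only the inner TILE_A x TILE_A threads also load the small tile.
    bool inner = ty >= HALO && ty < HALO + TILE_A &&
                 tx >= HALO && tx < HALO + TILE_A;
    if (inner)
        As[ty - HALO][tx - HALO] = inside ? A[row * width + col] : 0.0f;

    // The barrier stays outside the branch so that all threads reach it.
    __syncthreads();

    // For this demo the tiles are just copied back out so the host can inspect them.
    outB[ty * TILE_B + tx] = Bs[ty][tx];
    if (inner)
        outA[(ty - HALO) * TILE_A + (tx - HALO)] = As[ty - HALO][tx - HALO];
}

// Hypothetical host helper: fills A and B with their linear index, runs a single
// block and returns both tiles. Error checking is omitted for brevity.
void runDemo(int width, int height, int cRow, int cCol,
             std::vector<float> &tileA, std::vector<float> &tileB)
{
    std::vector<float> h(width * height);
    for (int i = 0; i < width * height; ++i) h[i] = (float)i;

    size_t bytes  = h.size() * sizeof(float);
    size_t bytesA = TILE_A * TILE_A * sizeof(float);
    size_t bytesB = TILE_B * TILE_B * sizeof(float);

    float *dA, *dB, *dOutA, *dOutB;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dOutA, bytesA);
    cudaMalloc(&dOutB, bytesB);
    cudaMemcpy(dA, h.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, h.data(), bytes, cudaMemcpyHostToDevice);

    loadTiles<<<1, dim3(TILE_B, TILE_B)>>>(dA, dB, width, height,
                                           cRow, cCol, dOutA, dOutB);

    tileA.resize(TILE_A * TILE_A);
    tileB.resize(TILE_B * TILE_B);
    cudaMemcpy(tileA.data(), dOutA, bytesA, cudaMemcpyDeviceToHost);
    cudaMemcpy(tileB.data(), dOutB, bytesB, cudaMemcpyDeviceToHost);

    cudaFree(dA); cudaFree(dB); cudaFree(dOutA); cudaFree(dOutB);
}
```

With this layout the outer ring of threads is idle while the small tile is loaded. An alternative would be to launch 3x3 threads and let each thread loop over several elements of the 5x5 tile, but I am not sure which of the two is the better fit for the matrix multiplication loop.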
Here is a figure illustrating the question: