The code below is from matrixMul_kernel.cu in the CUDA SDK projects directory.
Some parts of it confuse me.
aBegin = wA * BLOCK_SIZE * by;
wA is given as 3*BLOCK_SIZE; what does the expression above indicate?
Why is blockIdx.y needed in this part? Since BLOCK_SIZE is 16 and C is an (8*16) x (5*16) matrix, each block computes one sub-matrix of C and contains 16*16 threads, so we need 40 blocks. And then I am confused: how do I know the distribution of the threads among the blocks that compute the sub-matrices?
Is blockIdx.x some constant value here? I don't see it defined anywhere before.
How can the loop from aBegin to aEnd compute all the sub-matrices? To me it just loops over one row of matrix A.
I am just getting started and don't have a clear feel for CUDA yet; could someone help?
// Block index
int bx = blockIdx.x;
int by = blockIdx.y;

// Thread index
int tx = threadIdx.x;
int ty = threadIdx.y;

// Index of the first sub-matrix of A processed by the block
int aBegin = wA * BLOCK_SIZE * by;

// Index of the last sub-matrix of A processed by the block
int aEnd = aBegin + wA - 1;

// Step size used to iterate through the sub-matrices of A
int aStep = BLOCK_SIZE;

// Index of the first sub-matrix of B processed by the block
int bBegin = BLOCK_SIZE * bx;

// Step size used to iterate through the sub-matrices of B
int bStep = BLOCK_SIZE * wB;