If I write code like the following:
dim3 numBlocks(A, (points_num + 128 - 1) / 128);
Can I know the order in which thread blocks are scheduled onto the SMs? Also, what is the reason for block swizzling?
According to a comment in CUB, for a 2D grid the blocks are scheduled in order of increasing
tile_idx = (blockIdx.x * gridDim.y) + blockIdx.y
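A minimal sketch of what that comment implies, assuming a 2D grid with gridDim.z == 1. The helper `linear_tile_idx` and the kernel `record_tile_idx` are hypothetical names, not CUB APIs. In this index blockIdx.y varies fastest, so for the grid in the question all blocks with blockIdx.x == 0 come first, then blockIdx.x == 1, and so on. It only reflects the order that comment expects; the CUDA programming model does not guarantee it.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not a CUB API): folds a 2D block coordinate into the
// linear tile index from the CUB comment. blockIdx.y varies fastest, so all
// gridDim.y blocks with the same blockIdx.x precede blockIdx.x + 1.
__host__ __device__ inline unsigned int linear_tile_idx(unsigned int bx,
                                                        unsigned int by,
                                                        unsigned int grid_y)
{
    return bx * grid_y + by;
}

// Each block stores its linear index. The mapping is a bijection onto
// [0, gridDim.x * gridDim.y), so every slot of `out` is written exactly once.
__global__ void record_tile_idx(unsigned int *out)
{
    if (threadIdx.x == 0) {
        unsigned int idx = linear_tile_idx(blockIdx.x, blockIdx.y, gridDim.y);
        out[idx] = idx;
    }
}
```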
AFAIK there is no guaranteed order.
If you need one, give each block its logical ID from a global atomic counter that each block increments when it starts.
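The atomic approach can be sketched as follows, assuming a 2D grid with gridDim.z == 1. Thread 0 of each block takes a ticket from a global counter and broadcasts it through shared memory; the block then works on tile `logical_id` instead of a tile derived from `blockIdx`. The names `dynamic_block_id_kernel` and `run_dynamic_block_ids` are hypothetical, and the counter has to be reset to zero before every launch.

```cuda
#include <cuda_runtime.h>

// Each block takes a "ticket" from a global counter and uses it as its
// logical block ID instead of blockIdx. Tickets follow the order in which
// blocks reach the atomicAdd, i.e. roughly the order they start running.
__global__ void dynamic_block_id_kernel(unsigned int *block_counter,
                                        unsigned int *ticket_of_block)
{
    __shared__ unsigned int logical_id;
    if (threadIdx.x == 0) {
        logical_id = atomicAdd(block_counter, 1u);  // returns the old value
    }
    __syncthreads();  // make logical_id visible to every thread in the block

    // ... process tile `logical_id` here instead of a tile chosen from blockIdx ...

    // Record which ticket this hardware block received (for inspection only).
    if (threadIdx.x == 0) {
        unsigned int hw_id = blockIdx.y * gridDim.x + blockIdx.x;
        ticket_of_block[hw_id] = logical_id;
    }
}

// Hypothetical host helper: zeroes the counter, launches the kernel on a
// 2D grid (grid.z == 1 assumed) and copies the tickets back.
bool run_dynamic_block_ids(dim3 grid, unsigned int threads,
                           unsigned int *host_tickets)
{
    const size_t n = static_cast<size_t>(grid.x) * grid.y;
    unsigned int *d_counter = nullptr;
    unsigned int *d_tickets = nullptr;

    cudaError_t err = cudaMalloc(&d_counter, sizeof(unsigned int));
    if (err == cudaSuccess)
        err = cudaMalloc(&d_tickets, n * sizeof(unsigned int));
    if (err == cudaSuccess)
        err = cudaMemset(d_counter, 0, sizeof(unsigned int));  // reset before each launch
    if (err == cudaSuccess) {
        dynamic_block_id_kernel<<<grid, threads>>>(d_counter, d_tickets);
        err = cudaGetLastError();  // catches launch-configuration errors
    }
    if (err == cudaSuccess)  // cudaMemcpy waits for the kernel to finish
        err = cudaMemcpy(host_tickets, d_tickets, n * sizeof(unsigned int),
                         cudaMemcpyDeviceToHost);

    cudaFree(d_counter);  // cudaFree(nullptr) is a no-op
    cudaFree(d_tickets);
    return err == cudaSuccess;
}
```

Because tickets are handed out as blocks reach the atomic, a block holding ticket k knows that every block with a smaller ticket has already started, which matters when later blocks wait on results from earlier ones.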
What do you mean by swizzle?
Like in CUTLASS: there is a part that swizzles (remaps) the CTA indices to get a better L2 cache hit rate.
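To illustrate the kind of CTA swizzle mentioned above, here is a minimal sketch of a grouped tile ordering for a GEMM-like tiled output. It is not CUTLASS's implementation; the function `grouped_swizzle`, the struct `TileCoord`, the kernel `swizzled_tile_kernel` and the `group_m` parameter are hypothetical. Consecutive block IDs cover a column of `group_m` tile rows before moving to the next tile column, so blocks that are likely to run at the same time reuse the same rows of A and columns of B from L2. It assumes `group_m >= 1` and a 1D launch of `tiles_m * tiles_n` blocks.

```cuda
#include <cuda_runtime.h>

struct TileCoord { unsigned int m; unsigned int n; };

// Hypothetical grouped swizzle: maps a linear block id to an output tile
// (m, n) of a tiles_m x tiles_n grid, walking down `group_m` tile rows
// before moving to the next column. The last group may have fewer rows.
__host__ __device__ inline TileCoord grouped_swizzle(unsigned int pid,
                                                     unsigned int tiles_m,
                                                     unsigned int tiles_n,
                                                     unsigned int group_m)
{
    const unsigned int tiles_per_group = group_m * tiles_n;
    const unsigned int group_id = pid / tiles_per_group;
    const unsigned int first_m = group_id * group_m;
    const unsigned int remaining = tiles_m - first_m;
    const unsigned int rows_in_group = remaining < group_m ? remaining : group_m;
    const unsigned int local = pid % tiles_per_group;

    TileCoord t;
    t.m = first_m + local % rows_in_group;
    t.n = local / rows_in_group;
    return t;
}

// 1D launch with tiles_m * tiles_n blocks; each block computes the tile
// chosen by the swizzle instead of the one implied by blockIdx.x directly.
__global__ void swizzled_tile_kernel(unsigned int tiles_m, unsigned int tiles_n,
                                     unsigned int group_m, unsigned int *owner)
{
    TileCoord t = grouped_swizzle(blockIdx.x, tiles_m, tiles_n, group_m);
    if (threadIdx.x == 0)
        owner[t.m * tiles_n + t.n] = blockIdx.x;  // which block took this tile
    // ... compute output tile (t.m, t.n) here ...
}
```

Because the mapping is a permutation of all tiles, the result is correct whatever order the blocks actually run in; only the cache hit rate depends on the order matching expectations.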
That is not knowing the order; it is expecting and hoping for one. Relying on such an expected order is acceptable when only performance depends on it, but not when correctness does.
As in this link: