What is the block launching schedule?

I read these:
https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/threadblock/threadblock_swizzle.h
https://developer.nvidia.com/blog/optimizing-compute-shaders-for-l2-locality-using-thread-group-id-swizzling/

I do not fully understand how the block launch order is controlled… My guess is that blocks are launched in order of their ID? For example, if one wave can hold 10 blocks, blocks 0~9 are launched in the first wave, and then blocks 10~19 in the next? (Since we are discussing GEMM, there should be no need to consider some blocks running faster than others, … right?)

The scheduling order of blocks is unspecified, so a kernel should not rely on blocks starting in blockIdx order.

You can always assign a logical block id to each threadblock by incrementing an atomic counter. This way, the first block to start would have logical id 0, and so on.
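A minimal sketch of that atomic-counter idea, in CUDA. The names `g_ticket`, `record_logical_ids`, and `launch_and_collect` are hypothetical and chosen only for illustration; this is not taken from CUTLASS. One thread per block takes a ticket from a global counter and shares it with the rest of the block through shared memory.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Global ticket counter (hypothetical name). It must be reset to 0 before each launch.
__device__ unsigned int g_ticket = 0;

// Each block takes one ticket; the ticket is its logical id, assigned in the
// order in which blocks actually reach the atomicAdd.
__global__ void record_logical_ids(unsigned int* logical_of_block) {
    __shared__ unsigned int logical_id;

    if (threadIdx.x == 0) {
        // atomicAdd returns the old value, so the first block to arrive gets 0.
        logical_id = atomicAdd(&g_ticket, 1u);
    }
    __syncthreads();  // make logical_id visible to every thread in the block

    // From here on, a real kernel would use logical_id in place of blockIdx.x,
    // e.g. to pick which output tile to compute. Here we only record the mapping.
    if (threadIdx.x == 0) {
        logical_of_block[blockIdx.x] = logical_id;
    }
}

// Host helper (hypothetical): launches the kernel and returns, for each
// hardware block index, the logical id that block received.
std::vector<unsigned int> launch_and_collect(int num_blocks, int threads_per_block) {
    unsigned int zero = 0;
    cudaMemcpyToSymbol(g_ticket, &zero, sizeof(zero));  // reset the counter

    unsigned int* d_out = nullptr;
    cudaMalloc(&d_out, num_blocks * sizeof(unsigned int));

    record_logical_ids<<<num_blocks, threads_per_block>>>(d_out);

    std::vector<unsigned int> h_out(num_blocks);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_out.data(), d_out, num_blocks * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

The returned values are always a permutation of 0 to `num_blocks - 1`, but which hardware block gets which logical id can differ from run to run. Error checking on the CUDA calls is omitted to keep the sketch short.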