Does cp.async.bulk Have a Limitation on the Number of CTAs?

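For the bulk asynchronous copy from shared memory to global memory (cp.async.bulk, or more precisely its reducing form, cp.reduce.async.bulk), my understanding is that values from the shared memory (SMEM) of several CTAs can be accumulated into the same global memory location. Is there any limit on the number of CTAs that can take part in this?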