meta_group_rank behavior in cooperative groups

The description of the meta_group_rank API for cooperative groups in the CUDA Programming Guide seems to indicate that the value is the linear index of the subgroup within the parent group. However, I found that the behavior differs between binary_partition, labeled_partition and tiled_partition. I also found that meta_group_rank < meta_group_size isn’t guaranteed.

It seems that:
binary_partition uses the predicate value as the group rank
labeled_partition uses the rank, within the parent, of the subgroup’s first thread as the group rank
tiled_partition uses the linear index of the tile within the parent group
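
A minimal sketch of how these observations could be checked: it prints meta_group_rank and meta_group_size for each partition type from a single warp and makes no claim about the output. The kernel name probe_meta_group, the helper run_probe, the label (lane % 3) and the predicate (odd lanes) are my own choices for illustration. I am assuming a device with compute capability 7.0 or higher (e.g. built with nvcc -arch=sm_70) and a toolkit recent enough that coalesced_group exposes both accessors.

```cuda
#include <cstdio>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Each thread prints "rank of size" for the subgroup it lands in
// under the three partitioning calls.
__global__ void probe_meta_group()
{
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> tile32 = cg::tiled_partition<32>(block);
    unsigned int lane = static_cast<unsigned int>(tile32.thread_rank());

    // tiled_partition: tiles of 4 threads cut from the 32-thread block.
    auto tiled = cg::tiled_partition<4>(block);

    // labeled_partition: hypothetical labels 0, 1, 2 (lane % 3).
    int label = static_cast<int>(lane % 3);
    auto labeled = cg::labeled_partition(tile32, label);

    // binary_partition: hypothetical predicate, true for odd lanes.
    bool pred = (lane % 2) == 1;
    auto binary = cg::binary_partition(tile32, pred);

    // Casts keep the printf format valid whatever the accessor return type is.
    printf("lane %2u | tiled %llu of %llu | labeled (label %d) %llu of %llu | "
           "binary (pred %d) %llu of %llu\n",
           lane,
           (unsigned long long)tiled.meta_group_rank(),
           (unsigned long long)tiled.meta_group_size(),
           label,
           (unsigned long long)labeled.meta_group_rank(),
           (unsigned long long)labeled.meta_group_size(),
           (int)pred,
           (unsigned long long)binary.meta_group_rank(),
           (unsigned long long)binary.meta_group_size());
}

// Launches one block of 32 threads and reports any CUDA error.
cudaError_t run_probe()
{
    probe_meta_group<<<1, 32>>>();
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}

int main()
{
    cudaError_t err = run_probe();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Any line where the first number is not smaller than the second would be a case of meta_group_rank >= meta_group_size.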

Please give an example when meta_group_rank >= meta_group_size.

Everything else sounds exactly as it should be, doesn’t it?

unsigned long long meta_group_size() const: Returns the number of groups created when the parent group was partitioned.

unsigned long long meta_group_rank() const: Linear rank of the group within the set of tiles partitioned from a parent group (bounded by meta_group_size)

When 4 subgroups are created, meta_group_size should be 4 and meta_group_rank should be 0, 1, 2, or 3. It’s not specified how the subgroups are ranked.
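
As a rough sketch of that expectation for the tiled case only: a block of 16 threads split into tiles of 4 gives 4 subgroups, and the host counts the threads whose rank is not below the reported group count. The names record_tile_meta and count_out_of_range, and the sizes 16 and 4, are made up for this example and are not part of the cooperative groups API.

```cuda
#include <cstdio>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

constexpr int kThreads = 16;  // hypothetical block size
constexpr int kTile = 4;      // tile width, so 16 / 4 = 4 subgroups

// Every thread records the rank and count reported by its tile.
__global__ void record_tile_meta(unsigned int* rank, unsigned int* size)
{
    cg::thread_block block = cg::this_thread_block();
    auto tile = cg::tiled_partition<kTile>(block);
    unsigned int t = static_cast<unsigned int>(block.thread_rank());
    rank[t] = static_cast<unsigned int>(tile.meta_group_rank());
    size[t] = static_cast<unsigned int>(tile.meta_group_size());
}

// Returns how many threads saw rank >= size, or -1 on a CUDA error.
int count_out_of_range()
{
    unsigned int* d_rank = nullptr;
    unsigned int* d_size = nullptr;
    unsigned int h_rank[kThreads];
    unsigned int h_size[kThreads];
    const size_t bytes = kThreads * sizeof(unsigned int);

    if (cudaMalloc(&d_rank, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_size, bytes) != cudaSuccess) { cudaFree(d_rank); return -1; }

    record_tile_meta<<<1, kThreads>>>(d_rank, d_size);

    // cudaMemcpy waits for the kernel, so no separate synchronize is needed.
    bool ok = cudaGetLastError() == cudaSuccess
           && cudaMemcpy(h_rank, d_rank, bytes, cudaMemcpyDeviceToHost) == cudaSuccess
           && cudaMemcpy(h_size, d_size, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_rank);
    cudaFree(d_size);
    if (!ok) return -1;

    int bad = 0;
    for (int t = 0; t < kThreads; ++t) {
        std::printf("thread %2d: meta_group_rank %u, meta_group_size %u\n",
                    t, h_rank[t], h_size[t]);
        if (h_rank[t] >= h_size[t]) ++bad;
    }
    return bad;
}

int main()
{
    int bad = count_out_of_range();
    std::printf("threads with rank >= size: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

The same check could be pointed at labeled_partition or binary_partition to see whether the bound holds there as well.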