According to the CUDA programming guide, the `meta_group_rank` API in cooperative groups should return the linear index of the subgroup within the parent group. However, I found that the behavior differs from that of `tiled_partition` when other partitioning functions are used. I also found that `meta_group_rank < meta_group_size` isn't guaranteed.
It seems that:
- `binary_partition` assigns the predicate value as the group rank
- `labeled_partition` assigns the parent-group rank of the subgroup's first thread as the group rank
- `tiled_partition` assigns the linear index of the tile within the parent group
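
For anyone who wants to check this on their own setup, here is a minimal sketch of the kind of test that should show the values side by side. The kernel name, the label formula, and the odd/even predicate are arbitrary choices for illustration, not anything from the guide. I am assuming a device of compute capability 7.0 or higher (needed for `labeled_partition`) and a toolkit where `labeled_partition` and `binary_partition` are available from `cooperative_groups.h`.

```cuda
#include <cstdio>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Hypothetical test kernel: every thread prints the meta group rank and size
// it observes for three different partitions of its warp.
__global__ void print_meta_group_info()
{
    cg::thread_block block = cg::this_thread_block();
    auto warp = cg::tiled_partition<32>(block);
    unsigned r = (unsigned)warp.thread_rank();

    // Four tiles of 8 threads each.
    auto tile = cg::tiled_partition<8>(warp);

    // Same 8-thread grouping, but with arbitrary labels (100, 90, 80, 70)
    // so that the label itself cannot be mistaken for a rank.
    int label = 100 - 10 * (int)(r / 8);
    auto labeled = cg::labeled_partition(warp, label);

    // Split the warp into odd and even thread ranks.
    bool pred = (r % 2) == 1;
    auto binary = cg::binary_partition(warp, pred);

    // Each pair is printed as meta_group_rank/meta_group_size.
    printf("thread %2u | tiled %u/%u | labeled(label=%d) %u/%u | binary(pred=%d) %u/%u\n",
           r,
           (unsigned)tile.meta_group_rank(), (unsigned)tile.meta_group_size(),
           label,
           (unsigned)labeled.meta_group_rank(), (unsigned)labeled.meta_group_size(),
           (int)pred,
           (unsigned)binary.meta_group_rank(), (unsigned)binary.meta_group_size());
}

int main()
{
    // One block of one warp keeps the output short.
    print_meta_group_info<<<1, 32>>>();
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Building with something like `nvcc -arch=sm_70` should be enough. I have left out expected output on purpose, since the whole point is that the values may depend on the partition function and possibly on the toolkit version.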