The CUDA programming guide's description of the meta_group_rank API in cooperative groups seems to indicate that the value is the linear index of the subgroup within the parent group. However, I found that the behavior differs between binary_partition, labeled_partition, and tiled_partition. I also found that meta_group_rank < meta_group_size isn't guaranteed.
It seems that:
- binary_partition assigns the predicate value as the rank
- labeled_partition assigns the parent-group thread rank of the subgroup's first thread as the rank
- tiled_partition assigns the linear index of the tile within the parent group
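
The observations above can be checked with a small reproducer. This is a minimal sketch, assuming CUDA 11 or later and a GPU of compute capability 7.0 or higher (which, as far as I know, binary_partition and labeled_partition require). The kernel name, the label (thread rank / 8), and the predicate (odd thread rank) are arbitrary choices made for illustration. The printed values may differ between CUDA versions, so no expected output is listed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Hypothetical kernel name; prints meta_group_rank / meta_group_size
// for three ways of partitioning the same 32-thread parent tile.
__global__ void print_meta_group_info()
{
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> warp = cg::tiled_partition<32>(block);
    const unsigned rank = warp.thread_rank();

    // tiled_partition: four tiles of 8 threads each.
    auto tile = cg::tiled_partition<8>(warp);

    // labeled_partition: threads with the same label end up in the same subgroup.
    // The label below is an arbitrary choice that mirrors the 8-thread tiling.
    cg::coalesced_group labeled = cg::labeled_partition(warp, static_cast<int>(rank / 8));

    // binary_partition: two subgroups, split on an arbitrary predicate (odd rank).
    cg::coalesced_group binary = cg::binary_partition(warp, (rank % 2) == 1);

    // Print as "rank/size" for each partition type, one line per thread.
    printf("thread %2u | tiled %u/%u | labeled %u/%u | binary %u/%u\n",
           rank,
           static_cast<unsigned>(tile.meta_group_rank()),
           static_cast<unsigned>(tile.meta_group_size()),
           static_cast<unsigned>(labeled.meta_group_rank()),
           static_cast<unsigned>(labeled.meta_group_size()),
           static_cast<unsigned>(binary.meta_group_rank()),
           static_cast<unsigned>(binary.meta_group_size()));
}

int main()
{
    // One block of a single warp keeps the output short (32 lines).
    print_meta_group_info<<<1, 32>>>();

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

This can be compiled with something like nvcc -arch=sm_70 repro.cu (the file name is just a placeholder). Comparing the "labeled" and "binary" columns against the "tiled" column shows whether the ranks are linear indices and whether rank < size holds for each partition type.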