Can SHARP ReduceScatter Handle Different Receive Sizes per Process?

I have a question about NVIDIA SHARP’s ReduceScatter operation. In standard MPI, the reduce-scatter operation lets each process receive a different amount of reduced data. For example, Open MPI’s MPI_Reduce_scatter takes a const int recvcounts[] argument, which specifies how many elements each process receives.

Reference: OpenMPI MPI_Reduce_scatter
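
To make the semantics I am asking about concrete, here is a minimal plain-C sketch of what MPI_Reduce_scatter with MPI_SUM computes for one rank when recvcounts[] differs per rank. It does not call MPI or SHARP. The helper name simulate_reduce_scatter_sum and its signature are made up for illustration only.

```c
#include <assert.h>

/* Hypothetical helper (not part of MPI or SHARP): simulates, for a single
 * rank, the result of a sum reduce-scatter with per-rank receive counts.
 *
 * sendbufs[r] is rank r's input vector; every input vector must hold
 * recvcounts[0] + ... + recvcounts[nranks-1] elements.
 * The reduced segment owned by `rank` is written to recvbuf, and the
 * number of elements written (recvcounts[rank]) is returned. */
int simulate_reduce_scatter_sum(const int *const *sendbufs, int nranks,
                                const int recvcounts[], int rank,
                                int *recvbuf)
{
    assert(rank >= 0 && rank < nranks);

    /* This rank's segment starts after the segments of all lower ranks. */
    int offset = 0;
    for (int r = 0; r < rank; ++r)
        offset += recvcounts[r];

    /* Element-wise sum across all ranks, restricted to this rank's segment. */
    for (int i = 0; i < recvcounts[rank]; ++i) {
        int sum = 0;
        for (int r = 0; r < nranks; ++r)
            sum += sendbufs[r][offset + i];
        recvbuf[i] = sum;
    }
    return recvcounts[rank];
}
```

With three ranks and recvcounts = {1, 2, 3}, rank 0 would receive one reduced element, rank 1 two, and rank 2 three. That uneven split is the behavior I am hoping SHARP can reproduce.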

I would like to know whether SHARP supports this feature as well. Specifically, can each process receive a different amount of reduced data?

If this is possible, how should I configure sharp_coll_reduce_spec reduce_spec when calling sharp_coll_do_reduce_scatter?

I appreciate any guidance on this. Thank you in advance for your help!

Hi,
As far as I know, SHARP ReduceScatter/AllGather (RS/AG) will be supported in SHARP v4.

This means it can only be supported on XDR switches.

Thanks,
Suo