I have a question regarding NVIDIA SHARP’s ReduceScatter operation. In standard MPI’s ReduceScatter, each process can receive a different amount of reduced data. For example, OpenMPI’s ReduceScatter allows this through the const int recvcounts[] argument, which specifies how many elements each process should receive.
Reference: OpenMPI MPI_Reduce_scatter
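To make the behavior I am asking about concrete, here is a small serial sketch in plain C. It makes no MPI or SHARP calls; it only models what MPI_Reduce_scatter with MPI_SUM produces when recvcounts differs per rank. The function names segment_offset and model_reduce_scatter_sum are hypothetical helpers I made up for illustration, and the int element type is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Offset (in elements) of a rank's segment inside the reduced vector:
   the sum of recvcounts[0..rank-1], which is how MPI_Reduce_scatter
   lays out the segments. Hypothetical helper, not an MPI/SHARP call. */
static size_t segment_offset(const int recvcounts[], int rank)
{
    size_t off = 0;
    for (int r = 0; r < rank; ++r)
        off += (size_t)recvcounts[r];
    return off;
}

/* Serial model of MPI_Reduce_scatter with MPI_SUM on ints.
   sendbufs[p] is the full-length input vector of process p;
   recvbuf receives the recvcounts[rank] elements that `rank` would own. */
static void model_reduce_scatter_sum(const int *const sendbufs[], int nprocs,
                                     const int recvcounts[], int rank,
                                     int recvbuf[])
{
    assert(rank >= 0 && rank < nprocs);
    size_t off = segment_offset(recvcounts, rank);
    for (int i = 0; i < recvcounts[rank]; ++i) {
        int sum = 0;
        for (int p = 0; p < nprocs; ++p)
            sum += sendbufs[p][off + (size_t)i];
        recvbuf[i] = sum;
    }
}
```

With three processes and recvcounts = {1, 2, 3}, rank 0 would own element 0 of the reduced vector, rank 1 elements 1 to 2, and rank 2 elements 3 to 5. This uneven split is what I would like to reproduce with SHARP.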
I would like to know whether SHARP supports this feature as well—specifically, can each process receive a different amount of reduced data?
If this is possible, how should I configure sharp_coll_reduce_spec reduce_spec when calling sharp_coll_do_reduce_scatter?
I appreciate any guidance on this. Thank you in advance for your help!