Hello,
I have a question regarding SHARP and NCCL collectives. Is it possible to run multiple streaming aggregations simultaneously? Specifically, I am trying to run Allgather and Reduce-Scatter collectives at the same time using SHARP Streaming Aggregation.
According to the NVIDIA SHARP documentation, SHARP Streaming Aggregation can be executed on a single NCCL communicator/process group:
NCCL SHARP Streaming aggregation is supported on a single NCCL communicator/process group (PG). Applications can selectively enable SHARP on specific Process Group (PG) by setting this variable in the application before creating the PG.
If I run both Allgather and Reduce-Scatter on the same NCCL communicator, is overlapping these operations possible? Any insights or comments would be greatly appreciated!
Thank you!
Yes, because Allgather doesn't use SHARP resources.
This means that while a SHARP all-reduce is running, any other jobs you run can use only normal NCCL traffic.
Thank you for the clarification.
When you mention that “Allgather does not use SHARP resources,” are you referring to resources related to data reduction, such as the aggregation logic or hardware?
To further clarify my understanding, if I run Allgather and Reduce-Scatter of size S with N GPUs simultaneously using SHARP, here is what I expect:
| Operation with SHARP | Data sent by each GPU | Data received by each GPU |
|---|---|---|
| Allgather | S | (N-1)S |
| Reduce-Scatter | (N-1)S | S |
| Overlap (simultaneous) | NS | NS |
If these operations overlap, I believe each GPU would need to handle a total data transfer of NS for both sending and receiving simultaneously. My goal is to achieve this. Could this lead to any potential conflicts?
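To double-check the arithmetic behind the NS figure, here is a minimal Python sketch of the accounting in my table. It only adds up the expected per-GPU volumes; it is not a measurement and it does not call NCCL or SHARP. The function name `per_gpu_traffic` and the example values (S = 1024, N = 8) are my own assumptions for illustration.

```python
def per_gpu_traffic(size_s, num_gpus):
    """Return {operation: (sent, received)} per GPU, following the table above.

    size_s   -- S, the per-GPU data size (any unit, e.g. bytes)
    num_gpus -- N, the number of GPUs in the communicator
    """
    # Allgather: each GPU sends S and receives (N-1)S.
    allgather = (size_s, (num_gpus - 1) * size_s)
    # Reduce-Scatter: each GPU sends (N-1)S and receives S.
    reduce_scatter = ((num_gpus - 1) * size_s, size_s)
    # Overlap: both collectives in flight at once, so the volumes add up.
    overlap = (
        allgather[0] + reduce_scatter[0],
        allgather[1] + reduce_scatter[1],
    )
    return {
        "allgather": allgather,
        "reduce_scatter": reduce_scatter,
        "overlap": overlap,
    }


# Hypothetical example values: S = 1024, N = 8.
traffic = per_gpu_traffic(size_s=1024, num_gpus=8)
for name, (sent, received) in traffic.items():
    print(f"{name}: sent={sent} received={received}")
```

With these example values, the overlap row comes out to N * S in each direction, which matches the last row of the table.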
I appreciate your feedback!