Expose NCCL primitives via explicit CUDA graph API

Hello,

I am currently working with the explicit CUDA graph API.
I am curious whether CUDA graphs have an API, and any example, for adding NCCL communication primitives explicitly to a graph.
The official NCCL documentation only mentions the CUDA Graph capture API, which is not suitable for my needs.

Could you please clarify if explicit CUDA graph creation with NCCL is supported?

What would you like to achieve with manual graph node management for NCCL?

An explicit and static definition of the CUDA graph, without using streams explicitly.

A captured graph does not contain stream information. You could simply capture the NCCL calls into a graph, then add that graph to an explicit graph as a child graph node. You can access the individual nodes via cudaGraphGetNodes.

The point is to do it without capturing, using only the explicit CUDA graph construction API.
For example, as input I have static graph annotation info, and I want to create the CUDA graph explicitly,
for instance by adding an NCCL primitive kernel wrapper as a node.

The explicit graph definition API is published. I don’t see any intersection with NCCL. The NCCL source is available, so it might be possible to deconstruct an NCCL library call and replace it with an explicit graph sequence.

Thanks, Robert, for the reply.
However, that is a workaround and not the right way to implement it. Sure, under the hood an NCCL primitive is a CUDA kernel, and having a clear API in CUDA graphs to link an NCCL primitive into a graph would be the proper way to engineer it.
So, in conclusion:

  • there is no API to add an NCCL kernel to a CUDA graph
  • the workaround is to fetch the kernel implementation from NCCL and put it into my own solution
  • there are no plans to add such functionality in a future release of CUDA graphs/NCCL

Is it correct?

I don’t know what that means. I already pointed to the explicit graph API. I don’t see anything there pertaining to NCCL. I was just sharing that observation.

I thought the most tangible suggestion here was the one made by striker159.

I didn’t say that. NVIDIA generally doesn’t announce plans or make forward-looking statements here on this forum. I don’t either (at least I try not to; it’s related to maintaining my employment at NVIDIA).

If you want to see a change in CUDA APIs, my suggestion would be to file a bug.