How to put cudaMemcpyPeerAsync into a graph?

How can I use cudaMemcpyPeerAsync in a graph?


Did you try stream capture?
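
For reference, here is a minimal sketch of what I mean by stream capture. It assumes a machine with at least two GPUs; the device indices, the buffer size, and the `CHECK` macro are placeholders of mine, not anything from your code.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: abort on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    const int srcDev = 0, dstDev = 1;   // assumed device indices
    const size_t bytes = 1 << 20;       // assumed transfer size

    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 2) {
        printf("This sketch needs at least two GPUs.\n");
        return 0;
    }

    // Allocate before capture starts; allocation is not a capturable operation.
    void *src = nullptr, *dst = nullptr;
    CHECK(cudaSetDevice(srcDev));
    CHECK(cudaMalloc(&src, bytes));
    CHECK(cudaMemset(src, 1, bytes));
    CHECK(cudaSetDevice(dstDev));
    CHECK(cudaMalloc(&dst, bytes));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Work issued to the stream between Begin/EndCapture is recorded
    // into the graph instead of being executed.
    cudaGraph_t graph;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    CHECK(cudaMemcpyPeerAsync(dst, dstDev, src, srcDev, bytes, stream));
    CHECK(cudaStreamEndCapture(stream, &graph));

    // Instantiate once, then launch as often as needed.
    cudaGraphExec_t exec;
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));
    CHECK(cudaGraphLaunch(exec, stream));
    CHECK(cudaStreamSynchronize(stream));

    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(srcDev));
    CHECK(cudaFree(src));
    return 0;
}
```

With capture, parallel branches are expressed by forking extra streams off the capturing stream with events and joining them back before ending the capture.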

I was hoping to avoid that path, since it would require creating almost a hundred separate streams to express all the parallelism. I will go that way only if there is no alternative.

I think cudaGraphAddMemcpyNode1D or cudaGraphAddMemcpyNode should also work if you are building the graph through the explicit graph API rather than stream capture.
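
Roughly, the explicit-API version could look like the sketch below. I am assuming unified virtual addressing, so that `cudaMemcpyDefault` lets the runtime work out which device each pointer belongs to; I have not checked this on every CUDA version or GPU combination. The device indices, buffer size, and `CHECK` macro are placeholders of mine.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: abort on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    const int srcDev = 0, dstDev = 1;   // assumed device indices
    const size_t bytes = 1 << 20;       // assumed transfer size

    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 2) {
        printf("This sketch needs at least two GPUs.\n");
        return 0;
    }

    void *src = nullptr, *dst = nullptr;
    CHECK(cudaSetDevice(srcDev));
    CHECK(cudaMalloc(&src, bytes));
    CHECK(cudaMemset(src, 1, bytes));
    CHECK(cudaSetDevice(dstDev));
    CHECK(cudaMalloc(&dst, bytes));

    // Enable direct peer access where the hardware allows it.
    int canAccess = 0;
    CHECK(cudaDeviceCanAccessPeer(&canAccess, dstDev, srcDev));
    if (canAccess) {
        CHECK(cudaDeviceEnablePeerAccess(srcDev, 0));
    }

    cudaGraph_t graph;
    CHECK(cudaGraphCreate(&graph, 0));

    // Root node with no dependencies. To order it after other nodes,
    // pass an array of those nodes and its length instead of nullptr, 0.
    cudaGraphNode_t copyNode;
    CHECK(cudaGraphAddMemcpyNode1D(&copyNode, graph, nullptr, 0,
                                   dst, src, bytes, cudaMemcpyDefault));

    cudaGraphExec_t exec;
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaGraphLaunch(exec, stream));
    CHECK(cudaStreamSynchronize(stream));

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(srcDev));
    CHECK(cudaFree(src));
    return 0;
}
```

The advantage here is that parallelism comes from the dependency lists on each node, so independent copies simply share no edges and no extra streams are needed to describe them.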