Hi
How can I use cudaMemcpyPeerAsync in a graph?
Regards
Did you try stream capture?
I was hoping to avoid that path, as it would require creating almost a hundred separate streams to express all the parallelism, unless that’s the only way.
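For reference, here is a minimal sketch of the stream-capture route, assuming a machine with at least two GPUs and CUDA 11.4 or later (for cudaGraphInstantiateWithFlags). It shows why the stream count grows: each concurrent branch needs its own stream, forked from and joined back into the origin stream with events. The helper name capturePeerCopies and the buffer names are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Print the failing call and return -1 from the enclosing function
// (resources are not released on this error path; fine for a sketch).
#define CHECK(call)                                                        \
  do {                                                                     \
    cudaError_t err_ = (call);                                             \
    if (err_ != cudaSuccess) {                                             \
      std::fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_));   \
      return -1;                                                           \
    }                                                                      \
  } while (0)

// Hypothetical helper: captures two cudaMemcpyPeerAsync calls into one graph
// as two parallel branches, then launches the graph once.
// Returns 0 on success, 1 if fewer than two GPUs are available, -1 on error.
int capturePeerCopies(size_t bytes) {
  int count = 0;
  if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) return 1;

  void *a0, *b0, *a1, *b1;  // a*: GPU0 -> GPU1 pair, b*: GPU1 -> GPU0 pair
  CHECK(cudaSetDevice(1));
  CHECK(cudaMalloc(&a1, bytes));
  CHECK(cudaMalloc(&b1, bytes));
  CHECK(cudaSetDevice(0));
  CHECK(cudaMalloc(&a0, bytes));
  CHECK(cudaMalloc(&b0, bytes));

  // Streams and events are created before capture begins.
  cudaStream_t s0, s1;
  cudaEvent_t fork, join;
  CHECK(cudaStreamCreate(&s0));
  CHECK(cudaStreamCreate(&s1));
  CHECK(cudaEventCreateWithFlags(&fork, cudaEventDisableTiming));
  CHECK(cudaEventCreateWithFlags(&join, cudaEventDisableTiming));

  cudaGraph_t graph;
  CHECK(cudaStreamBeginCapture(s0, cudaStreamCaptureModeGlobal));
  CHECK(cudaEventRecord(fork, s0));          // fork: pull s1 into the capture
  CHECK(cudaStreamWaitEvent(s1, fork, 0));
  CHECK(cudaMemcpyPeerAsync(a1, 1, a0, 0, bytes, s0));  // branch 1: GPU0 -> GPU1
  CHECK(cudaMemcpyPeerAsync(b0, 0, b1, 1, bytes, s1));  // branch 2: GPU1 -> GPU0
  CHECK(cudaEventRecord(join, s1));          // join: s1 must rejoin s0
  CHECK(cudaStreamWaitEvent(s0, join, 0));   // before capture can end
  CHECK(cudaStreamEndCapture(s0, &graph));

  cudaGraphExec_t exec;
  CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));
  CHECK(cudaGraphLaunch(exec, s0));
  CHECK(cudaStreamSynchronize(s0));

  cudaGraphExecDestroy(exec);
  cudaGraphDestroy(graph);
  cudaEventDestroy(fork);
  cudaEventDestroy(join);
  cudaStreamDestroy(s0);
  cudaStreamDestroy(s1);
  cudaFree(a0);
  cudaFree(b0);
  CHECK(cudaSetDevice(1));
  cudaFree(a1);
  cudaFree(b1);
  return 0;
}

int main() {
  int rc = capturePeerCopies(1 << 20);
  if (rc == 1) std::printf("need at least two GPUs; skipped\n");
  else std::printf(rc == 0 ? "captured graph ran\n" : "capture failed\n");
  return rc < 0 ? 1 : 0;
}
```

With a hundred independent copies this pattern needs a hundred forked streams (or a pool reused across branches), which is the overhead being objected to here.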
I think cudaGraphAddMemcpyNode1D or cudaGraphAddMemcpyNode should also work if you build the graph explicitly with the graph API instead of using stream capture.
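A minimal sketch of the explicit-API route, assuming at least two GPUs, unified virtual addressing, and CUDA 11.4 or later. Here concurrency is expressed through each node's dependency list rather than through streams: two memcpy nodes added with no dependencies are independent and may run in parallel. Passing cudaMemcpyDefault lets the runtime infer the source and destination devices from the pointers, which is what makes a plain memcpy node act as a peer copy. The helper runPeerCopyGraph and the buffer names are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Print the failing call and return -1 from the enclosing function
// (resources are not released on this error path; fine for a sketch).
#define CHECK(call)                                                        \
  do {                                                                     \
    cudaError_t err_ = (call);                                             \
    if (err_ != cudaSuccess) {                                             \
      std::fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_));   \
      return -1;                                                           \
    }                                                                      \
  } while (0)

// Hypothetical helper: copies n ints GPU0 -> GPU1 and GPU1 -> GPU0 as two
// independent memcpy nodes in one explicitly built graph.
// Returns 0 on success, 1 if fewer than two GPUs are available, -1 on error.
int runPeerCopyGraph(size_t n) {
  int count = 0;
  if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) return 1;

  const size_t bytes = n * sizeof(int);
  std::vector<int> h0(n, 1), h1(n, 2);
  int *d0, *d0Copy, *d1, *d1Copy;
  CHECK(cudaSetDevice(1));
  CHECK(cudaMalloc(&d1, bytes));
  CHECK(cudaMalloc(&d1Copy, bytes));
  CHECK(cudaMemcpy(d1, h1.data(), bytes, cudaMemcpyHostToDevice));
  CHECK(cudaSetDevice(0));
  CHECK(cudaMalloc(&d0, bytes));
  CHECK(cudaMalloc(&d0Copy, bytes));
  CHECK(cudaMemcpy(d0, h0.data(), bytes, cudaMemcpyHostToDevice));

  // Enable direct peer access both ways when the hardware supports it.
  for (int dev = 0; dev < 2; ++dev) {
    int canAccess = 0;
    CHECK(cudaDeviceCanAccessPeer(&canAccess, dev, 1 - dev));
    if (canAccess) {
      CHECK(cudaSetDevice(dev));
      cudaDeviceEnablePeerAccess(1 - dev, 0);  // may say "already enabled"
      cudaGetLastError();                      // clear that non-fatal error
    }
  }

  // Two memcpy nodes with no dependencies: no extra streams are needed.
  cudaGraph_t graph;
  cudaGraphNode_t toDev1, toDev0;
  CHECK(cudaGraphCreate(&graph, 0));
  CHECK(cudaGraphAddMemcpyNode1D(&toDev1, graph, nullptr, 0, d1Copy, d0,
                                 bytes, cudaMemcpyDefault));
  CHECK(cudaGraphAddMemcpyNode1D(&toDev0, graph, nullptr, 0, d0Copy, d1,
                                 bytes, cudaMemcpyDefault));

  // Instantiate, launch once on a device-0 stream, and wait.
  CHECK(cudaSetDevice(0));
  cudaStream_t stream;
  cudaGraphExec_t exec;
  CHECK(cudaStreamCreate(&stream));
  CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));
  CHECK(cudaGraphLaunch(exec, stream));
  CHECK(cudaStreamSynchronize(stream));

  // Device 0 should now hold GPU1's 2s and device 1 GPU0's 1s.
  std::vector<int> r0(n), r1(n);
  CHECK(cudaMemcpy(r0.data(), d0Copy, bytes, cudaMemcpyDefault));
  CHECK(cudaMemcpy(r1.data(), d1Copy, bytes, cudaMemcpyDefault));
  bool ok = true;
  for (size_t i = 0; i < n; ++i) ok = ok && r0[i] == 2 && r1[i] == 1;

  cudaGraphExecDestroy(exec);
  cudaGraphDestroy(graph);
  cudaStreamDestroy(stream);
  cudaFree(d0);
  cudaFree(d0Copy);
  CHECK(cudaSetDevice(1));
  cudaFree(d1);
  cudaFree(d1Copy);
  return ok ? 0 : -1;
}

int main() {
  int rc = runPeerCopyGraph(1 << 20);
  if (rc == 1) std::printf("need at least two GPUs; skipped\n");
  else std::printf(rc == 0 ? "peer copies OK\n" : "peer copies FAILED\n");
  return rc < 0 ? 1 : 0;
}
```

To order copies instead, pass the earlier node handles in the pDependencies array (with numDependencies set accordingly) rather than nullptr and 0; the dependency graph then carries all the ordering that would otherwise need events and streams.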