Multiple streams in CUDA graph capture

Hi there!

I have found that capturing a CUDA graph that involves multiple streams is not straightforward. Here is a minimal reproduction:

import torch

device = "cuda"

stream1 = torch.cuda.Stream()

graph = torch.cuda.CUDAGraph()

a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = torch.empty(1024, 1024, device=device)

_ = torch.matmul(a, b)  # warm-up outside the graph (e.g. so cuBLAS is initialized before capture)

with torch.cuda.graph(graph):
    for _ in range(32):
        a.copy_(torch.matmul(a, torch.randn(1024, 1024, device=device)))
        with torch.cuda.stream(stream1):  # launch the second matmul on a side stream
            b.copy_(torch.matmul(b, torch.randn(1024, 1024, device=device)))
        c.copy_(a + b)

graph.replay()

torch.cuda.synchronize()

The point is that the two matmul kernels are completely independent, so running them on different streams should let their arithmetic overlap and reduce the overall latency. Once both kernels have finished, their results (a and b) must be summed to produce the tensor c.

However, when I run the code, it fails with the following error:

b.copy_(torch.matmul(b, torch.randn(1024, 1024, device=device)))
RuntimeError: CUDA error: operation not permitted when stream is capturing

This suggests that launching work on stream1 during capture is not allowed. As a check, I removed the line “with torch.cuda.stream(stream1):” and the code then captured and replayed successfully.
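
With everything on the capture stream, the capture body looks like this (the same code minus the side-stream context) and it runs without error:

with torch.cuda.graph(graph):
    for _ in range(32):
        a.copy_(torch.matmul(a, torch.randn(1024, 1024, device=device)))
        b.copy_(torch.matmul(b, torch.randn(1024, 1024, device=device)))
        c.copy_(a + b)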

Is there really no way to capture and replay a CUDA graph that uses multiple streams?
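
Or does the side stream have to be explicitly forked from and joined back to the capture stream, so that its work is fenced by event dependencies on the capturing stream? I am imagining something like the sketch below, where the wait_stream fork/join calls are my own guess and not something I have verified:

with torch.cuda.graph(graph):
    for _ in range(32):
        # fork: stream1 waits on the capture stream, which (I assume) is what
        # lets it take part in the ongoing capture
        stream1.wait_stream(torch.cuda.current_stream())
        a.copy_(torch.matmul(a, torch.randn(1024, 1024, device=device)))
        with torch.cuda.stream(stream1):
            b.copy_(torch.matmul(b, torch.randn(1024, 1024, device=device)))
        # join: the capture stream waits on stream1 before c consumes b
        torch.cuda.current_stream().wait_stream(stream1)
        c.copy_(a + b)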

Here is a recent discussion that may be of interest. I wouldn't be able to comment on the PyTorch side of this. I usually suggest that folks asking PyTorch questions may get better help on a PyTorch forum such as discuss.pytorch.org; there are NVIDIA experts who patrol that forum.