I have a question about using cudaStreamBeginCapture. According to page 14 of this presentation, titled “CONVERT CUDA STREAM INTO A GRAPH”, we can construct a CUDA graph from two streams, stream 1 and stream 2, such that kernel B and kernel C can potentially run in parallel.
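To make the setup concrete, here is a minimal sketch of the pattern as I read it from the slide. The kernel bodies, the buffer `d`, the event names, and the `CHECK` macro are placeholders I made up, and I am assuming the two-argument form of cudaStreamBeginCapture from CUDA 10.1 and later. Note that stream2 is never passed to cudaStreamBeginCapture; it is only tied to stream1 through events.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernels standing in for A, B, C and D on the slide.
__global__ void kernelA(int *d) { d[0] = 1; }
__global__ void kernelB(int *d) { d[1] = d[0] + 1; }
__global__ void kernelC(int *d) { d[2] = d[0] + 2; }
__global__ void kernelD(int *d) { d[3] = d[1] + d[2]; }

int main() {
    int *d = nullptr;
    CHECK(cudaMalloc(&d, 4 * sizeof(int)));  // allocate before capture starts

    cudaStream_t stream1, stream2;
    CHECK(cudaStreamCreate(&stream1));
    CHECK(cudaStreamCreate(&stream2));

    cudaEvent_t event1, event2;
    CHECK(cudaEventCreate(&event1));
    CHECK(cudaEventCreate(&event2));

    cudaGraph_t graph;

    // Capture is started on stream1 only.
    CHECK(cudaStreamBeginCapture(stream1, cudaStreamCaptureModeGlobal));

    kernelA<<<1, 1, 0, stream1>>>(d);

    // Fork: stream2 waits on an event recorded in the capturing stream.
    CHECK(cudaEventRecord(event1, stream1));
    CHECK(cudaStreamWaitEvent(stream2, event1, 0));

    kernelB<<<1, 1, 0, stream1>>>(d);
    kernelC<<<1, 1, 0, stream2>>>(d);  // launched on stream2

    // Join: stream1 waits for stream2 before D is launched.
    CHECK(cudaEventRecord(event2, stream2));
    CHECK(cudaStreamWaitEvent(stream1, event2, 0));

    kernelD<<<1, 1, 0, stream1>>>(d);

    // Capture is ended on the stream where it began.
    CHECK(cudaStreamEndCapture(stream1, &graph));

    // Query how many nodes ended up in the captured graph.
    size_t numNodes = 0;
    CHECK(cudaGraphGetNodes(graph, nullptr, &numNodes));
    printf("captured graph has %zu nodes\n", numNodes);

    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaEventDestroy(event1));
    CHECK(cudaEventDestroy(event2));
    CHECK(cudaStreamDestroy(stream1));
    CHECK(cudaStreamDestroy(stream2));
    CHECK(cudaFree(d));
    return 0;
}
```

The node count printed at the end is one way to check whether kernel C was recorded into the graph or launched immediately. My tentative understanding is that the cudaStreamWaitEvent call on event1 is what would bring stream2 into the capture, but that is exactly the point I would like confirmed.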
However, don’t we also need to call cudaStreamBeginCapture on stream 2? The CUDA documentation says that only kernels launched into a stream being captured are recorded rather than executed immediately. That clearly holds for stream 1, but what about stream 2?