cudaGraph Stream Capture

I have a question about using cudaStreamBeginCapture. According to page 14 of this presentation, titled “CONVERT CUDA STREAM INTO A GRAPH”, we can construct a CUDA graph from two streams, stream 1 and stream 2, such that kernel B and kernel C can potentially run in parallel.

However, don’t we also need to call cudaStreamBeginCapture on stream 2? The CUDA documentation says that only work launched into a capturing stream is recorded instead of being executed immediately. That holds for stream 1, but what about stream 2?

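No, you do not call cudaStreamBeginCapture on stream 2. After beginning the capture on stream 1, you record an event on stream 1 (cudaEventRecord) and then wait on that event in stream 2 (cudaStreamWaitEvent).
This causes stream 2 to join the capture, so work launched into stream 2 from that point on is recorded into the same graph rather than executed immediately.
After recording all the work that you want to capture on stream 2, you need to record an event in stream 2 and wait on it in stream 1 before ending the capture on stream 1 with cudaStreamEndCapture.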
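
The steps above can be sketched roughly as follows. This is only an illustration, not the code from the presentation: the kernel bodies, the names forkEvent, joinEvent, CHECK and runForkJoinCapture are made up for the example, and it assumes CUDA 11.4 or newer for cudaGraphInstantiateWithFlags. The call to cudaStreamIsCapturing is there only to show that stream 2 really has joined the capture.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper for this sketch: bail out of the function on any CUDA error.
#define CHECK(call)                                                       \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s failed: %s\n", #call,                     \
                    cudaGetErrorString(err_));                            \
            return 1;                                                     \
        }                                                                 \
    } while (0)

// Placeholder kernels (assumed for illustration). B and C both depend only on A.
__global__ void kernelA(int* d) { d[0] = 1; }
__global__ void kernelB(int* d) { d[1] = d[0] + 1; }
__global__ void kernelC(int* d) { d[2] = d[0] + 2; }
__global__ void kernelD(int* d) { d[3] = d[1] + d[2]; }

// Captures A -> {B, C} -> D into one graph, launches it, and copies the
// four results to `out`. `*stream2Joined` is set to 1 if stream 2 was
// reported as capturing after the fork. Returns 0 on success.
int runForkJoinCapture(int out[4], int* stream2Joined) {
    int* d = nullptr;
    cudaStream_t stream1, stream2;
    cudaEvent_t forkEvent, joinEvent;
    cudaGraph_t graph;
    cudaGraphExec_t graphExec;
    cudaStreamCaptureStatus status = cudaStreamCaptureStatusNone;

    CHECK(cudaMalloc(&d, 4 * sizeof(int)));
    CHECK(cudaMemset(d, 0, 4 * sizeof(int)));
    CHECK(cudaStreamCreate(&stream1));
    CHECK(cudaStreamCreate(&stream2));
    CHECK(cudaEventCreate(&forkEvent));
    CHECK(cudaEventCreate(&joinEvent));

    // Capture begins on stream 1 only.
    CHECK(cudaStreamBeginCapture(stream1, cudaStreamCaptureModeGlobal));
    kernelA<<<1, 1, 0, stream1>>>(d);

    // Fork: stream 2 waits on an event recorded in the capturing stream,
    // which pulls stream 2 into the same capture.
    CHECK(cudaEventRecord(forkEvent, stream1));
    CHECK(cudaStreamWaitEvent(stream2, forkEvent, 0));
    CHECK(cudaStreamIsCapturing(stream2, &status));
    *stream2Joined = (status == cudaStreamCaptureStatusActive) ? 1 : 0;

    kernelB<<<1, 1, 0, stream1>>>(d);  // recorded, not executed
    kernelC<<<1, 1, 0, stream2>>>(d);  // recorded, not executed

    // Join: stream 1 waits on stream 2 before the capture ends.
    CHECK(cudaEventRecord(joinEvent, stream2));
    CHECK(cudaStreamWaitEvent(stream1, joinEvent, 0));
    kernelD<<<1, 1, 0, stream1>>>(d);

    CHECK(cudaStreamEndCapture(stream1, &graph));

    // Nothing has run yet; the kernels execute only when the graph is launched.
    CHECK(cudaGraphInstantiateWithFlags(&graphExec, graph, 0));
    CHECK(cudaGraphLaunch(graphExec, stream1));
    CHECK(cudaStreamSynchronize(stream1));
    CHECK(cudaMemcpy(out, d, 4 * sizeof(int), cudaMemcpyDeviceToHost));

    CHECK(cudaGraphExecDestroy(graphExec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaEventDestroy(forkEvent));
    CHECK(cudaEventDestroy(joinEvent));
    CHECK(cudaStreamDestroy(stream1));
    CHECK(cudaStreamDestroy(stream2));
    CHECK(cudaFree(d));
    return 0;
}

int main() {
    int out[4] = {0, 0, 0, 0};
    int stream2Joined = 0;
    if (runForkJoinCapture(out, &stream2Joined) != 0) return 1;
    printf("stream 2 joined capture: %d\n", stream2Joined);
    printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```

In the resulting graph, the nodes for kernel B and kernel C both depend only on kernel A, so the runtime is free to run them concurrently. If the join (the second event record and wait) is left out, cudaStreamEndCapture on stream 1 should fail because stream 2 has not been rejoined to the origin stream.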