Hi all, I'd like a clarification on how to use cudaStreamSynchronize() together with cudaGraphLaunch().
Assume I have a loop like the one below, in which I launch a CUDA graph:
for (int i = 0; i < 100; i++) {
    cudaGraphLaunch(graphExec, stream);
}
Do I need to call cudaStreamSynchronize() inside the loop, or can I wait until the loop completes and call it once outside the loop, like below?
for (int i = 0; i < 100; i++) {
    cudaGraphLaunch(graphExec, stream);
}
cudaStreamSynchronize(stream);
Is this approach correct? Or do I need to call it immediately after cudaGraphLaunch(), inside the loop, like below?
for (int i = 0; i < 100; i++) {
    cudaGraphLaunch(graphExec, stream);
    cudaStreamSynchronize(stream);
}
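Calling cudaStreamSynchronize() once at the end means the host does not wait between launches, so all the graph launches are queued back to back; graphs launched into different streams could then overlap.
Calling cudaStreamSynchronize() inside the loop makes the host wait for each graph to finish before it launches the next one, so the graphs run strictly in order.
You can also control the ordering through the stream itself: work launched into the same stream is guaranteed to execute in order (and repeated launches of the same cudaGraphExec_t are always ordered behind its earlier launches).
So, if all the launches go into the same stream, you only need to synchronize the stream once, at the end of the loop.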
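For illustration, here is a minimal sketch of the single-sync pattern, assuming CUDA 11.4 or later (for cudaGraphInstantiateWithFlags), a CUDA-capable GPU, and compilation with nvcc. The graph is built by stream capture and holds one single-thread kernel that does a plain, non-atomic increment of a device counter. The names increment, run_graph_loop and CHECK are hypothetical, chosen only for this example. Because every launch goes into the same stream, the launches cannot overlap, so after one cudaStreamSynchronize() the counter should read back exactly the number of launches.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

// Single-thread kernel doing a plain (non-atomic) increment. The final
// count is only correct if the launches that run it never overlap.
__global__ void increment(int* counter) { *counter += 1; }

// Hypothetical helper: captures one kernel launch into a graph, launches
// the graph `iterations` times into one stream, synchronizes once after
// the loop, and returns the counter value read back from the device.
int run_graph_loop(int iterations) {
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    int* d_counter = nullptr;
    CHECK(cudaMalloc(&d_counter, sizeof(int)));
    // Zero the counter in the same stream, so it is ordered before the graphs.
    CHECK(cudaMemsetAsync(d_counter, 0, sizeof(int), stream));

    // Record the kernel launch into a graph via stream capture.
    cudaGraph_t graph;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    increment<<<1, 1, 0, stream>>>(d_counter);
    CHECK(cudaGetLastError());
    CHECK(cudaStreamEndCapture(stream, &graph));

    cudaGraphExec_t graphExec;
    CHECK(cudaGraphInstantiateWithFlags(&graphExec, graph, 0));

    // Queue all launches without waiting; same-stream work runs in order.
    for (int i = 0; i < iterations; i++) {
        CHECK(cudaGraphLaunch(graphExec, stream));
    }
    // A single host-side wait covers every launch queued above.
    CHECK(cudaStreamSynchronize(stream));

    int h_counter = 0;
    CHECK(cudaMemcpy(&h_counter, d_counter, sizeof(int),
                     cudaMemcpyDeviceToHost));

    CHECK(cudaGraphExecDestroy(graphExec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaFree(d_counter));
    CHECK(cudaStreamDestroy(stream));
    return h_counter;
}

int main() {
    int count = run_graph_loop(100);
    std::printf("counter = %d (expected 100)\n", count);
    return count == 100 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Moving cudaStreamSynchronize(stream) inside the loop would give the same count here; it would only make the host wait after every launch instead of once.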