Where to place cudaStreamSynchronize() when calling cudaGraphLaunch() in a loop

Hi all, I would like a clarification on where to call cudaStreamSynchronize() when I invoke cudaGraphLaunch().

Assume I have a loop like the one below, in which I launch the CUDA graph:
for (int i = 0; i < 100; i++) {
    cudaGraphLaunch(graphExec, stream);
}

Do I need to call cudaStreamSynchronize() inside the loop, or can I wait until the loop completes and call it once outside the loop, like below?
for (int i = 0; i < 100; i++) {
    cudaGraphLaunch(graphExec, stream);
}
cudaStreamSynchronize(stream);

Please clarify whether the above approach is correct

OR

Do I need to call it immediately after cudaGraphLaunch(), like below?

for (int i = 0; i < 100; i++) {
    cudaGraphLaunch(graphExec, stream);
    cudaStreamSynchronize(stream);
}

Please clarify which approach is correct

Thanks and Regards

Nagaraj Trivedi

Hi,

This depends on your use case.

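If you call cudaStreamSynchronize() only once after the loop, the host queues all launches without waiting, so the graph launches may overlap on the GPU when they are different executable graphs launched into different streams.
If you call cudaStreamSynchronize() inside the loop, the host blocks after every launch, so each graph launch finishes before the next one is submitted.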
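
To make the difference concrete, here is a minimal sketch that launches two separate executable graphs into two streams, with the synchronization either inside or after the loop. The kernel addOne, the helpers CHECK_CUDA and buildGraph, and the function launchOnTwoStreams are hypothetical names made up for this illustration, and it assumes CUDA 11.4 or newer for cudaGraphInstantiateWithFlags().

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking helper, not part of the CUDA API.
#define CHECK_CUDA(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error: %s\n",                       \
                         cudaGetErrorString(err_));                        \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Illustrative kernel: a single thread adds 1 to a counter.
__global__ void addOne(int *counter) { *counter += 1; }

// Captures one addOne launch on `stream` and returns an executable graph.
static cudaGraphExec_t buildGraph(cudaStream_t stream, int *dCounter) {
    cudaGraph_t graph;
    cudaGraphExec_t exec;
    CHECK_CUDA(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    addOne<<<1, 1, 0, stream>>>(dCounter);
    CHECK_CUDA(cudaStreamEndCapture(stream, &graph));
    CHECK_CUDA(cudaGraphInstantiateWithFlags(&exec, graph, 0));
    CHECK_CUDA(cudaGraphDestroy(graph));  // the executable graph no longer needs it
    return exec;
}

// Launches graph A on streamA and graph B on streamB `iterations` times.
// Returns the sum of both counters (expected: 2 * iterations).
int launchOnTwoStreams(int iterations, bool syncEachLaunch) {
    cudaStream_t streamA, streamB;
    CHECK_CUDA(cudaStreamCreate(&streamA));
    CHECK_CUDA(cudaStreamCreate(&streamB));

    int *dCounters = nullptr;  // [0] belongs to graph A, [1] to graph B
    CHECK_CUDA(cudaMalloc(&dCounters, 2 * sizeof(int)));
    CHECK_CUDA(cudaMemset(dCounters, 0, 2 * sizeof(int)));
    CHECK_CUDA(cudaDeviceSynchronize());

    cudaGraphExec_t execA = buildGraph(streamA, dCounters);
    cudaGraphExec_t execB = buildGraph(streamB, dCounters + 1);

    for (int i = 0; i < iterations; ++i) {
        CHECK_CUDA(cudaGraphLaunch(execA, streamA));
        if (syncEachLaunch) {
            // Host blocks here, so A finishes before B is even submitted.
            CHECK_CUDA(cudaStreamSynchronize(streamA));
        }
        CHECK_CUDA(cudaGraphLaunch(execB, streamB));
        if (syncEachLaunch) {
            CHECK_CUDA(cudaStreamSynchronize(streamB));
        }
    }
    // Without the in-loop syncs, all launches are queued at this point and
    // work on streamA may overlap with work on streamB.
    CHECK_CUDA(cudaStreamSynchronize(streamA));
    CHECK_CUDA(cudaStreamSynchronize(streamB));

    int hCounters[2] = {0, 0};
    CHECK_CUDA(cudaMemcpy(hCounters, dCounters, sizeof(hCounters),
                          cudaMemcpyDeviceToHost));

    CHECK_CUDA(cudaGraphExecDestroy(execA));
    CHECK_CUDA(cudaGraphExecDestroy(execB));
    CHECK_CUDA(cudaFree(dCounters));
    CHECK_CUDA(cudaStreamDestroy(streamA));
    CHECK_CUDA(cudaStreamDestroy(streamB));
    return hCounters[0] + hCounters[1];
}
```

Both launchOnTwoStreams(100, false) and launchOnTwoStreams(100, true) should produce the same sum; what changes is how often the host blocks and whether the two graphs get a chance to overlap.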

You can also control the ordering through the CUDA stream.
Work launched into the same stream is guaranteed to execute in the order it was issued.
So if every cudaGraphLaunch() uses the same stream, you only need to synchronize that stream once, after the loop.
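
As a rough sketch of the single-stream case, the code below captures one kernel into a graph, launches it repeatedly into one stream, and synchronizes only once after the loop. The kernel incrementCounter, the macro CHECK_CUDA, and the function launchOnOneStream are hypothetical names chosen for this example, and it assumes CUDA 11.4 or newer for cudaGraphInstantiateWithFlags().

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking helper, not part of the CUDA API.
#define CHECK_CUDA(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error: %s\n",                       \
                         cudaGetErrorString(err_));                        \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Illustrative kernel: each launch reads the value left by the previous one.
__global__ void incrementCounter(int *counter) { *counter += 1; }

// Launches the same executable graph `iterations` times into one stream and
// synchronizes once at the end. Returns the final counter value.
int launchOnOneStream(int iterations) {
    cudaStream_t stream;
    CHECK_CUDA(cudaStreamCreate(&stream));

    int *dCounter = nullptr;
    CHECK_CUDA(cudaMalloc(&dCounter, sizeof(int)));
    CHECK_CUDA(cudaMemset(dCounter, 0, sizeof(int)));
    CHECK_CUDA(cudaDeviceSynchronize());

    // Build the graph by capturing a single kernel launch.
    cudaGraph_t graph;
    cudaGraphExec_t graphExec;
    CHECK_CUDA(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    incrementCounter<<<1, 1, 0, stream>>>(dCounter);
    CHECK_CUDA(cudaStreamEndCapture(stream, &graph));
    CHECK_CUDA(cudaGraphInstantiateWithFlags(&graphExec, graph, 0));

    // Launches are only queued here; stream order keeps them sequential.
    for (int i = 0; i < iterations; ++i) {
        CHECK_CUDA(cudaGraphLaunch(graphExec, stream));
    }
    // One synchronization after the loop waits for all queued launches.
    CHECK_CUDA(cudaStreamSynchronize(stream));

    int hCounter = 0;
    CHECK_CUDA(cudaMemcpy(&hCounter, dCounter, sizeof(int),
                          cudaMemcpyDeviceToHost));

    CHECK_CUDA(cudaGraphExecDestroy(graphExec));
    CHECK_CUDA(cudaGraphDestroy(graph));
    CHECK_CUDA(cudaFree(dCounter));
    CHECK_CUDA(cudaStreamDestroy(stream));
    return hCounter;
}
```

If the launches were not ordered, the increments would race with each other; with a single stream, launchOnOneStream(100) should return 100 even though the host never waits inside the loop.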

Thanks.
