About the behavior of cudaStreamSynchronize()

Hi, I have two questions.

If cudaStreamSynchronize() is used, the CPU waits until the CUDA work for that particular stream (not all streams) has completed.
Is this correct?

In addition, if multiple CPU threads each use a different CUDA stream to execute CUDA work, is it possible to synchronize between each CPU thread and its CUDA stream?

Yes, the calling CPU thread blocks on that call until all CUDA work previously issued to that stream has completed.
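
To make the per-stream behavior concrete, here is a minimal sketch, not taken from the thread. The kernel add_one and the function sync_one_stream are hypothetical names made up for illustration. It issues work to two streams and then waits on only one of them.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration: adds 1 to each element.
__global__ void add_one(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Issues work to two streams, then waits on stream_a only.
// Returns true if stream_a's results are complete and correct after the wait.
// Most error checks are omitted for brevity.
bool sync_one_stream() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(int);
    const int block = 256, grid = (n + block - 1) / block;

    cudaStream_t stream_a, stream_b;
    cudaStreamCreate(&stream_a);
    cudaStreamCreate(&stream_b);

    int *d_a = nullptr, *d_b = nullptr, *h_a = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMallocHost(&h_a, bytes);  // pinned host memory so the copy can be asynchronous

    // Work for stream_a: zero the buffer, run the kernel, copy back.
    cudaMemsetAsync(d_a, 0, bytes, stream_a);
    add_one<<<grid, block, 0, stream_a>>>(d_a, n);
    cudaMemcpyAsync(h_a, d_a, bytes, cudaMemcpyDeviceToHost, stream_a);

    // Independent work for stream_b.
    cudaMemsetAsync(d_b, 0, bytes, stream_b);
    add_one<<<grid, block, 0, stream_b>>>(d_b, n);

    // Blocks this CPU thread until everything issued to stream_a so far is done.
    // It says nothing about stream_b, which may still be running.
    bool ok = (cudaStreamSynchronize(stream_a) == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) ok = (h_a[i] == 1);

    cudaStreamSynchronize(stream_b);  // finish stream_b before releasing its buffer
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFreeHost(h_a);
    cudaStreamDestroy(stream_a);
    cudaStreamDestroy(stream_b);
    return ok;
}
```

Calling sync_one_stream() from main() and compiling with nvcc should be enough to try it. Reading h_a right after the wait is safe because the copy was issued to stream_a before the call to cudaStreamSynchronize().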

I don’t really understand the question. CPU threads can synchronize with each other. This is independent of CUDA, and whatever threading model you choose probably has a barrier mechanism for this purpose. Streams generally don’t synchronize CPU threads. It might be possible to use the stream mechanism to “synchronize” CPU threads, but at first glance creating a reliable example would be somewhat involved, and would probably rest on some assumptions.

One stream can be “synchronized” with another using cudaStreamWaitEvent(), but even this will rest on certain ordering assumptions.
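
As a rough illustration of stream-to-stream ordering with cudaStreamWaitEvent(), here is a minimal sketch. The kernels fill and add_one and the function stream_waits_on_event are hypothetical names, not from the thread. One ordering assumption it relies on: the host calls cudaEventRecord() before cudaStreamWaitEvent(), since the wait refers to the most recent record of the event at the time the wait is issued.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernels used only for illustration.
__global__ void fill(int* data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

__global__ void add_one(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// The consumer stream waits (on the device) for an event recorded in the
// producer stream. Returns true if every element ends up as 42.
// Most error checks are omitted for brevity.
bool stream_waits_on_event() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(int);
    const int block = 256, grid = (n + block - 1) / block;

    cudaStream_t producer, consumer;
    cudaStreamCreate(&producer);
    cudaStreamCreate(&consumer);

    cudaEvent_t filled;
    cudaEventCreateWithFlags(&filled, cudaEventDisableTiming);

    int *d = nullptr, *h = nullptr;
    cudaMalloc(&d, bytes);
    cudaMallocHost(&h, bytes);  // pinned host memory for the asynchronous copy

    // Producer writes the buffer, then the event marks that point in its stream.
    fill<<<grid, block, 0, producer>>>(d, n, 41);
    cudaEventRecord(filled, producer);

    // Work issued to consumer after this call will not start until the event
    // has completed. The CPU thread itself does not block here.
    cudaStreamWaitEvent(consumer, filled, 0);
    add_one<<<grid, block, 0, consumer>>>(d, n);
    cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, consumer);

    // The CPU only waits here, on the consumer stream.
    bool ok = (cudaStreamSynchronize(consumer) == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) ok = (h[i] == 42);

    cudaFree(d);
    cudaFreeHost(h);
    cudaEventDestroy(filled);
    cudaStreamDestroy(producer);
    cudaStreamDestroy(consumer);
    return ok;
}
```

If the record and wait calls were issued from different CPU threads, something on the host side (a barrier, for example) would have to guarantee that the record call happens first.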

Thank you for the reply.

“I don’t really understand the question.”
Sorry for my poor explanation.
I’m trying to do something like the following.
Each CPU thread launches work on its own CUDA stream, and each CUDA pipeline runs on its own stream.
Finally, I want each CPU thread to wait until its own CUDA pipeline finishes.
Is this possible?
(Rather than synchronization between multiple CPU threads, I want, for example, CPU thread 1 to wait for CUDA pipeline 1 to finish.)

Any CPU thread can wait for completion of work issued to any stream by calling cudaStreamSynchronize(<stream>).
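
For the one-thread-per-pipeline case described above, a minimal sketch could look like this. It assumes std::thread as the threading model, and the names add_one, worker and run_pipelines are hypothetical, chosen for illustration. Each CPU thread creates its own stream, issues a small "pipeline" of kernels to it, and waits only for that stream.

```cuda
#include <cuda_runtime.h>
#include <thread>
#include <vector>

// Hypothetical kernel used only for illustration: adds 1 to each element.
__global__ void add_one(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Runs on one CPU thread. Thread `id` issues a pipeline of (id + 1) kernels to
// its own stream, waits for that stream, and stores the element value in
// *result (or -1 on any mismatch). Most error checks are omitted for brevity.
void worker(int id, int* result) {
    const int n = 1 << 16;
    const size_t bytes = n * sizeof(int);
    const int block = 256, grid = (n + block - 1) / block;

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    int *d = nullptr, *h = nullptr;
    cudaMalloc(&d, bytes);
    cudaMallocHost(&h, bytes);  // pinned host memory for the asynchronous copy

    cudaMemsetAsync(d, 0, bytes, stream);
    for (int step = 0; step <= id; ++step) {
        add_one<<<grid, block, 0, stream>>>(d, n);
    }
    cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, stream);

    // This CPU thread waits for its own pipeline only.
    bool ok = (cudaStreamSynchronize(stream) == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) ok = (h[i] == id + 1);
    *result = ok ? h[0] : -1;

    cudaFree(d);
    cudaFreeHost(h);
    cudaStreamDestroy(stream);
}

// Starts one CPU thread per pipeline and collects one result per thread.
std::vector<int> run_pipelines(int num_threads) {
    std::vector<int> results(num_threads, -1);
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back(worker, t, &results[t]);
    }
    for (auto& th : threads) th.join();  // ordinary CPU-side join, independent of CUDA
    return results;
}
```

With run_pipelines(4), each thread should report its own pipeline length, so the results would be 1, 2, 3 and 4. The join at the end is plain CPU thread synchronization and is separate from the per-stream waits.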
