In an application consisting of several concurrent kernels running in separate streams (e.g., A and B), is it possible to wait for the first stream to complete (i.e., block the calling host thread until either A or B completes)?
cudaStreamSynchronize() does this only for a single stream; is there anything like a UNIX select() on a set of streams in CUDA?
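To make the question concrete, here’s a sketch of the primitive I’m wishing for; cudaStreamWaitAny() is a made-up name, it does not exist in CUDA 2.3:

```c
/* Hypothetical, NOT a real CUDA call: block the calling host thread
 * until any of the given streams has completed all its queued work,
 * and return the index of that stream (select()-style semantics). */
int cudaStreamWaitAny(cudaStream_t *streams, int numStreams);

/* Desired usage with two streams A and B: */
cudaStream_t streams[2] = { streamA, streamB };
int first = cudaStreamWaitAny(streams, 2);  /* 0 => A finished first */
```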
I don’t know if it’s the most efficient way, but you could just have a while(true) loop that calls cuStreamQuery() on each stream and breaks out as soon as one of them reports the stream has finished (i.e., returns CUDA_SUCCESS). You’ll want to add a call to sleep() in there as well (so you don’t peg the processor with your loop), but you’ll have to try different values to see what works best, depending on how long your kernels run.
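For what it’s worth, here’s a minimal sketch of that polling loop, written against the runtime API (cudaStreamQuery() instead of cuStreamQuery()) and POSIX usleep(); the sleep interval is the value you’d have to tune:

```c
#include <cuda_runtime.h>
#include <unistd.h>

/* Poll a set of streams until one of them has finished all queued
 * work; returns the index of the first stream found to be done. */
int waitForAnyStream(cudaStream_t *streams, int numStreams,
                     unsigned int sleepUs)
{
    for (;;) {
        for (int i = 0; i < numStreams; ++i) {
            /* cudaStreamQuery() returns cudaSuccess once the stream
             * is idle, cudaErrorNotReady while work is still running. */
            if (cudaStreamQuery(streams[i]) == cudaSuccess)
                return i;
        }
        usleep(sleepUs);  /* back off so we don't peg a CPU core */
    }
}
```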
Yes, I already considered this, but it’s not really elegant and, as you noted, probably also inefficient in many cases, so I was hoping for a cleaner solution. However, after some more research I think I need a totally different approach anyway. I tried to invoke several kernels in parallel within the same CUDA context and thought streams could do that: the CUDA 2.3 programming guide says in section 3.2.6.1, “Different streams … may execute their commands out of order with respect to one another or concurrently”. On the other hand, “sdk/C/src/simpleStreams/readme.txt” in the CUDA SDK says “Kernels are serialized”. After some experimenting I found out that the latter is true (I wish this had been clarified in the manual as well).
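In case anyone wants to reproduce the experiment, this is roughly what I ran (simplified; busyKernel is just a stand-in for any kernel with a measurable runtime). If the kernels overlapped, the two-stream time would be close to that of a single launch; what I measured instead was consistent with the kernels running one after the other:

```c
#include <cuda_runtime.h>
#include <stdio.h>

/* Dummy kernel that just burns cycles for a measurable amount of time. */
__global__ void busyKernel(int spin)
{
    volatile int x = 0;
    for (int i = 0; i < spin; ++i) x += i;
}

int main(void)
{
    cudaStream_t s[2];
    cudaEvent_t start, stop;
    float ms;

    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    /* Launch the same kernel into two different streams and time both.
     * Events recorded in stream 0 are ordered after work in all other
     * streams, so "stop" fires once both kernels are done. */
    cudaEventRecord(start, 0);
    busyKernel<<<1, 1, 0, s[0]>>>(1 << 24);
    busyKernel<<<1, 1, 0, s[1]>>>(1 << 24);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);

    /* If this is ~2x the time of a single launch, the kernels did
     * not overlap, i.e. they were serialized. */
    printf("two streams: %.1f ms\n", ms);
    return 0;
}
```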
It looks like command queues in OpenCL could solve the problem (i.e., make the “wait for any” primitive obsolete), so let me reformulate my question: is there anything like OpenCL command queues in CUDA, or any way to achieve similar behaviour (or is something like this maybe planned for a future CUDA release)?
I tried to build a task dependency graph (similar to the command queue in OpenCL). My first attempt was to do all the synchronization on the host, using separate host threads for concurrent paths in the graph. This failed because device memory pointers can’t be shared across host threads (each host thread gets its own CUDA context). When I tried to accomplish the same thing in a single host thread, I needed a way to find out when any kernel has completed, because at that point a new task becomes ready for execution (i.e., a new kernel should be launched). This failed as well, since kernel calls are serialized even if they occur in different streams :-(
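For reference, here’s a sketch of the single-host-thread scheduler I was attempting; Task, launch, succ etc. are made-up names, and only cudaStreamQuery()/usleep() are real API. The “wait for any” polling part works, but on CUDA 2.3 the launched kernels still execute one after another, which is exactly the problem:

```c
#include <cuda_runtime.h>
#include <unistd.h>

/* Hypothetical node of the dependency graph. */
typedef struct Task {
    void (*launch)(cudaStream_t);   /* enqueues this task's kernel   */
    int unmetDeps;                  /* unfinished predecessor count  */
    struct Task **succ;             /* tasks waiting on this one     */
    int numSucc;
} Task;

/* ready: stack of runnable tasks (must have room for the whole graph);
 * slots[i] is the task currently occupying streams[i], or NULL. */
void runGraph(Task **ready, int numReady,
              cudaStream_t *streams, Task **slots, int numStreams)
{
    int inFlight = 0;
    while (numReady > 0 || inFlight > 0) {
        /* Launch ready tasks into free streams. */
        for (int i = 0; i < numStreams && numReady > 0; ++i) {
            if (slots[i] == NULL) {
                slots[i] = ready[--numReady];
                slots[i]->launch(streams[i]);
                ++inFlight;
            }
        }
        /* "Wait for any" by hand: poll for a finished stream. */
        for (int i = 0; i < numStreams; ++i) {
            if (slots[i] != NULL &&
                cudaStreamQuery(streams[i]) == cudaSuccess) {
                Task *done = slots[i];
                slots[i] = NULL;
                --inFlight;
                /* A completed task may unblock its successors. */
                for (int s = 0; s < done->numSucc; ++s)
                    if (--done->succ[s]->unmetDeps == 0)
                        ready[numReady++] = done->succ[s];
            }
        }
        usleep(100);  /* avoid a hot spin loop */
    }
}
```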
So my conclusion for now is to investigate OpenCL, where a dependency graph should work out of the box by constructing an appropriate command queue. Nevertheless, I’m curious whether such a feature will be included in future CUDA versions as well (or whether it is already in CUDA 2.3 and I overlooked it, or can at least be emulated).