Let’s say I want to launch task_a and task_b simultaneously (in different streams). As soon as one of them finishes, I want to launch task_c.
I suppose I could have the host thread wake up every once in a while, and call cudaEventQuery to see if any of the events occurred. However, I’m hoping for a more elegant solution. Is there one?
Your "any" qualifier makes it more difficult.
Your hypothetical options are: events, callbacks, and multiple host threads.
You can synchronize on stream events you recorded (cudaEventSynchronize()), but I do not see how you would manage the "any" qualifier across multiple events with a single host thread.
If you know that one of the two tasks (a or b) always executes longer than the other, you could simply record a single event after the shorter task, and also launch the shorter task first, in an attempt to guarantee that it finishes first.
Callbacks can tell you which stream finishes first, but you are not allowed to issue CUDA API calls from within callback functions, directly or indirectly; hence you would have to implement a user-defined host event to really make it work: the host waits on the host event after issuing a and b, and the callback functions trigger it.
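A minimal sketch of that callback + host-event idea, using C++11 synchronization primitives for the "host event" and cudaStreamAddCallback; the kernel names and trivial launch configurations are placeholders for your real tasks:

```cuda
// Sketch of the callback + user-defined host event approach (assumption:
// task_a_kernel / task_b_kernel / task_c_kernel stand in for the real work).
#include <cuda_runtime.h>
#include <condition_variable>
#include <mutex>

__global__ void task_a_kernel() { /* ... */ }
__global__ void task_b_kernel() { /* ... */ }
__global__ void task_c_kernel() { /* ... */ }

std::mutex m;
std::condition_variable cv;
bool first_finished = false;  // the "host event"

// Runs on a CUDA-internal thread once the stream reaches this point.
// Note: no CUDA API calls allowed in here, directly or indirectly.
void CUDART_CB on_task_done(cudaStream_t, cudaError_t, void *) {
    { std::lock_guard<std::mutex> lk(m); first_finished = true; }
    cv.notify_one();
}

int main() {
    cudaStream_t sA, sB;
    cudaStreamCreate(&sA);
    cudaStreamCreate(&sB);

    task_a_kernel<<<1, 1, 0, sA>>>();
    cudaStreamAddCallback(sA, on_task_done, nullptr, 0);
    task_b_kernel<<<1, 1, 0, sB>>>();
    cudaStreamAddCallback(sB, on_task_done, nullptr, 0);

    // Host blocks here until whichever callback fires first wakes it.
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return first_finished; });
    lk.unlock();

    // task_c is issued from the host thread, never from the callback.
    task_c_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

(On newer toolkits, cudaLaunchHostFunc is the preferred replacement for cudaStreamAddCallback; the same restriction on issuing CUDA calls applies.)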
Lastly, you could have one host thread each for tasks a and b, to issue the task and wait on it; the host threads share a volatile variable, which they then test to determine who must launch c.
One possible approach. I don’t know if it is “more elegant”.
Modify your definition of task_c as follows:
task_c will use a (e.g. pthread) mutex to control access to a global (host) variable. At the start of task_c, it will check the global variable. If it is not set, it will set it and proceed (with the rest of task_c). If it is set, it will exit (and skip the rest of task_c).
Now, in host thread 1, launch task_a into stream A, followed by cudaStreamSynchronize, followed by task_c. In host thread 2, launch task_b into stream B, followed by cudaStreamSynchronize, followed by task_c. Only one will execute task_c, whichever one gets there first. This doesn’t require any explicit spin-polling, nor the use of any events.
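A sketch of this approach, using std::thread and std::mutex in place of pthreads; the kernel names and launcher functions are placeholders:

```cuda
// Sketch of the mutex-guarded task_c approach: both host threads call
// task_c(), but only the first arrival does the work.
#include <cuda_runtime.h>
#include <mutex>
#include <thread>

__global__ void task_a_kernel() { /* ... */ }
__global__ void task_b_kernel() { /* ... */ }
__global__ void task_c_kernel() { /* ... */ }

std::mutex c_mutex;
bool c_started = false;  // the global host variable

void task_c() {
    {
        std::lock_guard<std::mutex> lk(c_mutex);
        if (c_started) return;  // already set: skip the rest of task_c
        c_started = true;       // not set: claim it and proceed
    }
    task_c_kernel<<<1, 1>>>();  // the rest of task_c
    cudaDeviceSynchronize();
}

void launch_a(cudaStream_t s) { task_a_kernel<<<1, 1, 0, s>>>(); }
void launch_b(cudaStream_t s) { task_b_kernel<<<1, 1, 0, s>>>(); }

void worker(void (*launch_task)(cudaStream_t), cudaStream_t s) {
    launch_task(s);            // issue task_a or task_b into this stream
    cudaStreamSynchronize(s);  // block until it finishes
    task_c();                  // whoever gets here first executes task_c
}

int main() {
    cudaStream_t sA, sB;
    cudaStreamCreate(&sA);
    cudaStreamCreate(&sB);
    std::thread t1(worker, launch_a, sA);  // host thread 1
    std::thread t2(worker, launch_b, sB);  // host thread 2
    t1.join();
    t2.join();
    return 0;
}
```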
And if task_c is just a kernel call (as opposed to a more complicated sequence of CUDA calls), it can be done with dynamic parallelism using a method similar to the above. Just create a global device variable. Whichever task_c kernel begins first updates the device variable, then does a child launch on the desired task_c kernel.
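The dynamic-parallelism variant might look like the following sketch (assumes compute capability 3.5+ and compilation with -rdc=true; kernel names are placeholders). Using atomicCAS on the device variable makes the "whoever begins first" race well-defined:

```cuda
// Sketch of the dynamic-parallelism gate: enqueue task_c_gate at the tail
// of BOTH stream A and stream B, after task_a and task_b respectively.
#include <cuda_runtime.h>

__device__ int c_launched = 0;  // global device variable

__global__ void task_c_kernel() { /* the actual task_c work */ }

// Single-thread gate kernel: the first copy to run wins the atomicCAS
// and performs the child launch; the other copy simply exits.
__global__ void task_c_gate() {
    if (atomicCAS(&c_launched, 0, 1) == 0) {
        task_c_kernel<<<1, 1>>>();  // child launch via dynamic parallelism
    }
}
```

Usage: after launching task_a into stream A, launch task_c_gate<<<1, 1, 0, sA>>>(); likewise task_c_gate<<<1, 1, 0, sB>>>() after task_b in stream B.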
Both methods are extensible to waiting on any one of more than two previous tasks to finish.
Thanks for the suggestions!