How to block a single host thread on a CUDA event in a multi-threaded CUDA application

Suppose we have a multi-threaded application using the GPU.

Is there a way to block only a single host thread until a specific CUDA/OpenCL event is recorded?

Here’s an example:

CPU Thread T2:

- wait until event EX is recorded in stream SY

- do some T2 work

(CPU Threads T0, T1, T3, ... running)

cudaEventSynchronize on an event created with the cudaEventBlockingSync flag works best: it blocks only the calling host thread, and that thread sleeps rather than spin-waits until the event completes
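In code, that answer can be sketched as follows (a minimal sketch; `someKernel`, `grid`, `block`, and `streamSY` are placeholders, and error checking is omitted):

```cpp
#include <cuda_runtime.h>

// Create the event with cudaEventBlockingSync so that
// cudaEventSynchronize yields the calling thread instead of busy-waiting.
cudaEvent_t eventX;
cudaEventCreateWithFlags(&eventX, cudaEventBlockingSync);

// Some thread: issue GPU work in stream SY, then record the event there.
someKernel<<<grid, block, 0, streamSY>>>(/* args */);
cudaEventRecord(eventX, streamSY);

// T2: blocks only this host thread until eventX has completed.
cudaEventSynchronize(eventX);
// ...do T2 work...
```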

Thanks! How can I make it block on multiple events (the order in which the events complete is unknown)?

e.g. T2 waits for these three events before proceeding: event EX in stream SX, event EY in SY, and event EZ in SZ.

there’s no API call that waits on multiple events at once, but you can simply call cudaEventSynchronize on each event in turn; the order of the calls doesn’t matter, since each one returns as soon as its event has completed. however, keep in mind that to avoid deadlock we require that the event be recorded before you try to query or synchronize on it.
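The sequential-wait approach from that reply looks like this (a sketch; `eventX`, `eventY`, `eventZ` are assumed to have been created and recorded as in the question):

```cpp
// T3 waits for all three events. The order of the calls doesn't matter:
// each call blocks until its event completes, or returns immediately if
// the event has already completed by the time the call is reached.
cudaEventSynchronize(eventX);  // recorded in stream SX
cudaEventSynchronize(eventY);  // recorded in stream SY
cudaEventSynchronize(eventZ);  // recorded in stream SZ
// From here on, all three pieces of GPU work are known to be finished.
```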

Hm, how do I ensure this condition in a multi-threaded application? The event will be recorded from one thread and waited on from another. What I actually need is a synchronization mechanism between host threads on the completion of some GPU work, such as a kernel execution or an async memcpy. My idea was to have something like this:

T1:

 ...some CPU work

 issue some GPU work in stream SX

 issue CUDA eventX in SX

 ...

T2:

 issue some GPU work in stream SY

 issue CUDA eventY in SY

T3:

 cudaEventSynchronize(eventX) 

 cudaEventSynchronize(eventY) 

 counterEventX++

 counterEventY++

 ...some CPU or GPU work

How to go about it?

p.s. Is there a possibility to capture every occurrence of an event EX? Or is there something like a fast atomic increment of a host variable from a CUDA stream? I’m looking for a way to implement a semaphore that signals when some (asynchronously issued) GPU work has completed.
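One mechanism that fits the p.s. is a host callback enqueued into the stream: cudaLaunchHostFunc (available since CUDA 10) runs a host function once all work queued in the stream before it has completed, and that function can increment an atomic counter or post a semaphore, once per enqueued batch of work. A sketch, assuming a stream `streamSX` already exists (error checking omitted):

```cpp
#include <atomic>
#include <cuda_runtime.h>

std::atomic<int> workDone{0};  // host-side semaphore/counter

// Runs on a CUDA-internal host thread once all work queued in the stream
// before it has finished. Must not itself call CUDA API functions.
void CUDART_CB onWorkDone(void* userData) {
    static_cast<std::atomic<int>*>(userData)->fetch_add(1);
}

// After issuing the async GPU work (kernel launch, cudaMemcpyAsync, ...):
cudaLaunchHostFunc(streamSX, onWorkDone, &workDone);
// Any host thread can now wait on or poll workDone; each completed batch
// of GPU work in streamSX bumps the counter by one.
```

This also sidesteps the "every occurrence" problem with events: an event only reflects its most recent cudaEventRecord, whereas one callback is enqueued, and fires, per batch of work.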