We did not find information in the NVIDIA documentation about how to proceed with what is explained in this blog post: http://cedric-augonnet.com/declaring-dependencies-with-cudastreamwaitevent/
We have an iterative application where, on each iteration, we execute the same kernels and memory transfers on different data.
In some parts of the code, before writing results back to the CPU, we execute some independent kernels that can run in parallel (if there are enough resources). Some of them are very small, and by running them in parallel we avoid the wait time between kernels.
To do so, we use a fork-join strategy in which we have pre-allocated N streams, besides the main one, and N*2 events.
The first N events are used to fork: we synchronize each of the N additional streams with the main stream, before anything enqueued in the additional streams executes.
The second set of N events is used to join: we make sure that anything enqueued in the main stream will not execute until everything in the N additional streams is executed.
We follow the strategy explained in the link above, except that we pre-allocate all the events and streams and reuse them on each iteration.
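To make the question concrete, this is a minimal sketch of the fork-join pattern we described, with pre-allocated streams and events reused every iteration. All names (N, forkEvents, joinEvents, sideStreams) are illustrative, not our actual code:

```cuda
#include <cuda_runtime.h>

#define N 13  // number of additional (side) streams

cudaStream_t mainStream, sideStreams[N];
cudaEvent_t  forkEvents[N], joinEvents[N];

void setupOnce() {
    cudaStreamCreate(&mainStream);
    for (int i = 0; i < N; ++i) {
        cudaStreamCreate(&sideStreams[i]);
        // cudaEventDisableTiming makes events cheaper when used only for ordering
        cudaEventCreateWithFlags(&forkEvents[i], cudaEventDisableTiming);
        cudaEventCreateWithFlags(&joinEvents[i], cudaEventDisableTiming);
    }
}

void iteration() {
    // ... work enqueued on mainStream ...

    // Fork: each side stream waits until mainStream reaches this point.
    for (int i = 0; i < N; ++i) {
        cudaEventRecord(forkEvents[i], mainStream);
        cudaStreamWaitEvent(sideStreams[i], forkEvents[i], 0);
        // independentKernel<<<grid, block, 0, sideStreams[i]>>>(...);
    }

    // Join: mainStream waits until everything enqueued on each side stream
    // so far has executed.
    for (int i = 0; i < N; ++i) {
        cudaEventRecord(joinEvents[i], sideStreams[i]);
        cudaStreamWaitEvent(mainStream, joinEvents[i], 0);
    }

    // ... subsequent work on mainStream (e.g. the copy back to the CPU) ...
}
```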
Is this approach conceptually correct?
We had some CUDA errors shown in Nsight if we did the following:
1. Record an event “A” on the main stream.
2. Enqueue a cudaStreamWaitEvent on each of the N streams, always using the same event “A”.
Is this approach incorrect? Some other parts of the code do the same thing and don’t trigger any errors in Nsight. The main difference is that there the event is used with cudaStreamWaitEvent on only two or three streams (max N = 3), whereas in the case described above N = 13.
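For reference, the single-event variant we mean is the following (eventA and sideStreams are placeholder names). As far as we understand the CUDA programming model, having many streams wait on one event is allowed, provided the event is not re-recorded before all the waits have been enqueued:

```cuda
// Step 1: capture the main stream's progress in a single event.
cudaEventRecord(eventA, mainStream);

// Step 2: make all N side streams wait on that same event.
for (int i = 0; i < N; ++i)
    cudaStreamWaitEvent(sideStreams[i], eventA, 0);
```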