CUDA streams related question: order of execution of tasks from two streams

Suppose I have seven streams set up in one host process.

If Stream 4 enqueued a task, say an H2D copy, before Stream 6 enqueued one of its own H2D copies, is Stream 4's task guaranteed to be executed by the GPU's H2D copy engine before Stream 6's?

Both streams enqueued H2D tasks, so both tasks will be handled by the same GPU engine.

My understanding is that tasks within a stream are executed in the order they are enqueued (FIFO order). However, tasks in different streams are not guaranteed a strict global order of execution relative to each other, even if they target the same engine, such as H2D. So even if Stream 4 enqueued its H2D copy before Stream 6 did, it’s not guaranteed that Stream 4’s H2D task will execute before Stream 6’s. Tasks in different streams can execute in a different order than they were enqueued, unless explicit synchronization (for example, a CUDA event recorded in one stream and waited on by the other) is used.
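As a sketch of what that explicit synchronization can look like, assuming the standard CUDA runtime API (`s4`, `s6`, `s4CopyDone`, and `orderedH2DAcrossStreams` are hypothetical names standing in for the question's Stream 4 and Stream 6): an event recorded in Stream 4 right after its H2D copy, plus `cudaStreamWaitEvent` on Stream 6, makes Stream 6's later H2D copy wait for Stream 4's. Both copies write the same device buffer, so reading it back shows which copy ran last.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Return false from the enclosing function if a CUDA call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            return false;                                             \
        }                                                             \
    } while (0)

// Hypothetical demo: order an H2D copy in s6 after an H2D copy in s4.
bool orderedH2DAcrossStreams() {
    const size_t n = 1 << 20;
    const size_t bytes = n * sizeof(int);
    int *hA = nullptr, *hB = nullptr, *hOut = nullptr, *d = nullptr;

    // Pinned host memory so cudaMemcpyAsync is actually asynchronous.
    CHECK(cudaMallocHost(&hA, bytes));
    CHECK(cudaMallocHost(&hB, bytes));
    CHECK(cudaMallocHost(&hOut, bytes));
    CHECK(cudaMalloc(&d, bytes));
    for (size_t i = 0; i < n; ++i) { hA[i] = 4; hB[i] = 6; }

    cudaStream_t s4, s6;
    cudaEvent_t s4CopyDone;
    CHECK(cudaStreamCreate(&s4));
    CHECK(cudaStreamCreate(&s6));
    CHECK(cudaEventCreateWithFlags(&s4CopyDone, cudaEventDisableTiming));

    // Stream 4 issues its H2D copy and records an event behind it.
    CHECK(cudaMemcpyAsync(d, hA, bytes, cudaMemcpyHostToDevice, s4));
    CHECK(cudaEventRecord(s4CopyDone, s4));

    // Without this wait, nothing orders the two copies, and d could end up
    // holding either stream's data. With it, work issued to s6 afterwards
    // cannot start until Stream 4's copy has completed.
    CHECK(cudaStreamWaitEvent(s6, s4CopyDone, 0));
    CHECK(cudaMemcpyAsync(d, hB, bytes, cudaMemcpyHostToDevice, s6));

    // Same stream as the second copy, so it runs after it (in-stream order).
    CHECK(cudaMemcpyAsync(hOut, d, bytes, cudaMemcpyDeviceToHost, s6));
    CHECK(cudaStreamSynchronize(s6));

    bool stream6WroteLast = true;
    for (size_t i = 0; i < n; ++i) {
        if (hOut[i] != 6) { stream6WroteLast = false; break; }
    }

    CHECK(cudaEventDestroy(s4CopyDone));
    CHECK(cudaStreamDestroy(s4));
    CHECK(cudaStreamDestroy(s6));
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(hA));
    CHECK(cudaFreeHost(hB));
    CHECK(cudaFreeHost(hOut));
    return stream6WroteLast;
}
```

A caller could check it with `assert(orderedH2DAcrossStreams());`. The error paths return early without freeing resources, which is fine for a demo but not for production code.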

So, in summary, there’s no strict guarantee of order between tasks in different streams, even if they target the same engine.

Is this right?

Correct. The first two stream semantics rules are:

  1. Items issued into the same stream execute in issue order.
  2. Items issued into separate created streams have no ordering prescribed by CUDA.
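A minimal sketch of rule 1, assuming the standard CUDA runtime API; the kernel `addOne` and the function `sameStreamIsOrdered` are hypothetical names for illustration. An H2D copy, a kernel, and a D2H copy use different engines, but because they are issued into one stream they run in issue order, so the host needs only a single `cudaStreamSynchronize` at the end.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Return false from the enclosing function if a CUDA call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            return false;                                             \
        }                                                             \
    } while (0)

// Hypothetical kernel: add 1 to every element.
__global__ void addOne(int *data, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] += 1;
}

bool sameStreamIsOrdered() {
    const size_t n = 1 << 20;
    const size_t bytes = n * sizeof(int);
    int *h = nullptr, *d = nullptr;
    CHECK(cudaMallocHost(&h, bytes));  // pinned, so copies are async
    CHECK(cudaMalloc(&d, bytes));
    for (size_t i = 0; i < n; ++i) h[i] = static_cast<int>(i);

    cudaStream_t s;
    CHECK(cudaStreamCreate(&s));

    // Three items on three different engines (H2D copy, compute, D2H copy),
    // all issued into one stream: rule 1 says they execute in issue order.
    CHECK(cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, s));
    addOne<<<static_cast<unsigned>((n + 255) / 256), 256, 0, s>>>(d, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, s));
    CHECK(cudaStreamSynchronize(s));  // the host waits only once, at the end

    bool ok = true;
    for (size_t i = 0; i < n; ++i) {
        if (h[i] != static_cast<int>(i) + 1) { ok = false; break; }
    }

    CHECK(cudaStreamDestroy(s));
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));
    return ok;
}
```

A caller could check it with `assert(sameStreamIsOrdered());`. Splitting those three items across two streams would fall under rule 2 and require an event wait to keep the same result.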

Issuing to the same or different engines does not change anything as far as rule 2 goes.

Awesome, that settles it. Thank you, Robert.