Yes, it is safe. If we ignore the possibility to modify default stream behavior then with respect to a particular device, activity issued into the default stream will prevent any activity subsequently issued into other streams from occurring until the default stream activity is complete. Furthermore, items issued into the default stream will not begin until all other activity issued to that device is complete.
All streams (and events) have an implicit device association, which means that the above statements basically do not apply for work issued to separate devices.
3 rules for stream semantics:
Items issued into the same stream will execute in issue order.
Items issued into separate, non-default streams will have no ordering prescribed by CUDA
Items issued into a legacy default stream will not begin until all previously issued activity to that device is complete, and will prevent any subsequently issued activity from beginning, until that item is complete.
Again, all of the above statements are with respect to a particular device.
For clarity, to give an example based on your pseudo code, this should run as expected/shown when compiled and run this way:
Thanks Robert for the comprehensive reply. One minor observation.
In your worked example, you started the inner loop at zero instead of one. Was this intentional? The only issue I’d see starting the first round in stream 0, would be a lack of overlap with the following one.
Later: Oops, I hadn’t read it properly. Does cudaStreamCreate start at stream 1?
You can start your inner loop at any value you want. It has no connection to the streams usage. (Try it!)
A stream is a stream. it doesn’t start at “stream 1” or anything else. Every created stream is simply logically separate from every other created stream. There is no ordering among streams, nor are they counted or given ordinal values, from the programmers perspective. The stream variable is an opaque handle.
Yes, if you look in the profiler, you will see the streams numbered. But that has essentially no direct connection to the source code and is not a programmer-visible concept. And generally, in my experience, neither created streams nor the default stream are given the ordinal 1, when viewed in the profiler.
Ah, I get it. You observed that the default stream is sometimes called the null stream or stream 0. Therefore it is logical to conclude that there is some kind of numbering. However there isn’t. So null or stream 0 is used to distinguish it from created streams, but doesn’t have to do with ordering. Unless you feel that the 3 stream semantic rules I stated communicate some form of “ordering”. Nevertheless, there isn’t really a counting or numbering of streams, except as viewed from the profiler (and except for “stream 0”, if you happen to call the default stream that).