Streams in Nsight report

I want to know more details about the concept of “streams” in the output of a trace run. As you can see in the picture, the report contains two streams. In each stream, I see some kernels and memory percentages.

Specifically, how does Nsight or the CUDA driver decide whether to put kernel X in stream 1 or stream 2?
Is there any difference in the characteristics of kernels in these two streams?

Hi Mahmood,

Streams are part of the CUDA API, intended to help your application expose more concurrency. Neither Nsight nor the driver picks the stream for a kernel: your code (or a library it calls) chooses it. You can create streams and launch kernels, memcpys, and memsets into them; work launched without a stream goes into the default stream. A stream is like a FIFO queue of work for the GPU: each operation launched into it completes before the next operation in that stream starts. Work in separate streams may execute concurrently if the hardware has resources available to do so. For example, a GPU with two Copy Engines is capable of executing multiple kernels, a host-to-device memcpy, and a device-to-host memcpy all at the same time, as long as those operations are all in different streams.
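To make "your code chooses the stream" concrete, here is a minimal sketch, assuming a CUDA-capable GPU and a program built with nvcc. The names `scale`, `run_two_streams`, and `CHECK` are hypothetical, made up for illustration. The stream is selected by the fourth kernel launch parameter (`<<<grid, block, sharedMem, stream>>>`) and by the last argument of `cudaMemcpyAsync`. Each stream carries its own copy -> kernel -> copy sequence, which runs in order within that stream, while the two sequences may overlap with each other. Under a CUDA trace, this work would typically show up on two separate stream rows.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: bail out of the enclosing int-returning function on a CUDA error.
#define CHECK(call)                                                  \
  do {                                                               \
    cudaError_t err_ = (call);                                       \
    if (err_ != cudaSuccess) {                                       \
      std::fprintf(stderr, "%s failed: %s\n", #call,                 \
                   cudaGetErrorString(err_));                        \
      return 1;                                                      \
    }                                                                \
  } while (0)

// Hypothetical kernel: multiply every element by a constant.
__global__ void scale(float* data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

// Runs one copy -> kernel -> copy sequence per stream.
// Returns 0 on success, 1 on a CUDA error or a wrong result.
// (For brevity, an early error return skips cleanup.)
int run_two_streams() {
  const int n = 1 << 20;
  const size_t bytes = n * sizeof(float);

  float* h[2];
  float* d[2];
  cudaStream_t s[2];
  for (int k = 0; k < 2; ++k) {
    // Pinned (page-locked) host memory lets async copies overlap other work.
    CHECK(cudaMallocHost(&h[k], bytes));
    CHECK(cudaMalloc(&d[k], bytes));
    CHECK(cudaStreamCreate(&s[k]));
    for (int i = 0; i < n; ++i) h[k][i] = 1.0f;
  }

  for (int k = 0; k < 2; ++k) {
    // The application, not the driver, picks stream s[k] for all three operations.
    // Inside s[k] they execute in FIFO order; s[0] and s[1] may overlap.
    CHECK(cudaMemcpyAsync(d[k], h[k], bytes, cudaMemcpyHostToDevice, s[k]));
    scale<<<(n + 255) / 256, 256, 0, s[k]>>>(d[k], k + 2.0f, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(h[k], d[k], bytes, cudaMemcpyDeviceToHost, s[k]));
  }

  int bad = 0;
  for (int k = 0; k < 2; ++k) {
    CHECK(cudaStreamSynchronize(s[k]));  // wait only for work in this stream
    if (h[k][0] != k + 2.0f || h[k][n - 1] != k + 2.0f) bad = 1;
    CHECK(cudaStreamDestroy(s[k]));
    CHECK(cudaFree(d[k]));
    CHECK(cudaFreeHost(h[k]));
  }
  return bad;
}

int main() {
  int rc = run_two_streams();
  std::puts(rc == 0 ? "ok" : "failed");
  return rc;
}
```

If the two `s[k]` arguments were replaced by the same stream (or omitted, which uses the default stream), all six operations would be serialized on a single row instead.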

See here for more info, particularly the section about streams:

The trace tools then display which stream each workload executed on. Since streams are serialized sequences of work, it makes sense on a timeline to display them as individual rows.
