I would like to know more about the concept of “streams” in the output of a trace run. As you can see in the picture, the report contains two streams. In each stream, I see some kernels and memory operations, each with a percentage.
Specifically, how does Nsight or the CUDA driver decide whether to put kernel X in stream 1 or stream 2?
Is there any difference in the characteristics of kernels in these two streams?