Is there any compiler mechanism involved in this? It seems that all kernels are executed once in a certain stream before they can be executed concurrently in other streams. But this does not seem to be the case when only one kernel is executed.
Thanks for the reply. So there’s no special mechanism at work here? I ask because I have seen others get kernels running concurrently in every stream.
There is no way that kernels issued into the same stream will be concurrent with each other. That is contrary to stream semantics.
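To illustrate the stream semantics, here is a minimal sketch, not taken from your code. The kernel `spin`, the helper `time_pair_ms`, and the cycle count are all hypothetical names and values chosen for the example. It launches two tiny kernels either back to back into one stream or into two separate streams and times each case from the host. Work issued into one stream runs in issue order. Work in different streams is allowed to overlap, but only if the GPU has free resources, so the second case may or may not come out faster on a given machine.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical busy-wait kernel: spins until roughly `cycles` clock ticks have
// passed, so that overlap (or the lack of it) is visible in the timing.
__global__ void spin(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Launches two single-thread kernels, either both into stream `a` or one each
// into `a` and `b`, and returns the host-measured time in milliseconds.
// Returns a negative value if the CUDA runtime reports an error.
double time_pair_ms(bool same_stream, long long cycles)
{
    cudaStream_t a, b;
    if (cudaStreamCreate(&a) != cudaSuccess) return -1.0;
    if (cudaStreamCreate(&b) != cudaSuccess) { cudaStreamDestroy(a); return -1.0; }

    // Warm-up launch so that one-time setup cost is not included in the timing.
    spin<<<1, 1, 0, a>>>(1);
    cudaDeviceSynchronize();

    auto t0 = std::chrono::steady_clock::now();
    spin<<<1, 1, 0, a>>>(cycles);
    spin<<<1, 1, 0, same_stream ? a : b>>>(cycles);
    cudaError_t err = cudaDeviceSynchronize();   // wait for both kernels
    auto t1 = std::chrono::steady_clock::now();

    cudaStreamDestroy(a);
    cudaStreamDestroy(b);
    if (err != cudaSuccess) return -1.0;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    const long long cycles = 200000000LL;  // arbitrary spin length for the demo
    printf("same stream:      %.1f ms\n", time_pair_ms(true, cycles));
    printf("separate streams: %.1f ms\n", time_pair_ms(false, cycles));
    return 0;
}
```

In the same-stream case the two kernels should appear one after the other in a profiler timeline. In the separate-stream case they are merely permitted to overlap.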
Certainly with respect to e.g. streams 14, 15, and 16 in your first picture, that sort of behavior seems to be what I would expect in the best case.
It’s not really clear what pattern you are expecting.
If your concern about the first picture has to do with stream 13, it might be that you are hitting some sort of initialization effect such as lazy loading. You could try running that test case with