Kernels executing concurrently in different streams do not behave as expected

I execute a series of kernels on multiple streams.

for (int i = 0; i < stream_count; i++) {
    kernel_1<<<grid, block, 0, streams[i]>>>(d_data);
    kernel_2<<<grid, block, 0, streams[i]>>>(d_data);
    kernel_3<<<grid, block, 0, streams[i]>>>(d_data);
    kernel_4<<<grid, block, 0, streams[i]>>>(d_data);
}
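For reference, a minimal self-contained version of this launch pattern might look like the sketch below. The kernel bodies, `stream_count`, and the `grid`/`block` configuration are assumptions, since the post does not show them; the loop structure matches the snippet above.

    #include <cuda_runtime.h>

    // Placeholder kernels: the real kernel bodies are not shown in the post.
    __global__ void kernel_1(float *d) { d[blockIdx.x * blockDim.x + threadIdx.x] += 1.0f; }
    __global__ void kernel_2(float *d) { d[blockIdx.x * blockDim.x + threadIdx.x] += 1.0f; }
    __global__ void kernel_3(float *d) { d[blockIdx.x * blockDim.x + threadIdx.x] += 1.0f; }
    __global__ void kernel_4(float *d) { d[blockIdx.x * blockDim.x + threadIdx.x] += 1.0f; }

    int main() {
        const int stream_count = 8;    // assumed value
        dim3 grid(32), block(256);     // assumed launch configuration
        float *d_data;
        cudaMalloc(&d_data, grid.x * block.x * sizeof(float));

        cudaStream_t streams[stream_count];
        for (int i = 0; i < stream_count; i++)
            cudaStreamCreate(&streams[i]);

        // Note: all streams write the same buffer here, as in the original
        // snippet; that is a data race unless the kernels are independent.
        for (int i = 0; i < stream_count; i++) {
            kernel_1<<<grid, block, 0, streams[i]>>>(d_data);
            kernel_2<<<grid, block, 0, streams[i]>>>(d_data);
            kernel_3<<<grid, block, 0, streams[i]>>>(d_data);
            kernel_4<<<grid, block, 0, streams[i]>>>(d_data);
        }
        cudaDeviceSynchronize();

        for (int i = 0; i < stream_count; i++)
            cudaStreamDestroy(streams[i]);
        cudaFree(d_data);
        return 0;
    }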

These kernels are identical except for the function name. I used nsys to analyze their execution, and the results were not what I expected.

But when I execute only one kernel in each stream, it behaves as expected.

To verify, I tried executing two kernels in each stream, and it behaved just as unexpectedly as with four kernels.

Is there some compiler mechanism involved here? It looks as if every kernel has to execute once in some stream before it can execute concurrently in the other streams. That does not seem to happen when only one kernel is launched per stream.

My platform is Ubuntu 22.04 with CUDA 12.2, and the GPU is an RTX 4070.

CUDA does not give any guarantees about overlapping kernels in independent streams.

Thanks for the reply. So there is no special mechanism at work here? I ask because I have seen others achieve simultaneous concurrency of kernels across streams.

There is no way that kernels issued into the same stream will be concurrent with each other. That is contrary to stream semantics.
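To illustrate the semantics (with hypothetical kernel and stream names, not taken from your code):

    // Same stream: strictly ordered.
    kernel_A<<<grid, block, 0, s>>>(d);
    kernel_B<<<grid, block, 0, s>>>(d);   // starts only after kernel_A finishes

    // Different streams: no ordering between A and B. The hardware *may*
    // run them concurrently if resources allow, but CUDA does not guarantee it.
    kernel_A<<<grid, block, 0, s1>>>(d1);
    kernel_B<<<grid, block, 0, s2>>>(d2);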

Certainly with respect to, e.g., streams 14, 15, and 16 in your first picture, that sort of behavior is what I would expect in the best case.

It’s not really clear what pattern you are expecting.

If your concern about the first picture has to do with stream 13, it might be that you are hitting some sort of initialization effect such as lazy loading. You could try that test case running with

CUDA_MODULE_LOADING=EAGER ./myapp
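If changing the environment is inconvenient, another option (my suggestion, not something your code already does) is to warm up each kernel with a dummy launch before the profiled region, so any one-time module-loading cost is paid up front:

    // Warm-up launches in the default stream, then synchronize,
    // so lazy module loading happens before the measured section.
    kernel_1<<<grid, block, 0, 0>>>(d_data);
    kernel_2<<<grid, block, 0, 0>>>(d_data);
    kernel_3<<<grid, block, 0, 0>>>(d_data);
    kernel_4<<<grid, block, 0, 0>>>(d_data);
    cudaDeviceSynchronize();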

Thank you. Your reply perfectly solved my problem. The picture below shows what I expected.

Thanks again.
