Streams not running concurrently

jules.gransden · May 20, 2025, 9:22am

Hi everyone,
I was wondering if anyone had any idea why my streams are not running concurrently.
Here is the context:

BLOCKSIZE: 1
GRIDSIZE: 8

FunctionA(streamA){
    Kernel1<BLOCKSIZE, GRIDSIZE, ..., streamA>(...)
    Kernel2<BLOCKSIZE, GRIDSIZE, ..., streamA>(...)
    Kernel3<BLOCKSIZE, GRIDSIZE, ..., streamA>(...)
}

FunctionB(streamB){
    Kernel2<BLOCKSIZE, GRIDSIZE, ..., streamB>(...)
    Kernel3<BLOCKSIZE, GRIDSIZE, ..., streamB>(...)
}

run(){
    // create streams
    streams = new cudaStream_t[32];
    for (int i = 0; i < num_streams; i++) {
        cudaStreamCreate(&streams[i]);
    }

    num_iterations = 150;

    for (int i = 0; i < num_iterations; i++) {
        FunctionB(0);
        for (int j = 0; j < i; j++) {
            int stream_id = (j % (num_streams - 1)) + 1; // Skip stream 0
            FunctionA(stream_id);
        }
    }
    for (int s = 1; s < num_streams; s++) {
        cudaStreamSynchronize(streams[s]);
    }
}

Each FunctionA call within the nested for loop should run in parallel with the others. Within FunctionA and FunctionB, however, the 2 or 3 kernels run sequentially. So each call is a group of sequential kernels, and it is these groups that we need to run in parallel.
Now I would expect these to run in parallel streams, but they all run in sequential order, just in different streams. Why is that? Nothing I do seems to work. I modified the grid size and block size for each kernel, but nothing changed. Hopefully someone can help.
Thank you.

hstlr · May 22, 2025, 2:35pm

Is there any special reason why you’re calling FunctionB() with the default stream? As far as I understood it, workloads being processed in the default (0) stream can’t run concurrently with workloads in other streams.

Meaning: Whenever a workload within the default stream is executed, any other workloads will be stalled until the work in the default stream has finished. So, the way I see it, you’ll have to call FunctionB() with a non-default stream, too, if you want to achieve “reliable” concurrency.

In addition, kernels launched within the same stream (e.g. Kernel2 and Kernel3 in FunctionB()) will always run sequentially. But this should be obvious, as it's true for non-default streams as well as for the default stream.
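To illustrate the suggestion above, here is a minimal sketch, assuming the poster's kernels can be replaced by a placeholder. `busy_kernel`, `make_worker_stream` and `is_non_blocking` are hypothetical names, not part of the original code. It puts FunctionB's work on a created stream instead of the literal 0, and creates the streams with the `cudaStreamNonBlocking` flag, which tells the runtime not to synchronize them implicitly with the legacy NULL stream. Streams made with plain `cudaStreamCreate` are "blocking" streams and do synchronize with it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel standing in for the poster's Kernel2/Kernel3.
__global__ void busy_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Creates a stream that does not implicitly synchronize with the legacy NULL stream (stream 0).
cudaStream_t make_worker_stream() {
    cudaStream_t s;
    cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking);
    return s;
}

// Returns true if the stream was created with cudaStreamNonBlocking.
bool is_non_blocking(cudaStream_t s) {
    unsigned int flags = 0;
    cudaStreamGetFlags(s, &flags);
    return (flags & cudaStreamNonBlocking) != 0;
}

int main() {
    const int n = 8;
    float *a = nullptr, *b = nullptr;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    cudaStream_t sB = make_worker_stream();  // FunctionB's stream, instead of the literal 0
    cudaStream_t sA = make_worker_stream();  // one of FunctionA's streams

    // Initialize each buffer in the stream that will use it, so ordering is guaranteed
    // even though these streams do not wait on the NULL stream.
    cudaMemsetAsync(a, 0, n * sizeof(float), sB);
    cudaMemsetAsync(b, 0, n * sizeof(float), sA);

    // Passing 0 as the stream argument would put this work in the legacy default stream,
    // which waits for, and is waited on by, every blocking stream.
    busy_kernel<<<8, 1, 0, sB>>>(a, n);  // 8 blocks of 1 thread (GRIDSIZE/BLOCKSIZE from the post)
    busy_kernel<<<8, 1, 0, sA>>>(b, n);

    cudaError_t err = cudaDeviceSynchronize();
    printf("sync: %s, sB non-blocking: %d\n", cudaGetErrorString(err), is_non_blocking(sB));

    cudaStreamDestroy(sA);
    cudaStreamDestroy(sB);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

If changing every call site is impractical, compiling with nvcc's `--default-stream per-thread` option is another way to stop the NULL stream from acting as a global barrier; whether that fits depends on the rest of the application.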

Robert_Crovella · May 22, 2025, 2:38pm

By my read of the code, that is not the default stream. It is the stream that was created that corresponds to index 0 in the array where created stream handles are stored.

hstlr · May 22, 2025, 2:44pm

That may be what was meant. But by the looks of the code, at least FunctionB() clearly launches its kernels using the stream 0. Or am I mistaken here?

[EDIT:]
Okay, you can't really tell. With respect to how the functions and their calls fit together, this seems to be more pseudocode than actually executable code.

Robert_Crovella · May 22, 2025, 2:45pm

Sorry, you are correct. I misread the code. That is really how the indicated code for FunctionB would work (although the code overall doesn't appear to be syntactically sensible). If we take that literal interpretation, then the loop that calls FunctionA is doing illegal things.

FunctionA has a similar definition to FunctionB, in terms of its stream usage. If we take the direct interpretation, then the loop is calling FunctionA with streams of 1, 2, 3, etc.

That is illegal. You must actually use a stream handle when invoking a kernel, not an integer. The NULL stream is a special case: using the literal 0 selects the NULL stream.
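A minimal sketch of what the loop could look like with handles instead of integers, assuming the same 32 streams, 150 iterations and launch sizes from the original post. The kernel bodies and the `stream_index_for` helper are hypothetical, since the real kernels are not shown. Each stream gets its own buffer so that kernels in different streams do not race on the same memory. Note that the launch configuration is `<<<grid, block, sharedMem, stream>>>`, with the grid size first.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);   \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

const int GRIDSIZE = 8;   // blocks per grid (value from the original post)
const int BLOCKSIZE = 1;  // threads per block (value from the original post)
const int N = GRIDSIZE * BLOCKSIZE;

// Hypothetical stand-ins for Kernel1/2/3.
__global__ void Kernel1(float* d, int n) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) d[i] = (float)i; }
__global__ void Kernel2(float* d, int n) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) d[i] += 1.0f; }
__global__ void Kernel3(float* d, int n) { int i = blockIdx.x * blockDim.x + threadIdx.x; if (i < n) d[i] *= 0.5f; }

// Both functions take a real cudaStream_t handle; kernels inside one call stay sequential.
void FunctionA(cudaStream_t s, float* d) {
    Kernel1<<<GRIDSIZE, BLOCKSIZE, 0, s>>>(d, N);
    Kernel2<<<GRIDSIZE, BLOCKSIZE, 0, s>>>(d, N);
    Kernel3<<<GRIDSIZE, BLOCKSIZE, 0, s>>>(d, N);
}

void FunctionB(cudaStream_t s, float* d) {
    Kernel2<<<GRIDSIZE, BLOCKSIZE, 0, s>>>(d, N);
    Kernel3<<<GRIDSIZE, BLOCKSIZE, 0, s>>>(d, N);
}

// Maps loop counter j onto indices 1..num_streams-1, keeping streams[0] for FunctionB.
int stream_index_for(int j, int num_streams) { return (j % (num_streams - 1)) + 1; }

int main() {
    const int num_streams = 32;
    cudaStream_t streams[num_streams];
    float* bufs[num_streams];  // one private buffer per stream
    for (int s = 0; s < num_streams; s++) {
        CHECK(cudaStreamCreate(&streams[s]));
        CHECK(cudaMalloc(&bufs[s], N * sizeof(float)));
        CHECK(cudaMemset(bufs[s], 0, N * sizeof(float)));
    }

    const int num_iterations = 150;
    for (int i = 0; i < num_iterations; i++) {
        FunctionB(streams[0], bufs[0]);  // a created stream handle, not the literal 0
        for (int j = 0; j < i; j++) {
            int sid = stream_index_for(j, num_streams);
            FunctionA(streams[sid], bufs[sid]);  // pass the handle, not the integer sid
        }
    }
    CHECK(cudaGetLastError());  // reports any launch error recorded so far
    CHECK(cudaDeviceSynchronize());

    for (int s = 0; s < num_streams; s++) {
        CHECK(cudaStreamDestroy(streams[s]));
        CHECK(cudaFree(bufs[s]));
    }
    printf("done\n");
    return 0;
}
```

Keeping the stream handles in an array and indexing into it preserves the original round-robin scheme; the only change to the call sites is that `streams[sid]` is passed instead of `sid`.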
