Excution kernel with default stream

aLbErT_h · November 28, 2016, 4:22pm

Hi,

I have a question about the default stream. I need to execute a kernel like this:

template <typename T>
__global__ void myKernel(T *myData, int ix, int iy, int iz)
{
    //index for each dimension
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    int j = blockDim.y * blockIdx.y + threadIdx.y;
    int k = blockDim.z * blockIdx.z + threadIdx.z;
    
    //Global index
    int idx = (((i * iy) + j) * iz) + k; 
    
    //Validate the dimension indexes.
    if ( i < ix and j < iy and k < iz )
    {
        myData[idx] += MY_CONSTANT;
    }
}

From the host I execute the kernel:

float *data;
cudaMallocManaged(&data, M);
for (int i = 0; i < N; i ++)
{
    myKernel<<gridSize, blockSize>>(data, ix, iy, iz);
}

My doubt is if the kernels will execute in the order that they were issued or if they will run concurrently.

I know that the default stream execute the instruction in the order that were issued but I’m not sure if this is true with a iteration.

Thank you.

Robert_Crovella · November 28, 2016, 4:33pm

Yes, they will execute in the order issued. They will not run concurrently.

All CUDA operations that don’t explicitly specify a stream execute in the default stream.

All CUDA operations in a given stream (default or not) are serialized.

note that your kernel launch syntax is incorrect. It should be <<<…>>> not <<…>>

[url]Programming Guide :: CUDA Toolkit Documentation

aLbErT_h · November 28, 2016, 4:42pm

Ok, perfect!

Thank you for reply txbob and… yes, I have a mistake in my kernel launch syntax, I had not seen it

Topic		Replies	Views
Stream Ordering CUDA Programming and Performance	6	1543	October 12, 2021
CUDA streams, default stream zero CUDA Programming and Performance	2	1242	September 10, 2013
Dynamic Parallelism Execution Order CUDA Programming and Performance	4	733	September 21, 2015
Streams not running conccurently CUDA Programming and Performance	4	132	May 22, 2025
Streams and Kernel Execution Order CUDA Programming and Performance	2	1052	August 19, 2010
Should legacy default stream behave serially under multiple host processes/contexts? CUDA Programming and Performance	2	362	October 18, 2022
Stream execution order in CUDA exercise Teaching & Curriculum Support	1	1281	February 3, 2020
Dynamic parallelism and streams CUDA Programming and Performance cuda , kernel	7	1124	June 5, 2023
Kernels executing concurrently in different streams do not behave as expected CUDA Programming and Performance	6	514	December 20, 2023
GPU Pro Tip: CUDA 7 Streams Simplify Concurrency Technical Blog	51	2948	February 5, 2020

Excution kernel with default stream

Related topics