Dynamic parallelism and streams

zilkhanir1 · June 1, 2023, 9:37pm

If I launch a parent kernel on a non-default stream, do its child kernels automatically execute on the parent's stream?

striker159 · June 2, 2023, 7:08am

No, streams created on the host cannot be used on the device, and vice-versa.
You can find more information on CDP1 here: CUDA C++ Programming Guide

zilkhanir1 · June 2, 2023, 1:53pm

I didn't understand the answer.
I create the stream on the host and launch the parent kernel on it. Are the child kernels executed on the default stream or on the parent's stream?

striker159 · June 2, 2023, 3:27pm

Neither of the two streams is used. The default stream within a kernel is different from the default stream of the host.
But whatever stream you use in the kernel, the host-created stream does not progress until the parent kernel and all its child kernels have completed. (To be more precise, the parent kernel is not considered complete until all of its child kernels have completed.)

zilkhanir1 · June 4, 2023, 6:57am

#include <cstdio>

__global__ void child()
{
    printf("child\n");
}

__global__ void parent()
{
    printf("parent\n");
    child<<<1, 1>>>();
}

int main(void)
{
    cudaStream_t myStream;
    cudaStreamCreate(&myStream);
    parent<<<1, 1, 0, myStream>>>();
    cudaStreamSynchronize(myStream);
    return 0;
}

Will all GPU operations (parent and child) run on myStream?

striker159 · June 4, 2023, 12:03pm

The parent runs in myStream; the child runs in the default stream of thread block 0.

cudaStreamSynchronize(myStream) waits for both parent and child.
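
A minimal sketch of how this could be checked from the host, assuming a device that supports dynamic parallelism and a build with relocatable device code (typically something like nvcc -rdc=true demo.cu -lcudadevrt). The flag buffer and the run_demo helper are illustrative names, not part of the original code. The child writes a flag, and the host reads it back only after synchronizing myStream:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Child grid: records that it ran by setting a flag in global memory.
__global__ void child(int *flag) { *flag = 1; }

// Parent grid: launches the child into the device-side default (NULL) stream.
__global__ void parent(int *flag) { child<<<1, 1>>>(flag); }

// Hypothetical helper: returns the flag value the host sees after
// synchronizing on myStream. Error checking is omitted for brevity.
int run_demo() {
    int *d_flag = nullptr;
    int h_flag = 0;
    cudaStream_t myStream;
    cudaStreamCreate(&myStream);
    cudaMalloc(&d_flag, sizeof(int));
    cudaMemsetAsync(d_flag, 0, sizeof(int), myStream);

    parent<<<1, 1, 0, myStream>>>(d_flag);

    // Stream order: this copy cannot start before the parent grid,
    // including its child grid, has completed.
    cudaMemcpyAsync(&h_flag, d_flag, sizeof(int),
                    cudaMemcpyDeviceToHost, myStream);
    cudaStreamSynchronize(myStream);

    cudaFree(d_flag);
    cudaStreamDestroy(myStream);
    return h_flag;
}

int main() {
    printf("child flag = %d\n", run_demo());
    return 0;
}
```

If the parent and child both ran, the host reads back 1 even though the child was never placed in myStream explicitly.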

zilkhanir1 · June 4, 2023, 1:44pm

Thank you for your patience

Ok I think I understand.

What is the meaning of a kernel stream? Is there a kernel stream which is not the default?
Can I create a stream within the kernel?

Assume that this is my case:

cudaStream_t myStream1, myStream2;
cudaStreamCreate(&myStream1);
cudaStreamCreate(&myStream2);
parent<<<1, 1, 0, myStream1>>>(data1);
parent<<<1, 1, 0, myStream2>>>(data2);
cudaDeviceSynchronize();

The parent kernels may execute in parallel, each on its own stream.
But could the children of the first parent launch also run in parallel with the children of the second parent launch?
Is there any connection between the kernel default stream of the first parent launch and that of the second?

striker159 · June 5, 2023, 2:48pm

The section of the programming guide that I linked above answers some of your questions.
There is also this nvidia blog post: CUDA Dynamic Parallelism API and Principles | NVIDIA Technical Blog
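
As a rough sketch of the device-side stream usage those references describe: a kernel can create its own non-default streams, but in device code they have to be created with cudaStreamCreateWithFlags and the cudaStreamNonBlocking flag. The out buffer and the run_demo helper below are illustrative assumptions, and the build is assumed to use relocatable device code (typically nvcc -rdc=true with -lcudadevrt). Children launched into different device streams, or by parents in different host streams, may run concurrently, but concurrency is never guaranteed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Child grid: marks one slot of the output buffer.
__global__ void child(int *out, int idx) { out[idx] = 1; }

// Parent grid: creates two device-side streams and launches one child in each.
__global__ void parent(int *out) {
    cudaStream_t s1, s2;
    // Device-side streams must use the cudaStreamNonBlocking flag.
    cudaStreamCreateWithFlags(&s1, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&s2, cudaStreamNonBlocking);
    child<<<1, 1, 0, s1>>>(out, 0);
    child<<<1, 1, 0, s2>>>(out, 1);
    // Resources are released once the work in each stream has finished.
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
}

// Hypothetical helper: launches two parents on two host streams and returns
// how many slots the four children marked. Error checking is omitted.
int run_demo() {
    int *d_out = nullptr;
    int h_out[4] = {0, 0, 0, 0};
    cudaStream_t myStream1, myStream2;
    cudaStreamCreate(&myStream1);
    cudaStreamCreate(&myStream2);
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemset(d_out, 0, sizeof(h_out));

    // Each parent gets its own half of the buffer, so the launches are independent.
    parent<<<1, 1, 0, myStream1>>>(d_out);
    parent<<<1, 1, 0, myStream2>>>(d_out + 2);
    cudaDeviceSynchronize();  // waits for both parents and all of their children

    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    cudaStreamDestroy(myStream1);
    cudaStreamDestroy(myStream2);
    return h_out[0] + h_out[1] + h_out[2] + h_out[3];
}

int main() {
    printf("slots written = %d\n", run_demo());
    return 0;
}
```

The device streams created inside one parent launch are separate objects from those created inside the other, so nothing in this sketch orders the children of the first parent against the children of the second.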
