Hello guys! I have a question about dynamic parallelism.
If I have a parent kernel that launches 4 child kernels one after another inside a for loop with, say, 100 iterations, will the child kernels execute in sequence, as in the following code?
```cuda
__global__ void parent(){
    for(int i = 0; i < 100; i++){
        childA<<<...>>>(data);
        childB<<<...>>>(data);
        childC<<<...>>>(data);
        childD<<<...>>>(data);
    }
}
```
In the code above, will childB start executing only after childA finishes, childC only after childB finishes, and childD only after childC finishes?
I ran some tests, and the children did execute in sequence: each child started only after the previous one had finished. Is this the default behaviour? Even though my tests suggest it is, I still have some doubts, and I'd like to hear what you guys think. Thank you!
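A minimal sketch of one way to check this ordering, under several assumptions: the parent is launched with a single thread, data is replaced by a hypothetical OrderState struct, and the child bodies, the spin delay, and the runOrderCheck helper are all made up for illustration. Each child checks that the expected number of predecessors have finished, then spins before advancing the shared counter, so a successor that started early would see a stale counter and record an error. The build flags in the comment are also an assumption (dynamic parallelism needs relocatable device code and the cudadevrt library).

```cuda
// Build (assumed flags):
//   nvcc -rdc=true -arch=sm_70 order_check.cu -o order_check -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical shared state standing in for `data` from the question.
struct OrderState {
    int stage;   // number of child kernels that have finished so far
    int errors;  // children that saw an unexpected stage on entry
};

// Busy-wait so that an overlapping successor (if one started early)
// would read `stage` before this child has advanced it.
__device__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// slot is 0 for childA, 1 for childB, 2 for childC, 3 for childD.
__device__ void checkAndAdvance(OrderState *state, int slot) {
    if (state->stage % 4 != slot) {
        state->errors += 1;
    }
    spin(100000);
    state->stage += 1;  // each child runs a single thread, so no atomics needed
}

__global__ void childA(OrderState *s) { checkAndAdvance(s, 0); }
__global__ void childB(OrderState *s) { checkAndAdvance(s, 1); }
__global__ void childC(OrderState *s) { checkAndAdvance(s, 2); }
__global__ void childD(OrderState *s) { checkAndAdvance(s, 3); }

__global__ void parent(OrderState *s, int steps) {
    // Only one thread launches; otherwise every parent thread would run
    // the loop and issue its own 4 * steps launches.
    if (blockIdx.x != 0 || threadIdx.x != 0) return;
    for (int i = 0; i < steps; i++) {
        // No stream argument: all launches use this block's implicit stream.
        childA<<<1, 1>>>(s);
        childB<<<1, 1>>>(s);
        childC<<<1, 1>>>(s);
        childD<<<1, 1>>>(s);
    }
}

// Runs the check and copies the final counters back to the host.
cudaError_t runOrderCheck(int steps, int *stage, int *errors) {
    OrderState *d_state = nullptr;
    cudaError_t err = cudaMalloc(&d_state, sizeof(OrderState));
    if (err != cudaSuccess) return err;

    err = cudaMemset(d_state, 0, sizeof(OrderState));
    if (err == cudaSuccess) {
        parent<<<1, 1>>>(d_state, steps);
        err = cudaGetLastError();
    }
    // The parent grid only completes after all of its child grids complete.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    OrderState h_state = {0, 0};
    if (err == cudaSuccess) {
        err = cudaMemcpy(&h_state, d_state, sizeof(OrderState),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_state);
    *stage = h_state.stage;
    *errors = h_state.errors;
    return err;
}

int main() {
    const int steps = 100;
    int stage = 0, errors = 0;
    cudaError_t err = runOrderCheck(steps, &stage, &errors);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("stage=%d (expected %d), errors=%d\n", stage, 4 * steps, errors);
    return (stage == 4 * steps && errors == 0) ? 0 : 1;
}
```

A run that reports errors=0 only shows that no overlap was observed in that run; it cannot by itself show that in-order execution is guaranteed.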