given that it is seemingly global memory, i do not fully comprehend why you are concerned about memory consistency (when using dynamic parallelism)
and the sample code seems flawed:
even if only 1 thread per block launches the child kernel: change_some_global_data, you would end up executing change_some_global_data 128 x 128 times, if i am not mistaken
if you want to “make sure memory changing only among current block”, i would think that it much depends on how a block accesses (the) global memory (array), and that you can simply pass to the child an updated pointer, that points to its section within (the) global memory (array)
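As a sketch of that last point (the kernel names follow the thread; the section size is an assumption for illustration): each parent block can hand the child a pointer offset to its own slice of the array, so the child only ever touches that block's section:

```cuda
#define SECTION 256   // elements owned by each parent block (assumed)

__global__ void change_some_global_data(int* section)
{
    // the child only sees its parent block's slice of the array
    section[threadIdx.x] = threadIdx.x;
}

__global__ void parent_launch(int* data)
{
    if (threadIdx.x == 0)
    {
        // offset the pointer so the child works only on this block's section
        change_some_global_data<<< 1, SECTION >>>(data + blockIdx.x * SECTION);
        cudaDeviceSynchronize();   // wait for this block's child grid
    }
    __syncthreads();               // all threads of this block can now read the section
}
```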
Actually, I was in a hurry; that's why I posted flawed code. I have fixed it now.
My only concern is: after the array data has been changed by the child kernel, how can I make sure that every thread of the parent_launch kernel sees these changes?
I put __syncthreads() at line 14, but it only synchronizes the threads within a single block.
by ensuring that the very first parent kernel block, and only the very first parent kernel block, launches the child kernel, to update the entire array (use an atomic flag stored in global memory)
or by ensuring that the entire array is updated beforehand (the child kernel becomes a parent kernel, issued prior to your current parent kernel, and you do not use dynamic parallelism)
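A sketch of that second option, assuming the same kernel names as in the thread: the update kernel is launched first from the host, and because kernels issued to the same stream execute in order, every thread of parent_launch (in every block) is guaranteed to see the fully updated array, with no dynamic parallelism needed:

```cuda
void host()
{
    // first kernel updates the entire array
    change_some_global_data<<< 128, 256 >>>(data);
    // kernels in the same stream run in order: parent_launch starts only
    // after change_some_global_data has completed, so all of its threads
    // see the updated array
    parent_launch<<< 128, 256 >>>(data);
}
```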
the initial point was: if you have 1 thread per parent kernel block launch the child kernel, you can not have the child kernel’s dimensions as <<<128, 256>>>, it should rather be <<<1, 256>>> in my view
your parent kernel has 128 blocks; if only one block launches the child, as opposed to every block, you may have a child kernel dimension of <<<128, 256>>>
I know that if I changed my parent kernel to a single block, I could ensure all of this.
However, my scenario is that I have a parent_launch kernel of <<<128, 256>>>, and the entire data array is changed successfully by the change_some_global_data child kernel. After these changes, how can I ensure that any thread of parent_launch (in any block) sees the same entire data array?
the only way you can have dynamic parallelism in the above, is if a thread of parent_launch reads the same element written by a corresponding thread of change_some_global_data, such that synchronization outside of a block does not matter
If I changed my code like this, I could ensure that all threads of the parent kernel are aware of all the changes. That is what I meant when I said that with a single parent block I can ensure everything.
__global__ void parent_launch(int* data){
    int myId = (blockDim.x * blockIdx.x) + threadIdx.x; // id passed to the child
    change_some_global_data<<< 128, 256 >>>(data, myId);
    cudaDeviceSynchronize();
    __syncthreads();
    // At this point every thread of the parent can see the changes to the data array.
}

void host(){
    parent_launch<<< 1, 256 >>>(data);
}
But I would like to create the parent kernel with many blocks, like parent_launch<<<128, 256>>>.
Maybe splitting the kernel into 2 is a good idea, but I am just wondering whether it is possible or not. I am only concerned about memory consistency inside the parent kernel after the child kernel changes something.
By the way,

// in my case every thread will invoke the child, like this
change_some_global_data<<< 128, 256 >>>

// But why did you say the code below is wrong?
if (threadIdx.x == 0)
{
    change_some_global_data<<< 128, 256 >>>
}
Thank you. My question is: after the last case you wrote above, can any thread of the parent kernel see the changes to the data array at any time?
For example, let's say thread 0 of block 0 finishes its job faster than the others. How can I guarantee that thread 0 of block 0 sees the changes made by other threads (such as thread 120 of block 250)?
Or like this code:

void host(){
    parent_launch<<< 128, 256 >>>(data, out);
}

__global__ void parent_launch(int* data, int* out){
    change_some_global_data<<< 128, 256 >>>(data);
    cudaDeviceSynchronize(); // wait for the child grid before reading data
    // Let's assume access_all_data is a device function that computes
    // something with the data array and writes all results into out.
    access_all_data(data, out);
}
it depends on how the threads access the arrays - data/out
if the threads of change_some_global_data and access_all_data access the arrays in the same way (e.g. (blockDim.x * blockIdx.x) + threadIdx.x), you could get away with
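For instance (a sketch with the indexing spelled out; the write in the child is made up for illustration): one thread per parent block launches a child that updates the whole array, and each parent thread afterwards reads only the element that its corresponding child thread wrote, so no synchronization across parent blocks is needed:

```cuda
__global__ void change_some_global_data(int* data)
{
    int idx = (blockDim.x * blockIdx.x) + threadIdx.x;
    data[idx] = idx;                   // each child thread writes exactly one element
}

__global__ void parent_launch(int* data, int* out)
{
    if (threadIdx.x == 0)
    {
        change_some_global_data<<< 128, 256 >>>(data); // one launch per parent block
        cudaDeviceSynchronize();       // wait for this block's child grid
    }
    __syncthreads();

    int idx = (blockDim.x * blockIdx.x) + threadIdx.x;
    // data[idx] was written by this block's own child grid (the other blocks'
    // children wrote the same value), so the read here is well defined
    out[idx] = data[idx] * 2;
}
```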