Cuda global reads/writes in cooperative kernel

Draft · October 9, 2020, 3:25pm

Hello,

Suppose we create a grid group, grid, inside a cooperative kernel. We can call grid.sync() to synchronize the group. I am curious whether this also ensures that every thread sees the writes other threads have made to global memory. In other words, is global memory in a consistent state across all threads after grid.sync()?

Consider the following code snippet:

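#include <cstdio>
#include <cooperative_groups.h>

__global__ void my_coop_kernel(int *nums) {
    auto grid = cooperative_groups::this_grid();
    // Each thread writes its own grid-wide rank to global memory.
    nums[grid.thread_rank()] = static_cast<int>(grid.thread_rank());
    grid.sync();  // grid-wide barrier
    if (grid.thread_rank() == 0) {
        // Thread 0 reads back the values written by all other threads.
        for (unsigned long long i = 0; i < grid.size(); ++i)
            printf("%d\n", nums[i]); // Is this going to be correct?
    }
}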

Best Regards,

Draft

striker159 · October 10, 2020, 9:36am

Yes, it will print the correct numbers. If you look at the PTX code on Compiler Explorer, it shows a global memory barrier (membar.gl) between the writes to nums and the reads from it.

The barrier disappears if grid.sync() is removed.
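
For anyone who wants to check this on their own hardware, here is a minimal sketch of a host program that launches a kernel like the one above cooperatively and verifies the result on the host. The kernel name neighbor_read_kernel, the second buffer out, and the block size of 128 are assumptions made for illustration and are not part of the original question. Instead of printing from thread 0, every thread reads the value written by its neighbor after grid.sync(), so the last thread of each block reads a value written by a different block.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

namespace cg = cooperative_groups;

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s failed: %s\n", #call,             \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Hypothetical kernel: each thread writes its rank, syncs the grid, then
// reads the value written by the next thread in the grid (wrapping around).
__global__ void neighbor_read_kernel(int *nums, int *out) {
    cg::grid_group grid = cg::this_grid();
    const size_t rank = grid.thread_rank();
    const size_t n = grid.size();
    nums[rank] = static_cast<int>(rank);
    grid.sync();  // grid-wide barrier; writes above should be visible after it
    out[rank] = nums[(rank + 1) % n];
}

int main() {
    const int device = 0;
    CUDA_CHECK(cudaSetDevice(device));

    // grid.sync() requires a device that supports cooperative launches.
    int coop = 0;
    CUDA_CHECK(cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, device));
    if (!coop) {
        printf("Cooperative launch is not supported on this device\n");
        return 0;
    }

    // All blocks must be resident at once, so size the grid from occupancy.
    const int threads = 128;  // assumed block size
    int blocksPerSm = 0, smCount = 0;
    CUDA_CHECK(cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSm, neighbor_read_kernel, threads, 0));
    CUDA_CHECK(cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, device));
    const int blocks = blocksPerSm * smCount;
    const size_t n = static_cast<size_t>(blocks) * threads;

    int *nums = nullptr, *out = nullptr;
    CUDA_CHECK(cudaMalloc(&nums, n * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&out, n * sizeof(int)));

    // Cooperative kernels are launched through the runtime API, not <<<...>>>.
    void *args[] = {&nums, &out};
    CUDA_CHECK(cudaLaunchCooperativeKernel((void *)neighbor_read_kernel,
                                           dim3(blocks), dim3(threads),
                                           args, 0, nullptr));
    CUDA_CHECK(cudaDeviceSynchronize());

    // Thread i should have read the rank of thread (i + 1) % n.
    std::vector<int> host(n);
    CUDA_CHECK(cudaMemcpy(host.data(), out, n * sizeof(int), cudaMemcpyDeviceToHost));
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i)
        if (host[i] != static_cast<int>((i + 1) % n)) ++mismatches;
    printf("%zu threads, %zu mismatches\n", n, mismatches);

    CUDA_CHECK(cudaFree(nums));
    CUDA_CHECK(cudaFree(out));
    return mismatches == 0 ? 0 : 1;
}
```

This should build with a plain nvcc invocation on recent toolkits; older toolkits may additionally need relocatable device code (-rdc=true) for grid-wide sync, so check the documentation for your version. The grid is sized from the occupancy query because a cooperative launch fails if the blocks cannot all be resident on the device at the same time.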
