Do __syncwarp() and __ballot_sync() protect global write scheduling?

I know that __syncthreads() is guaranteed to make sure that all global writes and atomic operations issued before the call are finished by the time the thread block is allowed to proceed. This will not protect against race conditions if other thread blocks are competing for the same memory, but if thread blocks are operating on exclusive regions of global memory then I’ve learned (and confirmed with much code and testing since) that __syncthreads() is something one can lean on.

Is the same true of __syncwarp() if warps are operating on their own exclusive sectors of global memory? If not, I can work around it and get back to __syncthreads(), but the code would be cleaner and probably a little faster if I didn’t have to.

Cheers!

See this section from the CUDA C++ Programming Guide:

Executing __syncwarp() guarantees memory ordering among threads participating in the barrier. Thus, threads within a warp that wish to communicate via memory can store to memory, execute __syncwarp(), and then safely read values stored by other threads in the warp.
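The store → __syncwarp() → load pattern described there can be sketched roughly as follows (the kernel name, buffer name, and indexing are illustrative, not from the guide; it assumes the whole warp reaches the barrier, i.e. the full mask):

```cuda
// Sketch: intra-warp communication through global memory, assuming each
// warp owns an exclusive, warp-sized slice of the buffer `data`.
__global__ void warpExchange(int *data)
{
    const int lane = threadIdx.x % warpSize;
    const int base = blockIdx.x * blockDim.x + threadIdx.x - lane; // warp's slice

    // Each lane stores into its own slot of the warp's slice.
    data[base + lane] = lane * lane;

    // Warp-wide barrier with memory ordering: stores above are visible
    // to all participating lanes before any of them proceeds.
    __syncwarp(); // default mask 0xffffffff = all 32 lanes participate

    // Now safe to read a value stored by another lane of the same warp.
    int neighbor = data[base + (lane + 1) % warpSize];
    data[base + lane] = neighbor;
}
```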

Then there is the warp vote functions section of the CUDA C++ Programming Guide, which states that for

int __all_sync(unsigned mask, int predicate);
int __any_sync(unsigned mask, int predicate);
unsigned __ballot_sync(unsigned mask, int predicate);
unsigned __activemask();

These intrinsics do not imply a memory barrier. They do not guarantee any memory ordering.
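So a vote converges the warp but does not publish data. Per the quoted passage, an explicit __syncwarp() is still needed before reading another lane's stores. A hedged sketch (names and indexing are illustrative):

```cuda
// Sketch: a ballot alone is NOT sufficient to order memory operations,
// assuming `buf` and `out` are warp-private, warp-sized slices.
__global__ void voteThenRead(int *buf, int *out)
{
    const int lane = threadIdx.x % warpSize;

    buf[lane] = lane + 1; // each lane writes its own slot

    // Converges the full warp and collects the predicate bits,
    // but guarantees NO memory ordering for the stores above.
    unsigned ready = __ballot_sync(0xffffffffu, buf[lane] > 0);

    __syncwarp(); // still required before cross-lane reads

    out[lane] = buf[(lane + 1) % warpSize] + (int)(ready != 0u);
}
```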


Hooray, thanks for the clear explanation (albeit a somewhat surprising one). I will need to refine my understanding if the warp vote functions (and, I have now checked, __shfl_sync()) do not imply a memory barrier. I had assumed that they all just had a __syncwarp() internally, but __syncwarp() is indeed a taller order, it seems!
