Hi all,
I am working with CUDA and trying to understand `__syncthreads()`, but I cannot see any difference whether I use __syncthreads or not.
Please give me an example using __syncthreads and explain how it works.
Thank you.
__syncthreads() will wait for all warps in a block to reach that point in your code. Say you are processing audio, each thread doing its own little detail, and have produced the raw data for one sample; you would then use __syncthreads() before the mixdown to the final reduced format (stereo, 7.1, ...) to ensure that everybody is on the same page. Without it, the different warps will run ahead of or behind each other, resulting in a noisy mix of partially unrelated data.
Please give me an example of it.
See the matrix multiplication example on page 25 of the Programming Guide 2.3.
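The idea in that example, roughly sketched below (my own simplified version with made-up names, assuming square n x n matrices where n is a multiple of the tile size, not the guide's exact code):

```cuda
#define TILE 16

// C = A * B using shared-memory tiles. Each block computes one
// TILE x TILE tile of C, looping over tiles of A and B.
__global__ void matMulTiled(const float *A, const float *B, float *C, int n)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < n / TILE; ++t) {
        // Each thread loads one element of the A tile and one of the B tile.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];

        // Barrier 1: no thread may read the tiles until every thread
        // in the block has finished writing its element.
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];

        // Barrier 2: no thread may overwrite the tiles on the next
        // iteration until every thread has finished reading them.
        __syncthreads();
    }
    C[row * n + col] = acc;
}
```

Each thread reads tile elements that were written by other threads in the block, so both barriers are needed: one between write and read, and one between read and the next iteration's write.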
More generally, __syncthreads() is a barrier primitive designed to protect you from read-after-write memory race conditions within a block.
The rules of use are pretty simple:
- Put a __syncthreads() after the write and before the read when there is a possibility of a thread reading a memory location that another thread has written to.
- __syncthreads() is only a barrier within a block, so it cannot protect you from read-after-write race conditions in global memory unless the only possible conflict is between threads in the same block. __syncthreads() is pretty much always used to protect shared memory reads after writes.
- Do not use a __syncthreads() call in a branch or a loop until you are sure every single thread will reach the same __syncthreads() call. This can sometimes require that you break your if-blocks into several pieces to put __syncthreads() calls at the top level, where all threads (including those which failed the if predicate) will execute them.
- When looking for read-after-write situations in loops, it helps to unroll the loop in your head when figuring out where to put __syncthreads() calls. For example, you often need an extra __syncthreads() call at the end of the loop if there are reads and writes from different threads to the same shared memory location in the loop.
- __syncthreads() does not mark a critical section, so don't use it like that.
- Do not put a __syncthreads() at the end of a kernel. There's no need for it.
- Many kernels do not need __syncthreads() at all because two different threads never access the same memory location.
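The first rule in a minimal, self-contained sketch (names are my own): write to shared memory, barrier, then read what another thread wrote.

```cuda
// Reverses one block-sized chunk of data. Assumes blockDim.x == 256.
__global__ void reverseBlock(float *data)
{
    __shared__ float s[256];
    int i = threadIdx.x;

    // Write: each thread stores its own element into shared memory.
    s[i] = data[blockIdx.x * blockDim.x + i];

    // Barrier between the write above and the read below.
    __syncthreads();

    // Read: each thread reads an element written by ANOTHER thread.
    // Without the barrier, s[blockDim.x - 1 - i] might not be written yet.
    data[blockIdx.x * blockDim.x + i] = s[blockDim.x - 1 - i];
}
```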
2.1. Threads within a warp are executed together and are implicitly synced.
Or is that a slightly too far-reaching statement? (I have arranged for a few instructions between writes and reads to shared memory within a warp, to avoid getting overly concerned about this.)
All threads of a block (say, 512 threads) load data from memory.
Now thread 0 adds its data to the data fetched by the 511th thread...
Thread 1 adds its data with the 510th thread's, and so on until the 256th thread...
The 257th to 511th threads just fetch data and do not participate in the addition...
Now, if you don't add a __syncthreads() between the load and the calculation, you will get bad results...
Prototype this and see it for yourself.
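A sketch of that prototype (my own naming; threads 0..255 pair with their mirror threads 511..256):

```cuda
// 512 threads each load one element into shared memory; the lower half
// of the block then adds its mirror thread's element. The barrier is
// what makes the cross-thread read safe.
__global__ void pairwiseAdd(const float *in, float *out)
{
    __shared__ float s[512];
    int i = threadIdx.x;

    s[i] = in[blockIdx.x * 512 + i];

    // Comment this out and thread 0 may read s[511] before the warp
    // containing thread 511 has written it -- bad results.
    __syncthreads();

    if (i < 256)
        out[blockIdx.x * 256 + i] = s[i] + s[511 - i];
}
```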
Oh right, forgot about that… That’s true, though I’ve never relied on this behavior myself. Read-after-write at the warp level is supposed to be safe automatically. (Although I’m curious how that is accomplished when the scheduling unit in hardware seems to be the half-warp. There are some subtleties there I don’t quite understand.)