__syncthreads and __threadfence together in a loop

kamsandh · October 13, 2010, 9:27pm

I am confused about using the CUDA functions __threadfence() and __syncthreads() together.
The documentation says that __threadfence() guarantees that all global and shared memory accesses made by the calling thread before the call are visible to all threads in the system (I use global memory only).
And __syncthreads() waits until all threads in the block have reached a particular point.

So my question is: after updating a variable, if I call __threadfence() followed by __syncthreads(), all threads in the same block should see the correct values of the variables updated by all other threads in the block, shouldn't they? What is the difference if I use __syncthreads() alone? And what is the difference if I change the order of these functions, i.e. change
__threadfence(); __syncthreads();
to
__syncthreads(); __threadfence();

Unfortunately, I tried all of these cases in my code, and still not all threads get the updated values from the other threads in the same block. In principle, I thought the first order should work fine.
I am running a loop within a single block. If I put the loop in host code, it works fine, but if the loop is inside the kernel, it simply produces incorrect results. I understand the problems with inter-block communication, but this loop doesn't communicate with other blocks.

Please help me .

Sarnath · October 14, 2010, 7:38am

After __syncthreads(), you are sure that all threads have completed the instructions before it, so you know every thread has executed __threadfence().
If you swap the order and call __syncthreads() first, it makes no sense: a thread can pass the barrier and read before another thread has executed its __threadfence().

The compiler also optimizes based on __syncthreads() usage.
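A small sketch of the order described above, with hypothetical kernel and helper names (publish_then_sum, run_publish_then_sum): every thread writes its value, calls __threadfence(), then __syncthreads(), and only after the barrier does thread 0 read the other threads' values. Within a single block the __syncthreads() alone would already be enough, since the CUDA Programming Guide states it makes the block's prior global and shared memory accesses visible to all threads in the block; the fence only adds something when threads in other blocks read the data.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel showing the __threadfence(); __syncthreads(); order.
// Each thread publishes one value to global memory; thread 0 then sums them.
__global__ void publish_then_sum(int* data, int* out, int n)
{
    int i = threadIdx.x;
    if (i < n) data[i] = i;   // each thread writes its own slot

    __threadfence();   // this thread's earlier writes are made visible device-wide
    __syncthreads();   // every thread in the block has now passed its fence

    if (i == 0) {
        int s = 0;
        for (int k = 0; k < n; ++k) s += data[k];   // reads other threads' writes
        *out = s;
    }
}

// Hypothetical host helper: returns the sum computed on the device, or -1
// if the result could not be copied back. Requires n <= 1024.
int run_publish_then_sum(int n)
{
    int *dData = nullptr, *dOut = nullptr;
    int hOut = -1;
    cudaMalloc(&dData, n * sizeof(int));
    cudaMalloc(&dOut, sizeof(int));
    publish_then_sum<<<1, n>>>(dData, dOut, n);   // one block only
    if (cudaMemcpy(&hOut, dOut, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        hOut = -1;
    cudaFree(dData);
    cudaFree(dOut);
    return hOut;
}
```

For example, run_publish_then_sum(128) should return 0 + 1 + ... + 127 = 8128.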

kamsandh · October 15, 2010, 1:18pm

Thank you. But what do you mean by "The compiler also optimizes based on __syncthreads() usage"?
