Call to __syncthreads() not needed?

idrank · March 10, 2015, 1:29pm

Hi,
I’m running a simple kernel to reverse arrays using shared memory, a la "Supercomputing for the Masses: Part 3". Below is my kernel:

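#include <cstdio>

__global__ void reveseArrayUsingSharedMem(float *d_in, float *d_out)
{
    // Dynamically sized shared array; its size is set at kernel launch.
    extern __shared__ float s[];
    int tid = threadIdx.x + blockDim.x * blockIdx.x;
    if (threadIdx.x % 10 == 0)
        printf("I am threadId %d in blockId %d\n", threadIdx.x, blockIdx.x);

    // Each thread stages its own element in shared memory.
    s[threadIdx.x] = d_in[tid];

    //__syncthreads();

    // Mirror the global index, then write back the element this thread staged.
    int rtid = blockDim.x * gridDim.x - blockDim.x * blockIdx.x - threadIdx.x - 1;
    d_out[rtid] = s[threadIdx.x];
}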

I’ve run this with over 5 million elements (threads = 512, blocks = 10240) and I always get the correct answer whether or not I include the blocking call to __syncthreads(). How come? Shouldn’t I see failures if the writes to shared memory are not synchronized? I’ve even added the above print statement for certain threads to ‘slow down’ their writes to shared memory, but I don’t see any difference.

Thanks

allanmac · March 10, 2015, 3:59pm

Your kernel requires neither thread coordination nor resource sharing so there is no need for explicit synchronization.
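To illustrate the point: in the kernel above, every thread reads back only s[threadIdx.x], the same element it wrote itself, so no thread ever depends on another thread's write. For contrast, here is a minimal sketch of a variant where the barrier would matter, because each thread reads the mirrored element written by a different thread in its block. The names reverseWithinBlock and reverseMatchesReference are hypothetical and not from the original post; this is only one way such a kernel might be written.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical variant: each thread writes s[threadIdx.x] but reads
// s[blockDim.x - 1 - threadIdx.x], which another thread wrote.
__global__ void reverseWithinBlock(const float *d_in, float *d_out)
{
    extern __shared__ float s[];
    int tid = threadIdx.x + blockDim.x * blockIdx.x;

    s[threadIdx.x] = d_in[tid];

    // Needed here: all writes to s[] must finish before any cross-thread read.
    __syncthreads();

    // Blocks are placed in reverse order; the mirrored shared-memory read
    // reverses the elements inside each block.
    int out = blockDim.x * (gridDim.x - 1 - blockIdx.x) + threadIdx.x;
    d_out[out] = s[blockDim.x - 1 - threadIdx.x];
}

// Hypothetical host helper: runs the kernel and compares against a CPU reversal.
bool reverseMatchesReference(int blocks, int threads)
{
    int n = blocks * threads;
    std::size_t bytes = n * sizeof(float);
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    // Third launch parameter sizes the dynamic shared array: one float per thread.
    reverseWithinBlock<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h_out[i] != h_in[n - 1 - i]) return false;
    return true;
}
```

Calling reverseMatchesReference(4, 256) from a host main would exercise the kernel on 1024 elements. Removing the __syncthreads() call in this variant would introduce a data race, so results could be wrong, although a race is not guaranteed to show up on every run.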

idrank · March 10, 2015, 5:22pm

Well, that makes a lot of sense :) Thanks for having a look.

Cheers
