Warps and Shared Memory

fotone · August 15, 2011, 11:01pm

Dear CUDA experts,

Sorry for the stupid question, but I can't understand where I'm going wrong.

I have code in which 64 threads each compute a value and store it in shared memory. In the last of these threads I would like to re-read the values in shared memory in order to compare the results.

Something like this:

__shared__ float results[64];

....

if (threadIdx.x > 31 && threadIdx.x < 96) {
    nthread = threadIdx.x - 32;
    .....(calculations)......
    results[nthread] = xx;
    __syncthreads();
    // <<<<--------- BREAK 1
    if (threadIdx.x == 95) {
        for (int i = 0; i < 64; i++) {
            yy = results[i];
            // <<<<--------- BREAK 2
        }
    }
}

.....

If I "extract" the contents of results into a vector in global memory at the points marked BREAK 1 and BREAK 2, I get different values: in particular, the first 32 are wrong while the last 32 are right.

Do you have any idea?

thanks a lot,

g.

LSChien · August 16, 2011, 4:19am

Don't call __syncthreads() inside an if-then-else unless the condition evaluates identically across the entire thread block; otherwise the behavior is undefined.

Try the following code:

__shared__ float results[64];

....

if (threadIdx.x > 31 && threadIdx.x < 96) {
    nthread = threadIdx.x - 32;
    .....(calculations)......
    results[nthread] = xx;
}

__syncthreads();

if (threadIdx.x == 95) {
    for (int i = 0; i < 64; i++) {
        yy = results[i];
        // <<<<--------- BREAK 2
    }
}
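A self-contained sketch of the same fix, assuming one block of 96 threads as in the original post. The elided "(calculations)" are replaced by a hypothetical compute_value() that just returns 10 * nthread, and instead of inspecting yy at BREAK 2, thread 95 copies the shared array into a global buffer so the host can check it. The names fixedKernel, compute_value and run_fixed_kernel are illustrative, not from the thread.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the elided "(calculations)".
__device__ float compute_value(int nthread)
{
    return 10.0f * nthread;
}

// Assumed launch configuration: <<<1, 96>>>.
__global__ void fixedKernel(float *out)
{
    __shared__ float results[64];

    // Threads 32..95 (warps 1 and 2) compute and write shared memory.
    if (threadIdx.x > 31 && threadIdx.x < 96) {
        int nthread = threadIdx.x - 32;
        results[nthread] = compute_value(nthread);
    }

    // Barrier outside the branch: every thread of the block reaches it,
    // so all 64 writes are visible to every thread after this point.
    __syncthreads();

    // Only the last worker thread re-reads the whole array and copies it
    // to global memory so the host can inspect it.
    if (threadIdx.x == 95) {
        for (int i = 0; i < 64; i++) {
            out[i] = results[i];
        }
    }
}

// Host helper: launches the kernel and copies the 64 values back.
// Returns false if any CUDA call reports an error.
bool run_fixed_kernel(float host_out[64])
{
    float *d_out = nullptr;
    if (cudaMalloc(&d_out, 64 * sizeof(float)) != cudaSuccess) return false;

    fixedKernel<<<1, 96>>>(d_out);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // cudaMemcpy is blocking, so it also waits for the kernel to finish.
        err = cudaMemcpy(host_out, d_out, 64 * sizeof(float),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

With the barrier outside the branch, calling run_fixed_kernel(h) from host code should leave h[i] == 10 * i for all 64 entries, including the first 32 that came out wrong in the original version.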

RabidCicada · August 16, 2011, 2:02pm

Yeah. Technically your program shouldn't finish, because __syncthreads() waits until every thread in the block has reached it. Clearly not all of your threads are going to hit that __syncthreads(), which should lead to an "infinite wait".
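The CUDA documentation does allow __syncthreads() inside conditional code, but only when the condition evaluates identically for every thread of the block. A minimal sketch of that legal case, assuming a launch of 2 blocks of 64 threads: the branch depends only on blockIdx.x, so each block takes it as a whole. The names uniformBranchKernel and run_uniform_kernel are hypothetical.

```cuda
#include <cuda_runtime.h>

// Assumed launch configuration: <<<2, 64>>>.
__global__ void uniformBranchKernel(int *out)
{
    __shared__ int buf[64];

    // blockIdx.x is the same for every thread in a block, so the whole
    // block either enters this branch or skips it.
    if (blockIdx.x % 2 == 0) {
        buf[threadIdx.x] = threadIdx.x;
        __syncthreads();  // legal: all threads of this block reach it
        if (threadIdx.x == 0) {
            int sum = 0;
            for (int i = 0; i < 64; i++) sum += buf[i];
            out[blockIdx.x] = sum;
        }
    } else if (threadIdx.x == 0) {
        out[blockIdx.x] = -1;  // odd blocks skip the shared-memory step
    }
}

// Host helper: runs the kernel and copies one result per block back.
bool run_uniform_kernel(int host_out[2])
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, 2 * sizeof(int)) != cudaSuccess) return false;

    uniformBranchKernel<<<2, 64>>>(d_out);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaMemcpy(host_out, d_out, 2 * sizeof(int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

Block 0 should report 2016 (the sum 0 + 1 + ... + 63) and block 1 should report -1. In the original post, by contrast, the condition threadIdx.x > 31 && threadIdx.x < 96 differs between threads of the same block, which is exactly the case where the barrier's behavior is undefined.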