Simple kernel producing wrong results:

sergeyn · May 3, 2014, 3:08pm

I might be missing something obvious, but I get different results when I run this kernel over and over again:

__shared__ double smem[1];
double val = 0.0;
for (int i = 0; i < 64; i += 1)
{
  if (threadIdx.x == 0)
    smem[threadIdx.x] = i;

  __syncthreads();

  val += smem[0];
}
ptr[blockIdx.x*64 + threadIdx.x] = val;

I run this with a thread block size of 64 and a grid size of 128. What am I doing wrong here?

Thanks.

Edit:
The results differ from the previous run in random thread blocks, but always in the 2nd warp (i.e. the last 32 values of a block differ). There seems to be only one block failing per run.

Robert_Crovella · May 3, 2014, 3:58pm

Add a __syncthreads() after the val += smem[0]; line.

Warp 0 can race ahead of the other warps. Take this scenario:

All warps sync at the __syncthreads(). Then warp 0 proceeds. It updates its local val variable. It then continues with the next iteration of the for loop and updates smem[0] to the next value of i. Then it waits at the barrier.

After that, warp 1 picks up and continues executing. But it now updates its val with the smem[0] value that warp 0 has already overwritten.

The result of this behavior is that warp 0 will always produce the correct result (2016, the sum of 0 through 63), but higher warps may return 2016 or some higher number.
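
As a rough sketch of the fix described above, the kernel below adds the second barrier at the end of each loop iteration, and a small host helper checks that every output element equals 2016. The kernel name accumulate_fixed, the helper count_mismatches, and the host-side setup are assumptions made for illustration; only the loop body and the 64-thread, 128-block launch come from the question.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel name; the body is the snippet from the question
// plus the extra barrier suggested in the reply.
__global__ void accumulate_fixed(double *ptr)
{
    __shared__ double smem[1];
    double val = 0.0;
    for (int i = 0; i < 64; i += 1)
    {
        if (threadIdx.x == 0)
            smem[0] = i;

        __syncthreads();   // the write by thread 0 is now visible to every warp

        val += smem[0];

        __syncthreads();   // no warp starts the next iteration (and overwrites
                           // smem[0]) until all warps have read the current value
    }
    ptr[blockIdx.x * 64 + threadIdx.x] = val;
}

// Hypothetical host helper: launches 128 blocks of 64 threads and returns
// the number of elements that differ from 2016, or -1 on a CUDA error.
int count_mismatches()
{
    const int blocks = 128, threads = 64, n = blocks * threads;
    double *d_ptr = nullptr;
    if (cudaMalloc((void **)&d_ptr, n * sizeof(double)) != cudaSuccess)
        return -1;

    accumulate_fixed<<<blocks, threads>>>(d_ptr);

    std::vector<double> host(n);
    // cudaMemcpy waits for the kernel to finish and reports any launch error.
    cudaError_t err = cudaMemcpy(host.data(), d_ptr, n * sizeof(double),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_ptr);
    if (err != cudaSuccess)
        return -1;

    int bad = 0;
    for (double v : host)
        if (v != 2016.0)   // 0 + 1 + ... + 63 = 63 * 64 / 2 = 2016
            ++bad;
    return bad;
}
```

Calling count_mismatches() from main and printing the result is one way to repeat the experiment over many runs. With both barriers in place, each read of smem[0] is separated from the next write, so the count should stay at 0.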

sergeyn · May 3, 2014, 4:46pm

Of course! I knew it was something as embarrassing as this.

Thanks!
