__syncthreads example from the "CUDA by Example" book dot.cu code strange behaviour, not as

Hi All,

I am just reading the book “CUDA by Example”, and I have found something I cannot understand:

In chapter 5.3.1, “Dot product optimized (Incorrectly)”, the book explains part of the kernel it uses:

int i = blockDim.x/2;
while (i != 0) {
    if (cacheIndex < i)
        cache[cacheIndex] += cache[cacheIndex + i];
    __syncthreads();
    i /= 2;
}

The book emphasizes that it is not a good idea to put the __syncthreads() call inside the if block, because this will hang the GPU and we would have to kill the running program (CUDA by Example).

So this is the version of the code that would hang the program, according to the book:

int i = blockDim.x/2;
while (i != 0) {
    if (cacheIndex < i) {
        cache[cacheIndex] += cache[cacheIndex + i];
        __syncthreads();
    }
    i /= 2;
}

After this, the book explains clearly why the second version of the code is not good. But surprisingly, if I compile and run the second version, it just gives me the correct result, and the GPU does not hang…

Is this book already outdated? Is there some new feature in the CUDA architecture that automatically avoids the above thread synchronization error?

I also attach the whole code…

Can someone give me an idea why this is? I like this book because it clearly explains CUDA, but now I have lost my confidence in it :(

(I have a GeForce GTX 560 Ti, Win7 64-bit, Visual Studio 2008 Express Edition)
dot.cu (3.63 KB)

I believe the book over-simplifies matters. In this specific example, threads which do not reach the __syncthreads() anymore finish straight away. AFAIK, behavior of __syncthreads() is undefined if some threads of the block have finished already, so the example is not guaranteed to lock up (though it is not guaranteed to give good results either). If I remember correctly, compute capability 1.x and compute capability 2.x devices also act differently on this.

In short, in this case there are no guarantees at all. Neither that the code works, nor that it doesn’t.
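For reference, here is a minimal sketch of the safe pattern, with the barrier kept outside the conditional so that every thread of the block reaches it on every loop iteration. This is not the attached dot.cu: the sizes N, threadsPerBlock and blocksPerGrid, the input values (all 1.0 and 2.0) and the host-side check are my own assumptions, chosen so that the expected sum is exactly representable in float.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Assumed sizes for illustration only.
const int N = 33 * 1024;
const int threadsPerBlock = 256;  // must be a power of two for this reduction
const int blocksPerGrid = 32;

__global__ void dot(const float *a, const float *b, float *partial) {
    __shared__ float cache[threadsPerBlock];
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    int cacheIndex = threadIdx.x;

    // Grid-stride loop: each thread accumulates its own partial sum.
    float temp = 0.0f;
    while (tid < N) {
        temp += a[tid] * b[tid];
        tid += blockDim.x * gridDim.x;
    }
    cache[cacheIndex] = temp;
    __syncthreads();  // all partial sums are written before the reduction starts

    // Tree reduction in shared memory.
    for (int i = blockDim.x / 2; i != 0; i /= 2) {
        if (cacheIndex < i)
            cache[cacheIndex] += cache[cacheIndex + i];
        // Outside the if: every thread of the block executes this barrier
        // on every iteration, including threads that did no addition.
        __syncthreads();
    }
    if (cacheIndex == 0)
        partial[blockIdx.x] = cache[0];
}

int main() {
    std::vector<float> a(N, 1.0f), b(N, 2.0f), partial(blocksPerGrid);
    float *dev_a, *dev_b, *dev_partial;
    cudaMalloc(&dev_a, N * sizeof(float));
    cudaMalloc(&dev_b, N * sizeof(float));
    cudaMalloc(&dev_partial, blocksPerGrid * sizeof(float));
    cudaMemcpy(dev_a, a.data(), N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b.data(), N * sizeof(float), cudaMemcpyHostToDevice);

    dot<<<blocksPerGrid, threadsPerBlock>>>(dev_a, dev_b, dev_partial);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(partial.data(), dev_partial,
                         blocksPerGrid * sizeof(float), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Finish the sum of the per-block results on the host.
    float sum = 0.0f;
    for (int i = 0; i < blocksPerGrid; ++i)
        sum += partial[i];
    float expected = 2.0f * N;  // every product is 1.0 * 2.0
    printf("GPU: %.1f, expected: %.1f\n", sum, expected);

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_partial);
    return sum == expected ? 0 : 1;
}
```

Moving the __syncthreads() call inside the if in this sketch gives the divergent-barrier version from the question. It may appear to work on a given GPU, but as said above, nothing guarantees that.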

Thanks for clarifying this! So the point is that there MAY be a lock-up when using __syncthreads() without care and thought. (But the authors should have put a more reproducible example in this not-too-cheap book.)

Thanks again, bye!