Hi All,
I have just been reading the book “CUDA by Example”, and I found something I cannot understand:
In chapter 5.3.1, “Dot Product Optimized (Incorrectly)”, the book explains part of the kernel it uses:
int i = blockDim.x / 2;
while (i != 0) {
    if (cacheIndex < i)
        cache[cacheIndex] += cache[cacheIndex + i];
    __syncthreads();
    i /= 2;
}
The book emphasizes that it is not a good idea to put the
__syncthreads();
inside the if block, because this will hang the GPU and we would have to kill the running program.
So this is the code version that, according to the book, would hang the program:
int i = blockDim.x / 2;
while (i != 0) {
    if (cacheIndex < i) {
        cache[cacheIndex] += cache[cacheIndex + i];
        __syncthreads();
    }
    i /= 2;
}
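For context, here is a minimal self-contained sketch of how the full reduction kernel might look with the barrier placed correctly (outside the branch). This is my own reconstruction in the spirit of the book's dot-product kernel, not the attached dot.cu; the names dot_kernel, N, and THREADS_PER_BLOCK are my assumptions:

```cuda
#define N 1024
#define THREADS_PER_BLOCK 256

// Assumed sketch: per-block partial dot product with a tree reduction.
__global__ void dot_kernel(const float *a, const float *b, float *partial) {
    __shared__ float cache[THREADS_PER_BLOCK];
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    int cacheIndex = threadIdx.x;

    // Each thread accumulates its grid-strided products.
    float temp = 0.0f;
    while (tid < N) {
        temp += a[tid] * b[tid];
        tid += blockDim.x * gridDim.x;
    }
    cache[cacheIndex] = temp;
    __syncthreads();  // all writes to cache must complete before reducing

    // Tree reduction: halve the number of active threads each pass.
    int i = blockDim.x / 2;
    while (i != 0) {
        if (cacheIndex < i)
            cache[cacheIndex] += cache[cacheIndex + i];
        __syncthreads();  // outside the if: EVERY thread reaches the barrier
        i /= 2;
    }

    // Thread 0 writes this block's partial sum.
    if (cacheIndex == 0)
        partial[blockIdx.x] = cache[0];
}
```

The point of the placement is that __syncthreads() must be executed by all threads of the block; in the "incorrect" version, threads with cacheIndex >= i skip the barrier, which is what the book says should deadlock the block.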
After this, the book clearly explains why the second code version is not good. But, surprisingly, when I use the second version, compile it, and run it, it just gives me the correct result,
and the GPU does not hang…
Is this book already outdated? Is there some new feature in the CUDA architecture that automatically avoids the thread-synchronization error above?
I also attach the whole code…
Can someone give me an idea why this is? I like this book because it explains CUDA clearly, but now I have lost my confidence in it :(
(I have a GeForce GTX 560 Ti, Win7 64-bit, Visual Studio 2008 Express Edition)
dot.cu (3.63 KB)