Hello,

I am computing a dot product, similar to the example in the NVIDIA projects.

```
// Tree-like reduction
if (thx < i) {
    for (int stride = i / 2; stride > 0; stride >>= 1) {
        __syncthreads();
        //shared_h[thx] += shared_h[stride + thx];
    }
}
```

In my version the vector length need not be a power of two, so I added the condition thx < i, since the tree-like reduction requires a length that is a power of two.

The problem is that the code hangs when the number of threads exceeds 16.

Why is that?

In the Programming Guide it says that

I am not really sure what that means.
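
My guess is that it has to do with `__syncthreads()` sitting inside a branch that only some threads of the block take. For comparison, here is a minimal sketch of a variant I could try instead. It is not my actual code: the names `dot_kernel`, `dot_on_device` and `BLOCK` are made up for illustration, and it assumes a single block whose size is a power of two. Threads beyond the end of the data contribute zero instead of being excluded by `thx < i`, so every thread reaches every `__syncthreads()`.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define BLOCK 256  // hypothetical block size; assumed to be a power of two

// Hypothetical kernel: one block of BLOCK threads, n may be any positive length.
__global__ void dot_kernel(const float *a, const float *b, float *out, int n)
{
    __shared__ float shared_h[BLOCK];
    int thx = threadIdx.x;

    // Each thread accumulates a strided partial sum. Threads past the end
    // of the data keep 0, which pads the length up to BLOCK.
    float partial = 0.0f;
    for (int k = thx; k < n; k += BLOCK)
        partial += a[k] * b[k];
    shared_h[thx] = partial;
    __syncthreads();

    // Tree-like reduction. The loop bounds are identical for all threads,
    // so every thread reaches __syncthreads(); only the add is guarded.
    for (int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        if (thx < stride)
            shared_h[thx] += shared_h[thx + stride];
        __syncthreads();
    }

    if (thx == 0)
        *out = shared_h[0];
}

// Hypothetical host wrapper; CUDA error checking is omitted for brevity.
float dot_on_device(const float *a, const float *b, int n)
{
    assert(n > 0);
    float *d_a, *d_b, *d_out;
    float result = 0.0f;
    size_t bytes = n * sizeof(float);

    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    dot_kernel<<<1, BLOCK>>>(d_a, d_b, d_out, n);

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return result;
}
```

Calling `dot_on_device(a, b, 5)` with two five-element host arrays would exercise the case where the length is not a power of two. Is this the right way to do it, or is the barrier placement unrelated to the hang?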

Thanks in advance.

Cem