I am computing a dot product, similar to the scalarProd example from the NVIDIA SDK projects.
// Tree-like reduction
if (thx < i) {
    for (int stride = i / 2; stride > 0; stride >>= 1) {
        __syncthreads();
        //shared_h[thx] += shared_h[stride + thx];
    }
}
In my version the vector length need not be a power of two, so I added the condition thx < i, because the tree-like reduction itself requires a power-of-two length.
The problem is that the code hangs when the number of threads exceeds 16.
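__syncthreads() MUST be called by all threads in the entire block. You have the __syncthreads() inside an if, so the threads that fail the condition never reach it and the others wait forever. You can fix it by moving the if (thx < i) inside the for loop, so that it guards only the addition and every thread still reaches the __syncthreads().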
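To make the fix concrete, here is a minimal sketch of a single-block dot product in which the barrier sits outside any thread-dependent condition. It is not the SDK's scalarProd code. The names dotKernel, dotOnGpu and BLOCK are made up for this illustration, and the sketch assumes the block size is a power of two while the vector length n can be anything positive.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define BLOCK 256  // hypothetical block size; assumed to be a power of two

// One block computes the whole dot product of a and b (length n, any n > 0).
__global__ void dotKernel(const float *a, const float *b, float *out, int n)
{
    __shared__ float shared_h[BLOCK];
    int thx = threadIdx.x;

    // Each thread sums a strided slice; threads with no elements contribute 0,
    // so the shared array always holds a power-of-two number of valid entries.
    float sum = 0.0f;
    for (int k = thx; k < n; k += BLOCK)
        sum += a[k] * b[k];
    shared_h[thx] = sum;

    // Tree-like reduction: the barrier is reached by every thread,
    // only the addition is guarded by the condition.
    for (int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        __syncthreads();
        if (thx < stride)
            shared_h[thx] += shared_h[thx + stride];
    }

    // Thread 0 wrote shared_h[0] itself in the last step, so it can read it
    // back without another barrier. Only one thread writes the result.
    if (thx == 0)
        *out = shared_h[0];
}

// Host helper (hypothetical): copies the inputs over, launches one block,
// and returns the result. Error checking is omitted to keep the sketch short.
float dotOnGpu(const float *hA, const float *hB, int n)
{
    assert(n > 0);
    size_t bytes = (size_t)n * sizeof(float);
    float *dA, *dB, *dOut;
    float result = 0.0f;

    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dotKernel<<<1, BLOCK>>>(dA, dB, dOut, n);

    // This blocking copy waits for the kernel to finish before reading.
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return result;
}
```

Padding with zeros through the strided load is one way to handle a length that is not a power of two. It keeps the reduction loop itself free of any length check, so the only condition left in the loop is thx < stride.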
In the scalar product example, the entire block calculates only a single result. It’s bad practice to have multiple threads writing to the same memory location, so the if (thx == 0) is there to make sure that only one thread performs the memory write.
It is true that the order of warp execution is undefined, so an if (thx == 0) could in principle have race condition issues. In the scalarProd example, however, there is a __syncthreads() call to make sure all threads have caught up before accumResult is updated. Since thread 0 writes out the value of accumResult[0], and thread 0 itself updated accumResult[0] a few lines of code earlier, there cannot be a race condition on that access.
In any of the examples that use if (thx == 0), you should see __syncthreads() used in the appropriate locations to prevent race conditions.
I am writing a QR decomposition for small matrix sizes. I read from global memory, do the processing, and then try to write the result back to the same global memory, since QR factorization is an iterative process.
I have checked my calculation for a single iteration step and I get weird errors. But when I write to a different global memory location, there seems to be no calculation error. I have checked all the intermediate results and they are correct.
Is there some amount of time I need to wait before I can write back to the same memory?