I am writing a kernel in which, at one point, I need to use a subset of the threads in my block to process elements from one shared memory array and write the results into another shared memory array. Then I use a single thread to sum all the elements in that second array. This is repeated for a few iterations.
Hence, I need to call __syncthreads() after the conditional is done, so that thread 0 (tid == 0) can read the elements written by the other threads without any conflicts. My code looks like this…
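For reference, a minimal sketch of the pattern described above might look like the following. It is not the actual kernel: the names (`partialSumKernel`, `runPartialSumDemo`, `src`, `dst`), the sizes, the iteration count, and the arithmetic are all placeholders assumed for illustration. The second barrier at the end of each iteration is also an assumption, added so that the next iteration cannot overwrite `dst` while thread 0 is still reading it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 64  // assumed threads per block
#define ACTIVE     32  // assumed number of threads that do the processing
#define ITERS       4  // assumed number of iterations

// Hypothetical kernel: a subset of threads writes to dst, thread 0 sums dst.
__global__ void partialSumKernel(const float *input, float *result)
{
    __shared__ float src[BLOCK_SIZE];
    __shared__ float dst[ACTIVE];

    const int tid = threadIdx.x;
    src[tid] = input[tid];
    __syncthreads();  // src is fully populated before anyone reads it

    float total = 0.0f;  // only meaningful in thread 0
    for (int it = 0; it < ITERS; ++it) {
        if (tid < ACTIVE) {
            dst[tid] = src[tid] * (it + 1);  // placeholder "processing"
        }
        // Barrier sits outside the conditional, so every thread reaches it.
        __syncthreads();

        if (tid == 0) {
            for (int i = 0; i < ACTIVE; ++i) {
                total += dst[i];
            }
        }
        // Keep the next iteration from overwriting dst while thread 0 reads it.
        __syncthreads();
    }

    if (tid == 0) {
        *result = total;
    }
}

// Hypothetical host helper: runs one block and returns the sum, or -1 on error.
float runPartialSumDemo()
{
    float hostInput[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; ++i) {
        hostInput[i] = 1.0f;
    }

    float *devInput = NULL;
    float *devResult = NULL;
    if (cudaMalloc((void **)&devInput, sizeof(hostInput)) != cudaSuccess) {
        return -1.0f;
    }
    if (cudaMalloc((void **)&devResult, sizeof(float)) != cudaSuccess) {
        cudaFree(devInput);
        return -1.0f;
    }

    cudaMemcpy(devInput, hostInput, sizeof(hostInput), cudaMemcpyHostToDevice);
    partialSumKernel<<<1, BLOCK_SIZE>>>(devInput, devResult);

    float hostResult = -1.0f;
    if (cudaGetLastError() == cudaSuccess &&
        cudaDeviceSynchronize() == cudaSuccess) {
        cudaMemcpy(&hostResult, devResult, sizeof(float), cudaMemcpyDeviceToHost);
    }

    cudaFree(devInput);
    cudaFree(devResult);
    return hostResult;
}
```

With every input set to 1.0f, each iteration of this sketch adds 32 * (it + 1) to the total, so `runPartialSumDemo()` should return 320 when the kernel runs correctly.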
But for some reason, when I run the code in emulation mode, the program aborts with the message “Incorrect use of _syncthreads();”. There are no conditionals enclosing the given code, so all the threads should definitely reach the __syncthreads() call. As I understand it, the only restriction on __syncthreads() is that it must not be used inside a conditional, at a point that some threads in the block might not reach. So I cannot understand why this error occurs.
Running the code without -deviceemu gives no problems and the code runs fine. I've also checked the output, and it is correct and as expected. But I wanted to clear this up, to make sure I am not overlooking something that will bite me later on.
Appreciate any help…
P.S. I simplified the code to make it easier to look at. I can post the actual code if required.