I am trying to build an index structure in kernel code:
atomicCAS((int*)&index[val], -1, atomicAdd((unsigned int*)&index_pos, 1));
index is declared as a dynamic shared memory array and initialized to -1; index_pos is declared as volatile.
The intent is the following: for each entry, only the first thread in the block to reach it should initialize index[val] and increment index_pos. However, I have noticed that index_pos is incremented multiple times by conflicting threads.
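For reference, here is a minimal sketch that should reproduce the setup. Only the atomicCAS line is taken from my code. The kernel name build_index, the single-block launch, the slot count, the way val is derived from threadIdx.x, and index_pos living in shared memory are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical reproducer: everything except the atomicCAS line is assumed.
__global__ void build_index(int *pos_out, int num_slots)
{
    extern __shared__ int index[];        // dynamic shared memory array
    __shared__ volatile int index_pos;    // assumed to be a shared counter

    // Initialize every slot to -1 and reset the counter.
    for (int i = threadIdx.x; i < num_slots; i += blockDim.x)
        index[i] = -1;
    if (threadIdx.x == 0)
        index_pos = 0;
    __syncthreads();

    // Assumed mapping: several threads share a slot, so they conflict.
    int val = threadIdx.x % num_slots;

    // The line from the question, unchanged.
    atomicCAS((int*)&index[val], -1, atomicAdd((unsigned int*)&index_pos, 1));
    __syncthreads();

    // Report the final counter value to the host.
    if (threadIdx.x == 0)
        *pos_out = index_pos;
}

int main()
{
    const int num_slots = 8;     // assumed number of distinct entries
    const int threads   = 128;   // assumed block size
    int *d_pos = nullptr;
    int h_pos  = 0;

    cudaMalloc(&d_pos, sizeof(int));
    build_index<<<1, threads, num_slots * sizeof(int)>>>(d_pos, num_slots);
    cudaMemcpy(&h_pos, d_pos, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_pos);

    printf("distinct slots = %d, index_pos = %d\n", num_slots, h_pos);
    return 0;
}
```

If index_pos were incremented only once per entry, the two printed numbers would match. A larger index_pos corresponds to the extra increments I am seeing.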
Why is this happening?