Atomic index creation

I am trying to build an index structure in kernel code:

atomicCAS((int*)&index[val], -1, atomicAdd((unsigned int*)&index_pos, 1));

index is declared as a dynamic shared memory array and initialized with -1; index_pos is declared as volatile.
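For reference, here is a minimal sketch of the setup described above. Only index, index_pos and val come from my actual code; the kernel name build_index, the parameters vals, n_slots and out_pos, the host code, and the assumption that index_pos lives in shared memory are all hypothetical and only there to make the example self-contained.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: names other than index, index_pos and val are illustrative.
__global__ void build_index(const int* vals, int n_slots, int* out_pos)
{
    extern __shared__ int index[];        // dynamic shared memory array
    __shared__ volatile int index_pos;    // counter, assumed here to be in shared memory

    // Initialize every entry to -1 and the counter to 0.
    for (int i = threadIdx.x; i < n_slots; i += blockDim.x)
        index[i] = -1;
    if (threadIdx.x == 0)
        index_pos = 0;
    __syncthreads();

    int val = vals[threadIdx.x];

    // The line in question.
    atomicCAS((int*)&index[val], -1, atomicAdd((unsigned int*)&index_pos, 1));
    __syncthreads();

    // Report the final counter value to the host.
    if (threadIdx.x == 0)
        *out_pos = index_pos;
}

int main()
{
    const int n_threads = 32, n_slots = 4;
    int h_vals[n_threads];
    for (int i = 0; i < n_threads; ++i)
        h_vals[i] = 0;                    // every thread targets index[0]

    int *d_vals, *d_pos;
    cudaMalloc(&d_vals, sizeof(h_vals));
    cudaMalloc(&d_pos, sizeof(int));
    cudaMemcpy(d_vals, h_vals, sizeof(h_vals), cudaMemcpyHostToDevice);

    // Third launch parameter sizes the dynamic shared memory array.
    build_index<<<1, n_threads, n_slots * sizeof(int)>>>(d_vals, n_slots, d_pos);

    int h_pos = 0;
    cudaMemcpy(&h_pos, d_pos, sizeof(int), cudaMemcpyDeviceToHost);
    printf("index_pos after kernel: %d\n", h_pos);

    cudaFree(d_vals);
    cudaFree(d_pos);
    return 0;
}
```

Since every thread in this sketch targets the same entry, I would expect a single increment of index_pos.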

The intuition is the following: only the first thread in the block to reach index[val] should initialize that entry and increment index_pos. However, I have noticed that index_pos is incremented multiple times by conflicting threads.
Why is this happening?