Consider a warp in which each thread performs an atomic operation on one array, at a position given by an index read from another array. For example:
indexes: 0,0,1,1,2,2,3,3,4,4 ...
atomicSomething( &B[indexes[thread_id]], ...
In this example, thread 0 contends with thread 1, thread 2 contends with thread 3, and so on.
Will the threads that do not conflict with each other (e.g. 0,2,4…) execute their atomics in parallel, or will the entire warp serialize?
If they do operate in parallel, would a less regular index array (e.g. 0,1,2,3,3,3,3,4,4,5,6 …) cause the entire warp to serialize?
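To make the first pattern concrete, here is a minimal sketch of the setup I have in mind. It assumes atomicAdd as a stand-in for atomicSomething and int elements for B; the names contended_add, run_contended_add and paired_indexes are made up for illustration. It only reproduces the access pattern and checks the final values. It does not show how the hardware orders the conflicting atomics, which is the actual question.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread atomically adds 1 to the bin selected by its entry in `indexes`.
// atomicAdd is an assumption: it stands in for the unspecified atomicSomething.
__global__ void contended_add(const int *indexes, int *B, int n)
{
    int thread_id = blockIdx.x * blockDim.x + threadIdx.x;
    if (thread_id < n) {
        atomicAdd(&B[indexes[thread_id]], 1);
    }
}

// Hypothetical helper: builds the pattern 0,0,1,1,2,2,... of length n.
std::vector<int> paired_indexes(int n)
{
    std::vector<int> idx(n);
    for (int i = 0; i < n; ++i) {
        idx[i] = i / 2;
    }
    return idx;
}

// Hypothetical host helper: launches one thread per index (32 threads per
// block, i.e. one warp per block) and returns the final contents of B.
// Error checking is omitted to keep the sketch short.
std::vector<int> run_contended_add(const std::vector<int> &indexes, int num_bins)
{
    int n = static_cast<int>(indexes.size());
    int *d_indexes = nullptr;
    int *d_B = nullptr;
    cudaMalloc(&d_indexes, n * sizeof(int));
    cudaMalloc(&d_B, num_bins * sizeof(int));
    cudaMemcpy(d_indexes, indexes.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_B, 0, num_bins * sizeof(int));

    int blocks = (n + 31) / 32;
    contended_add<<<blocks, 32>>>(d_indexes, d_B, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<int> B(num_bins);
    cudaMemcpy(B.data(), d_B, num_bins * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_indexes);
    cudaFree(d_B);
    return B;
}
```

Calling run_contended_add(paired_indexes(32), 16) runs a single warp in which every pair of adjacent threads hits the same element of B. The second, less regular pattern can be tried by passing a different index vector.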