Are atomic operations across a warp executed in parallel on CC 2.0 devices?

Hi

I need to do an atomic floating-point add on global memory on a CC 2.0 device. If the global addresses referenced by the threads of a warp all fall within a single aligned 128-byte segment, will these atomic operations be performed in parallel, or will they be serialized and executed one at a time?
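
To make the question concrete, here is a minimal sketch of the kind of access pattern I mean. The kernel name, the array size and the assumption that each thread targets a distinct float are only illustrative, not taken from real code:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each of the 32 threads in a warp atomically adds to
// its own float. 32 floats * 4 bytes = 128 bytes, so the whole warp touches
// a single 128-byte range (assuming the base pointer is 128-byte aligned).
__global__ void warpAtomicAdd(float *data, float value)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    atomicAdd(&data[idx], value);  // float atomicAdd on global memory needs CC 2.0+
}

int main()
{
    const int n = 32;  // one warp
    float *d_data = NULL;
    float h_data[n];

    // Assumption: the pointer returned by cudaMalloc is aligned to at least 128 bytes.
    cudaMalloc((void **)&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    warpAtomicAdd<<<1, n>>>(d_data, 1.0f);

    // Blocking copy back to the host; it completes only after the kernel has finished.
    cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("data[0] = %f, data[%d] = %f\n", h_data[0], n - 1, h_data[n - 1]);

    cudaFree(d_data);
    return 0;
}
```

This would need to be built for the right architecture, for example with `nvcc -arch=sm_20` on a toolkit that still supports it. In my real code several threads may also hit the same address, so I would be interested in both cases.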

Regards
Gautham Ganapathy