Do atomics coalesce?

I have the following piece of code in a kernel:

atomicAdd(&out[threadIdx.x], reg_val);

Assuming all threads in the warp reach this write at the same time (they do), will these atomic writes be coalesced?
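To make the setup concrete, here is a minimal sketch of the access pattern being asked about. The kernel name `accumulate`, the host helper `run_accumulate`, the `int` element type, and the constant `reg_val = 1` are all assumptions for illustration; the original post only shows the single `atomicAdd` line. Each thread in a warp updates a different, consecutive element, so the 32 addresses of one warp are contiguous.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Each thread atomically adds its value to out[threadIdx.x].
// Within one warp the 32 target addresses are distinct and consecutive.
__global__ void accumulate(int *out)
{
    int reg_val = 1;  // hypothetical per-thread value held in a register
    atomicAdd(&out[threadIdx.x], reg_val);
}

// Hypothetical host helper: launches `blocks` blocks of `threads` threads
// and returns the resulting array. Threads with the same threadIdx.x in
// different blocks hit the same element, so the atomic is needed.
std::vector<int> run_accumulate(int blocks, int threads)
{
    int *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, threads * sizeof(int));
    assert(err == cudaSuccess);
    err = cudaMemset(d_out, 0, threads * sizeof(int));
    assert(err == cudaSuccess);

    accumulate<<<blocks, threads>>>(d_out);

    std::vector<int> h_out(threads);
    // cudaMemcpy waits for the kernel to finish before copying.
    err = cudaMemcpy(h_out.data(), d_out, threads * sizeof(int),
                     cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(d_out);
    return h_out;
}
```

Calling `run_accumulate(4, 32)` should leave every element equal to 4, one increment per block. This only checks correctness; it says nothing about how the hardware groups the atomics, which would need a profiler or a timing comparison to observe.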

My understanding is that up to 8 atomic operations per clock per SM can be executed if they fall in the same cache line.

See page 28.

Perfect. Thanks.