I have the following piece of code in a kernel:
atomicAdd(&out[threadIdx.x], reg_val);
Now, assuming that all threads in the warp will be able to write at the same time (they can), will these writes be coalesced?
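For anyone who wants to try this, here is a minimal, self-contained sketch of the access pattern in question. It assumes out is a float array and reg_val is a float (the post does not say), launches one warp per block, and uses the hypothetical names accumulate and run_accumulate. It only checks that the sums come out right; it does not measure whether the atomics are coalesced.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Assumption: one warp (32 threads) per block, so threadIdx.x is the lane index.
constexpr int kThreadsPerBlock = 32;

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// The pattern from the question: each lane atomically adds its register value
// to a distinct, consecutive element, so a warp touches 32 adjacent floats.
__global__ void accumulate(float *out, float reg_val)
{
    atomicAdd(&out[threadIdx.x], reg_val);
}

// Hypothetical host helper: zeroes the buffer, launches num_blocks warps that
// all add into the same 32 elements, and copies the result back.
std::vector<float> run_accumulate(int num_blocks, float reg_val)
{
    const size_t bytes = kThreadsPerBlock * sizeof(float);
    float *d_out = nullptr;
    check(cudaMalloc(&d_out, bytes), "cudaMalloc");
    check(cudaMemset(d_out, 0, bytes), "cudaMemset");

    accumulate<<<num_blocks, kThreadsPerBlock>>>(d_out, reg_val);
    check(cudaGetLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

    std::vector<float> h_out(kThreadsPerBlock);
    check(cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost),
          "cudaMemcpy");
    check(cudaFree(d_out), "cudaFree");
    return h_out;
}

int main()
{
    const int num_blocks = 4;
    const std::vector<float> result = run_accumulate(num_blocks, 1.0f);

    // Every element should have received one add of 1.0f from each block.
    bool ok = true;
    for (float v : result) {
        if (v != static_cast<float>(num_blocks)) ok = false;
    }
    std::printf("%s\n", ok ? "all sums match" : "mismatch");
    return ok ? 0 : 1;
}
```

To see how the hardware actually handles these atomics, running the kernel under a profiler would be the next step; the sketch above is only the test case.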
My understanding is that up to 8 atomic ops per clock per SM can be executed if they target the same cache line.
See page 28.
Perfect. Thanks.