Un-coalesced memory stores

Could someone say whether there is a penalty for using un-coalesced stores?

I am writing some code that will write data out to global memory, where it will never be used again by the GPU (it will be copied to the host and saved for later). What I want to know is: is there a penalty for performing un-coalesced stores, or can the processors perform some form of fire and forget for the stores?

Thanks in advance


All writes are fire and forget: the thread issues the store and carries on without waiting for it to complete. This is why __threadfence() exists.

The only penalty is reduced memory bandwidth because more memory transactions are required to move the data. If you are writing a small amount of uncoalesced data, you probably won’t even notice, though.
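To make the fire-and-forget point concrete, here is a minimal sketch of the usual place where __threadfence() matters: one block publishing a result that another block reads. The kernel name writeThenSignal, the counter blocksDone, and the buffers partial and total are hypothetical names made up for illustration, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical counter: how many blocks have published their value so far.
__device__ unsigned int blocksDone = 0;

__global__ void writeThenSignal(float *partial, float *total)
{
    // One thread per block does the work, to keep the sketch short.
    if (threadIdx.x == 0) {
        // The store is issued and the thread moves on without waiting.
        partial[blockIdx.x] = (float)blockIdx.x;

        // Make the store above visible to other blocks before the
        // counter update below can be observed.
        __threadfence();

        // atomicInc returns the old value; the last block to arrive
        // sees gridDim.x - 1.
        unsigned int ticket = atomicInc(&blocksDone, gridDim.x);
        if (ticket == gridDim.x - 1) {
            // Read through a volatile pointer so the loads are not
            // served from stale cached values.
            volatile float *p = partial;
            float sum = 0.0f;
            for (unsigned int b = 0; b < gridDim.x; ++b)
                sum += p[b];
            *total = sum;
            blocksDone = 0;  // reset for a later launch
        }
    }
}

int main()
{
    const int blocks = 8;  // assumed small grid for the demo
    float *partial, *total;
    cudaMalloc(&partial, blocks * sizeof(float));
    cudaMalloc(&total, sizeof(float));

    writeThenSignal<<<blocks, 32>>>(partial, total);

    // A blocking cudaMemcpy on the default stream waits for the kernel.
    float h = 0.0f;
    cudaMemcpy(&h, total, sizeof(float), cudaMemcpyDeviceToHost);
    printf("total = %f\n", h);

    cudaFree(partial);
    cudaFree(total);
    return 0;
}
```

For the case in the question, where the data is only copied back to the host after the kernel has finished, no fence should be needed: the copy runs after the kernel completes, so the stores are already visible.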

Basically, writing 16 coalesced values (one per thread of a half-warp) costs 1 memory transaction, but writing 16 uncoalesced values costs 16 memory transactions.
As global memory has high latency and limited bandwidth, this can become a bottleneck.
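If you want to see how much this matters on your own card, a rough timing sketch along these lines may help. The kernel names storeCoalesced and storeStrided, the element count, and the stride of 16 are all assumptions chosen for illustration; error checking is omitted, and the actual gap depends on the GPU generation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Consecutive threads write consecutive 4-byte words (coalesced).
__global__ void storeCoalesced(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1.0f;
}

// Consecutive threads write words `stride` elements apart (uncoalesced).
__global__ void storeStrided(float *out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[(size_t)i * stride] = 1.0f;
}

int main()
{
    const int n = 1 << 20;   // number of stores per kernel (assumed)
    const int stride = 16;   // assumed stride, in elements
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Large enough for the strided kernel: n * stride floats (64 MiB).
    float *buf;
    cudaMalloc(&buf, (size_t)n * stride * sizeof(float));

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    float ms = 0.0f;

    // Warm-up launch so the first timed kernel does not pay startup cost.
    storeCoalesced<<<blocks, threads>>>(buf, n);
    cudaDeviceSynchronize();

    cudaEventRecord(t0);
    storeCoalesced<<<blocks, threads>>>(buf, n);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&ms, t0, t1);
    printf("coalesced stores: %.3f ms\n", ms);

    cudaEventRecord(t0);
    storeStrided<<<blocks, threads>>>(buf, n, stride);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&ms, t0, t1);
    printf("strided stores:   %.3f ms\n", ms);

    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    cudaFree(buf);
    return 0;
}
```

Both kernels issue the same number of stores, so any difference in the two timings comes from how those stores are grouped into memory transactions.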