Transfer back (on device) to global memory

Are there any rules for copying data from shared memory/registers to global memory on the device?

As we know, we should pay attention to coalescing when transferring from global memory to shared memory, but does the same apply when transferring from shared memory to global memory? Are there any other rules or pieces of advice? I found that the time saved by good use of shared memory (coalesced loads, avoiding bank conflicts) is wasted when I need to write the data to global memory so that it can be transferred back to the host after kernel execution.

If you have any suggestions, please speak your mind!
Y.

You need to coalesce both reads from and writes to global memory (at least on the older cards … not sure exactly how the G200 ones behave).
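To illustrate what a coalesced write from registers looks like, here is a minimal sketch. It assumes each thread produces a few results in registers; the names `writeResults`, `K` and `bothLayoutsCorrect` are made up for this example. The only difference between the two branches is the output layout: in the interleaved one, neighbouring threads store to neighbouring addresses, which is the pattern that can coalesce.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define K 4  // assumed number of results per thread (hypothetical)

// Each thread computes K values (typically held in registers) and stores them.
// The output layout decides whether the stores of neighbouring threads
// land on neighbouring addresses.
__global__ void writeResults(float* out, int n, bool interleaved)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float r[K];
    for (int k = 0; k < K; ++k) r[k] = i * 10.0f + k;

    for (int k = 0; k < K; ++k) {
        if (interleaved)
            out[k * n + i] = r[k];  // thread i and i+1 write adjacent floats
        else
            out[i * K + k] = r[k];  // thread i and i+1 write K floats apart
    }
}

// Hypothetical host helper: runs both layouts and checks the stored values.
bool bothLayoutsCorrect(int n)
{
    size_t bytes = (size_t)n * K * sizeof(float);
    float* d_out = nullptr;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) return false;

    std::vector<float> h(n * K);
    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    bool ok = true;

    for (int pass = 0; pass < 2 && ok; ++pass) {
        bool interleaved = (pass == 1);
        writeResults<<<blocks, threads>>>(d_out, n, interleaved);
        if (cudaMemcpy(h.data(), d_out, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) {
            ok = false;
            break;
        }
        for (int i = 0; i < n && ok; ++i)
            for (int k = 0; k < K; ++k) {
                float got = interleaved ? h[k * n + i] : h[i * K + k];
                if (got != i * 10.0f + k) { ok = false; break; }
            }
    }
    cudaFree(d_out);
    return ok;
}
```

Both layouts hold the same data, so the host only needs to index it differently after the copy back. How much the strided version costs depends on the card, so it is worth timing both on your hardware.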

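Shared memory doesn’t need coalesced access (only bank conflicts matter there), so there’s usually a way to make both the reads from and the writes to global memory coalesced: stage the data in shared memory and do any reordering there.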
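A matrix transpose is the usual example of this idea. The sketch below is not taken from any particular code base: the names `transposeTiled`, `TILE` and `transposeMatchesCpu` are made up for illustration, and the tile width of 16 is an assumption. Each block reads a tile with contiguous accesses, and the reordering happens through the shared-memory indexing, so the write to global memory is contiguous as well.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TILE 16  // assumed tile width (hypothetical choice)

// Transposes a row-major rows x cols matrix into a row-major cols x rows matrix.
__global__ void transposeTiled(const float* in, float* out, int rows, int cols)
{
    // One extra column of padding so the column-wise read below
    // does not hit the same shared-memory bank repeatedly.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;  // input column
    int y = blockIdx.y * TILE + threadIdx.y;  // input row
    if (x < cols && y < rows)
        tile[threadIdx.y][threadIdx.x] = in[y * cols + x];  // contiguous in threadIdx.x

    __syncthreads();

    // Swap the block indices, keep threadIdx.x as the fastest-varying index.
    x = blockIdx.y * TILE + threadIdx.x;  // output column (= input row)
    y = blockIdx.x * TILE + threadIdx.y;  // output row (= input column)
    if (x < rows && y < cols)
        out[y * rows + x] = tile[threadIdx.x][threadIdx.y];  // contiguous in threadIdx.x
}

// Hypothetical host helper: returns true if the GPU result matches a CPU transpose.
bool transposeMatchesCpu(int rows, int cols)
{
    std::vector<float> h_in(rows * cols), h_out(rows * cols, 0.0f);
    for (int i = 0; i < rows * cols; ++i) h_in[i] = (float)i;

    size_t bytes = h_in.size() * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((cols + TILE - 1) / TILE, (rows + TILE - 1) / TILE);
    transposeTiled<<<grid, block>>>(d_in, d_out, rows, cols);

    cudaError_t err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (h_out[c * rows + r] != h_in[r * cols + c]) return false;
    return true;
}
```

Calling `transposeMatchesCpu(64, 48)` from your own `main` should be enough to try it out. The same staging pattern applies whenever the order in which a block computes its results differs from the order in which they should land in global memory.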