Are there any rules for copying data from shared memory/registers to global memory on the device?
As we know, we should make sure that transfers from global memory to shared memory are coalesced, but does the same apply to transfers from shared memory to global memory? Are there any other rules or pieces of advice? I found that the time saved by good use of shared memory (coalesced accesses, avoiding bank conflicts) is wasted when I need to write the data to global memory so that it can be transferred back to the host after kernel execution.
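To make the question concrete, here is a minimal sketch of the kind of write-back I mean. It is an illustration under assumptions, not my actual kernel: the names `reverse_chunks`, `run_reverse_chunks_check` and `BLOCK` are made up for this post, and `n` is assumed to be a multiple of `BLOCK`. Stores to global memory are coalesced by the hardware in the same way as loads, so the sketch does all reordering on the shared-memory index and keeps the global index consecutive across the threads of a warp.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // hypothetical block size, chosen for illustration

// Reverses each BLOCK-sized chunk of `in` into `out`.
// Assumes n is a multiple of BLOCK and the kernel is launched with BLOCK threads per block.
__global__ void reverse_chunks(const float* in, float* out, int n)
{
    __shared__ float tile[BLOCK];
    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    // Load: consecutive threads read consecutive global addresses.
    if (gid < n) tile[threadIdx.x] = in[gid];
    __syncthreads();

    // The permutation is applied to the shared-memory index only.
    int src = blockDim.x - 1 - threadIdx.x;

    // Store: consecutive threads again write consecutive global addresses.
    if (gid < n) out[gid] = tile[src];
}

// Host-side helper: runs the kernel on a small array and checks the result.
bool run_reverse_chunks_check()
{
    const int n = 4 * BLOCK;
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    reverse_chunks<<<n / BLOCK, BLOCK>>>(d_in, d_out, n);
    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    // Each output element must equal the mirrored element of its own chunk.
    for (int i = 0; i < n; ++i) {
        int chunk = i / BLOCK;
        float expected = h_in[chunk * BLOCK + (BLOCK - 1 - i % BLOCK)];
        if (h_out[i] != expected) return false;
    }
    return true;
}
```

Calling `run_reverse_chunks_check()` from `main` should return `true` on a working device. The opposite variant, `out[reversed_global_index] = tile[threadIdx.x]`, would put the permutation on the global store instead, which is the pattern I suspect costs me time.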
If you have any suggestions, please speak your mind!