I have a question about the difference between reading from and writing to global memory. I am implementing a kernel and have to choose between two options:

1. Make the threads in a block diverge by having a few of them read more data from global memory than the others.
2. Replicate parts of the data and have the threads diverge by making some of them write more data to global memory than the others (to update the replicated copies).

In both cases the extra reads or writes do not require any synchronization afterwards. In the first option the divergence occurs at the beginning of the thread; in the second it occurs at the end.

The choice comes down to two things: whether global memory reads and writes have different latency (and if so, which is slower), and whether divergence at the beginning of a thread costs more or less than divergence at the end.
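To make the two options concrete, here is a rough sketch of what I mean. It is not my actual kernel: the kernel names, `BLOCK`, `EXTRA`, the `side` buffer, and the `2.0f * v` stand-in for the real work are all placeholders I made up for illustration. It only times the two access patterns with CUDA events, so it might show which one is slower on a given card, but it says nothing about why.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // threads per block (assumed value)
constexpr int EXTRA = 8;    // threads per block that do the extra access (assumed value)

// Option 1: a few threads per block do an extra global READ at the start.
__global__ void extra_reads_first(const float* in, const float* extra, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = in[i];
    if (threadIdx.x < EXTRA)                           // divergent branch at the beginning
        v += extra[blockIdx.x * EXTRA + threadIdx.x];  // extra read, no sync needed after
    out[i] = 2.0f * v;                                 // stand-in for the real work
}

// Option 2: a few threads per block do an extra global WRITE at the end.
__global__ void extra_writes_last(const float* in, float* out, float* replica, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = 2.0f * in[i];                            // stand-in for the real work
    out[i] = v;
    if (threadIdx.x < EXTRA)                           // divergent branch at the end
        replica[blockIdx.x * EXTRA + threadIdx.x] = v; // update the replicated copy
}

// Average time per launch in milliseconds, measured with CUDA events.
template <typename Launch>
float time_ms(Launch launch) {
    const int reps = 20;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    launch();                                          // warm-up launch, not timed
    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r) launch();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / reps;
}

int main() {
    const int n = 1 << 22;                             // chosen as a multiple of BLOCK
    const int blocks = n / BLOCK;
    float *in, *out, *side;                            // side: extra input or replica buffer
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMalloc(&side, blocks * EXTRA * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));
    cudaMemset(side, 0, blocks * EXTRA * sizeof(float));

    float t_read  = time_ms([&] { extra_reads_first<<<blocks, BLOCK>>>(in, side, out, n); });
    float t_write = time_ms([&] { extra_writes_last<<<blocks, BLOCK>>>(in, out, side, n); });
    printf("extra reads at start: %.4f ms per launch\n", t_read);
    printf("extra writes at end:  %.4f ms per launch\n", t_write);

    cudaFree(in);
    cudaFree(out);
    cudaFree(side);
    return 0;
}
```

This should build with `nvcc` as a single `.cu` file. In the sketch only the first `EXTRA` threads of each block take the branch, so with these values only the first warp of each block diverges; my real kernel may not match that exactly.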
Any help is greatly appreciated!