I have a question about the difference between reading from and writing to global memory. The situation is as follows: I am implementing a kernel and have two options. The first option is to make the threads in a block diverge by having a few of them read more data from global memory than the others. The second option is to replicate parts of the data and make the threads diverge as well, but this time by having some of them write more data to global memory than the others (to update the replicated copies of the data). In both cases the extra reads or writes do not require synchronization afterwards. Also, in the first option the divergence occurs at the beginning of the thread, whereas in the second option it occurs at the end. The choice between the two options therefore depends on whether the latencies of reads and writes differ (and if so, which is slower), and on whether divergence at the beginning behaves differently from divergence at the end.
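To make the two options concrete, here is a minimal sketch of what I mean (the kernel names, the "first four threads" condition, and the data layout are placeholders, not my actual code):

// Option 1: a few threads per block read extra data at the start of the thread.
__global__ void option1_extra_reads(const float* __restrict__ in,
                                    float* __restrict__ out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float v = in[i];

    // Divergence at the beginning: only the first few threads of the block
    // perform an additional read from global memory.
    float extra = 0.0f;
    if (threadIdx.x < 4 && i + blockDim.x < n)
        extra = in[i + blockDim.x];

    out[i] = v + extra;
}

// Option 2: the data is replicated; a few threads write extra copies at the end.
__global__ void option2_extra_writes(const float* __restrict__ in,
                                     float* __restrict__ out,
                                     float* __restrict__ replica, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float v = in[i] * 2.0f;
    out[i] = v;

    // Divergence at the end: only the first few threads of the block
    // also update the replicated copy in global memory.
    if (threadIdx.x < 4)
        replica[i] = v;
}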
This is probably something one could write a tiny test kernel to evaluate pretty quickly. If nobody chimes in with a hard answer, I'd just write a small test and find out by experimentation. I've found writing many small test kernels to be very helpful in my own CUDA work thus far.
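For example (just a sketch, assuming the two placeholder kernels from the first post; the problem size, block size, and repeat count are arbitrary), a CUDA-event timing harness along these lines would let you compare the two variants directly:

#include <cstdio>
#include <cuda_runtime.h>

// Declarations matching the hypothetical kernels sketched above.
__global__ void option1_extra_reads(const float*, float*, int);
__global__ void option2_extra_writes(const float*, float*, float*, int);

int main()
{
    const int n = 1 << 24;                          // arbitrary problem size
    const int block = 256, grid = (n + block - 1) / block;
    const int reps = 100;

    float *in, *out, *replica;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMalloc(&replica, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launches so first-launch overhead doesn't skew the timing.
    option1_extra_reads<<<grid, block>>>(in, out, n);
    option2_extra_writes<<<grid, block>>>(in, out, replica, n);
    cudaDeviceSynchronize();

    float ms = 0.0f;

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        option1_extra_reads<<<grid, block>>>(in, out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("option 1 (extra reads):  %.3f ms per launch\n", ms / reps);

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        option2_extra_writes<<<grid, block>>>(in, out, replica, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("option 2 (extra writes): %.3f ms per launch\n", ms / reps);

    cudaFree(in); cudaFree(out); cudaFree(replica);
    cudaEventDestroy(start); cudaEventDestroy(stop);
    return 0;
}

Comparing the two averages (and ideally a profiler run as well) should show whether the extra reads or the extra writes cost more for your particular access pattern.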
Hm, my experience is that tracking down latency issues is very tricky, as it is quite hard to keep the latency from being hidden or amplified by other effects. If someone from NVIDIA could shed a little light on whether kernels wait for write instructions to complete before finishing themselves, that would be very handy. If you have evidence either way, however, I would be very interested in learning about it. Thanks.