Coalesced data accesses in global memory

Gorai · May 7, 2010, 4:58am

Hi everyone,
I am a beginner in CUDA.
I was reading about coalesced data accesses in global memory in the performance guidelines of the Programming Guide 2.3.1, for compute capability 1.1 devices.

It says there:
"Coalesced 8-byte accesses deliver a little lower bandwidth than coalesced 4-byte accesses and
coalesced 16-byte accesses deliver a noticeably lower bandwidth than coalesced 4-byte accesses"

I didn't quite get that.
Can anybody explain it to me with examples?
Thanks in advance.

Charley · May 11, 2010, 9:27pm

Look at the NVIDIA CUDA C Programming Best Practices Guide.
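
For a concrete example of what the quoted sentence compares: the "4-byte", "8-byte" and "16-byte" accesses are the size of the word each thread reads or writes, e.g. float, float2 and float4. As far as I recall from the same section of the Programming Guide, on compute capability 1.0/1.1 a coalesced half-warp access becomes one 64-byte transaction for 4-byte words, one 128-byte transaction for 8-byte words, and two 128-byte transactions for 16-byte words. The sketch below is only an illustration, not something from the guide: the names copyKernel and measure and the 32 MiB buffer size are my own choices, and error checking is left out for brevity. It copies the same number of bytes with each word size so the achieved bandwidth can be compared.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Thread k of each half-warp accesses word k of a contiguous, aligned
// array, which is the pattern the coalescing rules ask for.
// T selects the per-thread word size: float = 4, float2 = 8, float4 = 16 bytes.
template <typename T>
__global__ void copyKernel(const T* in, T* out, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Copy `bytes` bytes using elements of type T and return the achieved
// bandwidth in GB/s (hypothetical helper, no error checking).
template <typename T>
float measure(size_t bytes)
{
    size_t n = bytes / sizeof(T);
    T* in = 0;
    T* out = 0;
    cudaMalloc((void**)&in, bytes);
    cudaMalloc((void**)&out, bytes);
    cudaMemset(in, 0, bytes);

    const int threads = 256;
    const unsigned int blocks = (unsigned int)((n + threads - 1) / threads);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    copyKernel<T><<<blocks, threads>>>(in, out, n);   // warm-up launch
    cudaEventRecord(start, 0);
    copyKernel<T><<<blocks, threads>>>(in, out, n);   // timed launch
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);

    // Each byte is read once and written once.
    return 2.0f * (float)bytes / (ms * 1.0e6f);
}

int main()
{
    // 32 MiB keeps the grid under 65535 blocks even for the float case.
    const size_t bytes = 32u << 20;
    printf("float  (4-byte words):  %.2f GB/s\n", measure<float>(bytes));
    printf("float2 (8-byte words):  %.2f GB/s\n", measure<float2>(bytes));
    printf("float4 (16-byte words): %.2f GB/s\n", measure<float4>(bytes));
    return 0;
}
```

All three kernels are fully coalesced, so any difference in the printed numbers comes from the word size alone, which is what the guide's sentence is describing for 1.x hardware. The actual figures depend on the card, and on newer devices the gap may be small or not show up at all.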