Hi,
I want to know what exactly we mean by the alignment requirement in coalesced memory access, and how it affects performance.
Thanks in advance
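Basically, the threads in a block should access memory at consecutive addresses (4-, 8- or 16-byte wide elements), with thread k touching element k, so that the I/O can be performed in a single operation. Coalescing is decided per half-warp of 16 threads, which is where the 16 comes from: if the pattern is broken, the access might get split into 16 separate I/O operations, i.e. up to 16x slower. As far as I know, the alignment part means that the first address of those 16 should itself be a multiple of 16 times the element size (64 bytes for float).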
For example:
float_array[threadIdx.x] = x;  // thread k writes element k
(1-byte and 2-byte wide elements will not coalesce)
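
To make the alignment part a bit more concrete, here is a minimal sketch. The names copy_with_offset, run_copy and offset are made up for illustration. With offset 0, every group of 16 threads starts on a 64-byte boundary; with offset 1, the same group straddles two segments. How much slower the shifted version runs depends heavily on the GPU generation, so treat this as something to time yourself rather than a benchmark.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread copies one float. With offset == 0, thread k touches element k,
// so 16 consecutive threads cover one aligned 64-byte segment.
// With offset == 1 the same 16 threads straddle two segments (misaligned).
__global__ void copy_with_offset(const float* in, float* out, int n, int offset)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x + offset;
    if (i < n) {
        out[i] = in[i];
    }
}

// Hypothetical helper: runs the copy once and returns the number of
// elements in [offset, n) that were not copied correctly.
int run_copy(int offset)
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float* h_in  = (float*)malloc(bytes);
    float* h_out = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) {
        h_in[i]  = (float)i;   // exactly representable for i < 2^24
        h_out[i] = -1.0f;      // sentinel so missed writes are visible
    }

    float* d_in  = NULL;
    float* d_out = NULL;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_out, h_out, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    copy_with_offset<<<blocks, threads>>>(d_in, d_out, n, offset);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    int mismatches = 0;
    for (int i = offset; i < n; ++i) {
        if (h_out[i] != h_in[i]) {
            ++mismatches;
        }
    }

    cudaFree(d_in);
    cudaFree(d_out);
    free(h_in);
    free(h_out);
    return mismatches;
}

int main()
{
    printf("aligned copy, mismatches: %d\n", run_copy(0));
    printf("shifted copy, mismatches: %d\n", run_copy(1));
    return 0;
}
```

Both calls produce the same data; only the access pattern differs. Running each variant under a profiler, or wrapping the kernel launch in cudaEvent timers, should show whether the shifted version costs more on your card.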