Quick Memory Coalescence Question

Why is this coalesced:

[codebox]sdata[tid] = MovingSmoothed[Fixed.WIDTH * iy + ix];

sdata[tid+BLOCK_SIZE] = MovingSmoothed[Fixed.WIDTH * (iy+ BLOCK_SIZE) + (ix + BLOCK_SIZE)];[/codebox]

but this is not?

[codebox]sdata[tid] = MovingSmoothed[Fixed.WIDTH * (iy-(BLOCK_SIZE/2)) + ix-(BLOCK_SIZE/2)];

sdata[tid+BLOCK_SIZE] = MovingSmoothed[Fixed.WIDTH * (iy+(BLOCK_SIZE/2)) + ix+(BLOCK_SIZE/2)];[/codebox]


What I want to do is grab a block of global data that is twice as large as my thread block, centered around my thread block.

Here’s the answer for future googlers:

The problem was memory address alignment between where the threads start and where the data read starts. The solution is to go back an entire thread block, but only read if threadIdx.x > BLOCK_SIZE/2.
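To see why the half-block shift is the one that breaks, here is a minimal host-side sketch of the index arithmetic. It assumes things the post does not state: BLOCK_SIZE is 16, the data is float, Fixed.WIDTH is a multiple of 16 (the constant WIDTH stands in for it), and the device follows the compute capability 1.x rule that a half-warp must start on a 64-byte segment boundary. The helper names firstIndex and startsSegment are made up for illustration.

```cpp
// Assumed values for illustration only; the post does not state them.
constexpr int BLOCK_SIZE    = 16;   // one half-warp per block row
constexpr int WIDTH         = 512;  // stands in for Fixed.WIDTH; a multiple of 16
constexpr int SEGMENT_ELEMS = 16;   // 64-byte segment / 4-byte float

// Element index read by the first thread of a half-warp when both
// coordinates are shifted by `shift`, as in the snippets above.
constexpr int firstIndex(int iy, int ix0, int shift) {
    return WIDTH * (iy + shift) + (ix0 + shift);
}

// True when that index starts a 64-byte segment. This assumes the base
// pointer itself is aligned, which holds for cudaMalloc allocations.
constexpr bool startsSegment(int index) {
    return index % SEGMENT_ELEMS == 0;
}

// Block origin at (32, 32), a multiple of BLOCK_SIZE on both axes.
static_assert( startsSegment(firstIndex(32, 32, 0)),
              "unshifted read is aligned");
static_assert( startsSegment(firstIndex(32, 32, BLOCK_SIZE)),
              "full-block shift stays aligned");
static_assert(!startsSegment(firstIndex(32, 32, -BLOCK_SIZE / 2)),
              "half-block shift is misaligned");
static_assert( startsSegment(firstIndex(32, 32, -BLOCK_SIZE)),
              "going back a full block is aligned again");
```

Under these assumptions, any shift that is a multiple of BLOCK_SIZE keeps the first address on a segment boundary, and a shift of BLOCK_SIZE/2 lands in the middle of one.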

Something like this will work:


if (threadIdx.x > HALF_BLOCK_SIZE)

[indent]sdata[dest] = MovingSmoothed[Fixed.WIDTH * (iy-(BLOCK_SIZE)) + ix-(BLOCK_SIZE)];[/indent]


The confusing part is that you scan a region of 3*BLOCK_SIZE but only store 2*BLOCK_SIZE. You have to start aligned, which is why starting half a block back does not work.
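One way to picture the scan-three-blocks, store-two-blocks bookkeeping is the one-dimensional sketch below. It is an interpretation, not the code from the post: BLOCK_SIZE of 16 is assumed, sharedSlot is a hypothetical helper, and the exact boundary test (>= rather than >) is a guess at what the centered window needs. Pass 0 starts one full block back, pass 1 is the block itself, and pass 2 is the block after it, so every pass starts aligned.

```cpp
// Assumed value for illustration; the post does not give BLOCK_SIZE.
constexpr int BLOCK_SIZE      = 16;
constexpr int HALF_BLOCK_SIZE = BLOCK_SIZE / 2;

// For aligned pass `pass` (0, 1 or 2) and thread `t`, return the slot in a
// shared array of 2*BLOCK_SIZE elements, or -1 if the thread should skip
// the store. Pass p reads global column blockStart + (p - 1)*BLOCK_SIZE + t.
constexpr int sharedSlot(int pass, int t) {
    return (pass * BLOCK_SIZE + t >= HALF_BLOCK_SIZE &&
            pass * BLOCK_SIZE + t <  2 * BLOCK_SIZE + HALF_BLOCK_SIZE)
               ? pass * BLOCK_SIZE + t - HALF_BLOCK_SIZE
               : -1;
}

// First half of pass 0 and second half of pass 2 are read but not stored.
static_assert(sharedSlot(0, 0) == -1, "too far left, skipped");
static_assert(sharedSlot(0, HALF_BLOCK_SIZE) == 0, "first stored element");
static_assert(sharedSlot(1, 0) == HALF_BLOCK_SIZE, "block itself is centered");
static_assert(sharedSlot(2, HALF_BLOCK_SIZE - 1) == 2 * BLOCK_SIZE - 1,
              "last stored element");
static_assert(sharedSlot(2, HALF_BLOCK_SIZE) == -1, "too far right, skipped");
```

The three passes together touch 3*BLOCK_SIZE columns, while only the middle 2*BLOCK_SIZE of them get a slot.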