When are global memory reads parallelized?

From the programming guide, it seems that reads are parallelized only when threads read contiguous memory. Why is that?

My understanding is that if thread n reads position p of global memory, thread n+1 should read position p+1, and each position can hold a 4-byte, 8-byte, or 16-byte element, because I can use int, int2, or int4.
Is that right?

Also, suppose thread n reads positions p, p+1, p+2, p+3 (each an integer), thread n+1 reads p+4, p+5, p+6, p+7, and so on. In that case the reads do not seem to be parallelized. But if I change it so that thread n reads one int4 at position p and thread n+1 reads the next int4, will they be parallelized? In other words, is reading four integers per thread the same as reading one int4 per thread?


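The reads execute in parallel in both cases. What contiguous access buys you is coalescing: when neighboring threads read (or write) neighboring addresses, the hardware can combine their accesses into fewer memory transactions. This can dramatically improve memory access performance.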
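To make the access patterns from the question concrete, here is a minimal sketch rather than a definitive benchmark. The kernel names and the helper `patterns_agree_with_host` are made up for illustration. It assumes the buffer comes from `cudaMalloc`, so it is aligned well enough to be read as `int4`. The exact coalescing rules depend on the compute capability, so treat the comments as the general idea and check the results with a profiler.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Pattern A (hypothetical name): four separate 4-byte loads per thread.
// Within each load, neighboring threads are 16 bytes apart, so the
// addresses seen by the hardware are strided rather than contiguous.
__global__ void four_scalar_loads(const int* in, int* out, int n_threads) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n_threads) return;
    int sum = 0;
    for (int k = 0; k < 4; ++k) sum += in[4 * t + k];
    out[t] = sum;
}

// Pattern B: one 16-byte load per thread. Neighboring threads read
// neighboring int4 elements, so the addresses are contiguous.
__global__ void one_int4_load(const int4* in, int* out, int n_threads) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n_threads) return;
    int4 v = in[t];
    out[t] = v.x + v.y + v.z + v.w;
}

// Pattern C: still four 4-byte loads per thread, but for each load
// thread t reads element t of a block of n_threads ints, so
// neighboring threads read neighboring ints.
__global__ void four_coalesced_loads(const int* in, int* out, int n_threads) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n_threads) return;
    int sum = 0;
    for (int k = 0; k < 4; ++k) sum += in[k * n_threads + t];
    out[t] = sum;
}

// Hypothetical host helper: runs all three kernels and checks each
// result against a CPU reference. Returns true if everything matches.
bool patterns_agree_with_host(int n_threads) {
    const int n = 4 * n_threads;
    std::vector<int> h_in(n), h_out(n_threads);
    for (int i = 0; i < n; ++i) h_in[i] = i % 7;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n_threads * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    const int block = 128;
    const int grid = (n_threads + block - 1) / block;
    bool ok = true;
    for (int pattern = 0; pattern < 3 && ok; ++pattern) {
        if (pattern == 0)
            four_scalar_loads<<<grid, block>>>(d_in, d_out, n_threads);
        if (pattern == 1)
            one_int4_load<<<grid, block>>>(
                reinterpret_cast<const int4*>(d_in), d_out, n_threads);
        if (pattern == 2)
            four_coalesced_loads<<<grid, block>>>(d_in, d_out, n_threads);
        // This copy also waits for the kernel launched above to finish.
        if (cudaMemcpy(h_out.data(), d_out, n_threads * sizeof(int),
                       cudaMemcpyDeviceToHost) != cudaSuccess) ok = false;
        for (int t = 0; t < n_threads && ok; ++t) {
            int expect = 0;
            for (int k = 0; k < 4; ++k)
                expect += (pattern == 2) ? h_in[k * n_threads + t]
                                         : h_in[4 * t + k];
            ok = (h_out[t] == expect);
        }
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}
```

All three kernels run their threads in parallel and produce correct sums; the difference is only in how many memory transactions the hardware needs. So four `int` reads per thread (pattern A) and one `int4` read per thread (pattern B) are not equivalent from the memory system's point of view. If you want to keep scalar loads, pattern C shows the usual rearrangement. Calling `patterns_agree_with_host(1024)` from `main` is enough to check correctness, and timing each kernel separately should show the difference.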