Hello all,
I have to perform calculations on an (M x N) matrix,
but to simplify my kernel, my code allocates an (M+2) x (N+2) matrix.
I want my kernel to start computing at element (1,1), which I can do by passing a pointer to (v + pitch_element + 1) instead of v.
Here is a snippet of code to clarify:
cudaMallocPitch( (void**)&d_v, &pitch_byte,
pitch_element = pitch_byte / sizeof(REAL4);
I don’t understand why I get coalesced access when I pass a pointer to v,
but not when I pass a pointer to v + pitch_element + 1.
Thanks in advance