Hello to all,
I have to perform a calculation on an (MxN) matrix,
but to simplify my kernel, my code allocates an (M+2)x(N+2) matrix.
I want my kernel to start computing at element (1,1), which I can do by passing a pointer to (v+pitch+1) instead of v, where pitch is the row pitch in elements.
Here is a snippet of code to clarify:
[codebox]
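cudaMallocPitch( (void**)&d_v, &pitch_byte,
                 (width+2)*sizeof(REAL4), height+2 );
// row pitch in elements rather than bytes
pitch_element = pitch_byte / sizeof(REAL4);
// skip one row and one column so the kernel starts at element (1,1)
kernel<<<dGrid, dBlock>>>(d_v+pitch_element+1);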
[/codebox]
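To make the pointer arithmetic concrete, here is a minimal host-side sketch of the byte offset that the `pitch_element+1` shift adds to the base pointer. It assumes REAL4 is a 4-byte float, and the helper names `interior_offset_bytes` and `interior_is_aligned` are made up for illustration; they are not part of my real code or of the CUDA API.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: REAL4 is a 4-byte float, as the name suggests.
typedef float REAL4;

// Hypothetical helper: byte offset of element (1,1) from the start of the
// pitched allocation, i.e. the offset applied by adding pitch_element + 1.
inline std::size_t interior_offset_bytes(std::size_t pitch_byte)
{
    // The pitch must be a whole number of elements for the division to be exact.
    assert(pitch_byte % sizeof(REAL4) == 0);
    std::size_t pitch_element = pitch_byte / sizeof(REAL4);
    return (pitch_element + 1) * sizeof(REAL4);
}

// Hypothetical helper: true if that offset is a multiple of `alignment` bytes.
inline bool interior_is_aligned(std::size_t pitch_byte, std::size_t alignment)
{
    return interior_offset_bytes(pitch_byte) % alignment == 0;
}
```

With an assumed pitch of 512 bytes (the real value is whatever cudaMallocPitch returns), `interior_offset_bytes(512)` is 516, which is a multiple of 4 but not of 64. So the shifted pointer always sits one element past the start of a row, which may be relevant if the hardware expects each row access to start on a larger aligned boundary.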
I don’t understand why I get coalesced access when I pass a pointer to v,
but not when I pass a pointer to v+pitch_element+1.
Thanks in advance
Francesco