Device sub-array access

Hello,
I want to load an array into device memory, then modify it in sections with a kernel call.
Loading the array works, and modifying the whole array in a single kernel launch also works.
The problem arises when I launch my kernel on subsections of this array.

For example, I use the following lines:

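for (i = 0, j = 0; i < max; i += iStep, j += jStep)
{
    // Each launch receives pointers offset into the two device arrays.
    kernel<<<dimGrid, dimBlock>>>(&(PGOldDevice[i]), &(PGNewDevice[j]), …);
}

// Wait for all launches to finish (deprecated in newer CUDA; cudaDeviceSynchronize replaces it).
cudaThreadSynchronize();

// Copy the whole result array back to the host.
CUDA_SAFE_CALL(cudaMemcpy(PGNew, PGNewDevice, size, cudaMemcpyDeviceToHost));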

After that, whatever my kernel modified should be in PGNewDevice, right?
And it should then be copied into PGNew as well? That isn't happening.
Please help if you can!
Thanks,
R

Never mind, I found what I did wrong!
This actually works; I just had one of my parameters messed up. Thanks to whoever took a look at this.
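
For anyone who lands here with the same question, here is a minimal sketch of the pattern, not my actual code. The kernel addOneChunk, the function runChunkedDemo and its parameters total and chunk are all made up for illustration, and unlike the loop above it uses the same offset for both arrays. It launches the kernel once per chunk with offset device pointers, checks each launch for errors (a bad parameter tends to show up there), and then verifies the copied-back result on the host.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: writes in[k] + 1 to out[k] for the n elements of one chunk.
__global__ void addOneChunk(const float *in, float *out, int n)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < n)  // guard: the grid may be larger than the chunk
        out[k] = in[k] + 1.0f;
}

// Hypothetical helper: processes `total` floats in pieces of `chunk` elements.
// Returns the number of wrong elements, or -1 on bad arguments or a CUDA error.
int runChunkedDemo(int total, int chunk)
{
    if (total <= 0 || chunk <= 0) return -1;

    size_t bytes = (size_t)total * sizeof(float);
    float *hostOld = (float *)malloc(bytes);
    float *hostNew = (float *)malloc(bytes);
    if (hostOld == NULL || hostNew == NULL) { free(hostOld); free(hostNew); return -1; }
    for (int k = 0; k < total; ++k) { hostOld[k] = (float)k; hostNew[k] = 0.0f; }

    float *devOld = NULL, *devNew = NULL;
    cudaError_t err = cudaMalloc((void **)&devOld, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&devNew, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(devOld, hostOld, bytes, cudaMemcpyHostToDevice);

    const int threads = 128;
    for (int i = 0; err == cudaSuccess && i < total; i += chunk) {
        int n = (total - i < chunk) ? (total - i) : chunk;  // last chunk may be short
        int blocks = (n + threads - 1) / threads;
        // Pointer arithmetic on device pointers is done on the host;
        // the kernel simply sees the start of its own chunk.
        addOneChunk<<<blocks, threads>>>(devOld + i, devNew + i, n);
        err = cudaGetLastError();  // catches invalid launch configurations
    }

    // Wait for every launch, then copy the whole result array back.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(hostNew, devNew, bytes, cudaMemcpyDeviceToHost);

    int mismatches = 0;
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        mismatches = -1;
    } else {
        for (int k = 0; k < total; ++k)
            if (hostNew[k] != hostOld[k] + 1.0f) ++mismatches;
    }

    cudaFree(devOld);
    cudaFree(devNew);
    free(hostOld);
    free(hostNew);
    return mismatches;
}
```

Calling runChunkedDemo(1000, 256) from main on a machine with a working CUDA device is expected to return 0. If it returns -1, the message printed to stderr should point at the failing call.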