Hello,
I want to load an array into device memory and then modify it in sections, with one kernel
call per section. Loading the array isn’t causing problems, and modifying the whole array
in a single kernel call works fine. The problem arises when I call my kernel on subsections
of the array.
For example, I use the following lines:
for (i = 0, j = 0; i < max; i += iStep, j += jStep)
{
    kernel <<< dimGrid, dimBlock >>> (&(PGOldDevice[i]), &(PGNewDevice[j]), …);
}
cudaThreadSynchronize();
CUDA_SAFE_CALL(cudaMemcpy(PGNew, PGNewDevice, size, cudaMemcpyDeviceToHost));
If I do that, whatever the kernel modified should now be in PGNewDevice, right?
And the cudaMemcpy should then copy it into PGNew? This isn’t happening.
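For comparison, here is a minimal self-contained sketch of the same pattern, with an error check after every launch so that a failed launch does not go unnoticed. It is only an illustration under assumptions, not my real code: the kernel addOne, the helpers check and runInSections, the float element type, and the sizes 1000, 256 and 128 are all made up, and both arrays use the same offset i where my real loop steps them by iStep and jStep separately. It also uses cudaDeviceSynchronize, which replaces the deprecated cudaThreadSynchronize in newer toolkits.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: writes in[k] + 1 to out[k]
// for the n elements of one section.
__global__ void addOne(const float *in, float *out, int n)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < n)                       // guard threads past the end of the section
        out[k] = in[k] + 1.0f;
}

// Hypothetical helper: print a message and abort on any CUDA runtime error.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Processes `total` elements in sections of `sectionLen` elements, one kernel
// launch per section, and returns the number of wrong results found on the host.
static int runInSections(int total, int sectionLen)
{
    size_t size = (size_t)total * sizeof(float);
    float *hostOld = (float *)malloc(size);
    float *hostNew = (float *)malloc(size);
    for (int k = 0; k < total; ++k)
        hostOld[k] = (float)k;

    float *devOld = NULL, *devNew = NULL;
    check(cudaMalloc((void **)&devOld, size), "cudaMalloc devOld");
    check(cudaMalloc((void **)&devNew, size), "cudaMalloc devNew");
    check(cudaMemcpy(devOld, hostOld, size, cudaMemcpyHostToDevice), "copy to device");

    dim3 dimBlock(128);
    for (int i = 0; i < total; i += sectionLen) {
        // The last section may be shorter than sectionLen.
        int n = (total - i < sectionLen) ? (total - i) : sectionLen;
        dim3 dimGrid((n + dimBlock.x - 1) / dimBlock.x);
        // Pointer arithmetic on device pointers is done on the host;
        // devOld + i is the same address as &devOld[i].
        addOne<<<dimGrid, dimBlock>>>(devOld + i, devNew + i, n);
        check(cudaGetLastError(), "kernel launch");      // launch-time errors
    }
    check(cudaDeviceSynchronize(), "kernel execution");  // run-time errors
    check(cudaMemcpy(hostNew, devNew, size, cudaMemcpyDeviceToHost), "copy to host");

    int wrong = 0;
    for (int k = 0; k < total; ++k)
        if (hostNew[k] != hostOld[k] + 1.0f)
            ++wrong;

    cudaFree(devOld);
    cudaFree(devNew);
    free(hostOld);
    free(hostNew);
    return wrong;
}

int main()
{
    int wrong = runInSections(1000, 256);
    printf("%d mismatches\n", wrong);
    return wrong != 0;
}
```

If a sketch like this copies back the expected values while the real code does not, the difference presumably lies in the launch configuration, the offsets, or the kernel arguments rather than in the sectioned launch itself.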
Please help if you can!
Thanks,
R