What I want to do is this:
I have an image on the device, 1024x1024 ints, at the address d_img with a row step (pitch, in bytes) of d_img_step. Now I want to copy a subimage of, say, 200x300 (width x height) from host memory into the region of that image starting at (x0, y0). I tried adjusting the pointers:
d_subImg = (int *)((char *) d_img + y0*d_img_step + sizeof(int)*x0); // Adjust data pointer
d_subImg_step = d_img_step - sizeof(int)*x0; // Reduce line step
and then doing the copy with cudaMemcpy2D:
cudaMemcpy2D(d_subImg, d_subImg_step, src, 200*sizeof(int), 200*sizeof(int), 300, cudaMemcpyHostToDevice);
But all I get is garbled data that is obviously wrong. Where is the mistake? Or is there an existing function that does this (maybe in NPP)?
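To make the intended result concrete, here is a minimal host-only sketch of the layout I am after. It does not call CUDA at all: copy2d is a hypothetical stand-in that mimics the row-by-row behaviour of cudaMemcpy2D using plain memcpy, and subImagePtr and pasteBlock are made-up helper names. The sketch assumes tightly packed rows and, unlike my code above, keeps the destination step equal to the full image step. That last point is my assumption about how sub-image addressing should work, not something I have confirmed on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Host-only stand-in for a pitched 2D copy (NOT the CUDA call itself):
// copies `height` rows of `widthBytes` bytes, advancing dst by `dpitch`
// and src by `spitch` bytes after each row.
static void copy2d(void *dst, std::size_t dpitch,
                   const void *src, std::size_t spitch,
                   std::size_t widthBytes, std::size_t height) {
    assert(widthBytes <= dpitch && widthBytes <= spitch);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(static_cast<char *>(dst) + row * dpitch,
                    static_cast<const char *>(src) + row * spitch,
                    widthBytes);
    }
}

// Hypothetical helper: address of pixel (x0, y0) in an int image whose
// rows are `stepBytes` bytes apart (same arithmetic as d_subImg above).
static int *subImagePtr(int *img, std::size_t stepBytes,
                        std::size_t x0, std::size_t y0) {
    return reinterpret_cast<int *>(
        reinterpret_cast<char *>(img) + y0 * stepBytes + x0 * sizeof(int));
}

// Builds a zeroed W x H image and pastes a w x h block filled with
// `value` at (x0, y0). Assumption: the sub-image keeps the parent's
// step, because consecutive rows are still one full image row apart.
static std::vector<int> pasteBlock(std::size_t W, std::size_t H,
                                   std::size_t x0, std::size_t y0,
                                   std::size_t w, std::size_t h, int value) {
    assert(x0 + w <= W && y0 + h <= H);
    std::vector<int> img(W * H, 0);
    std::vector<int> src(w * h, value);
    const std::size_t step = W * sizeof(int);  // tightly packed rows
    int *sub = subImagePtr(img.data(), step, x0, y0);
    copy2d(sub, step, src.data(), w * sizeof(int), w * sizeof(int), h);
    return img;
}
```

For my case this would correspond to pasteBlock(1024, 1024, x0, y0, 200, 300, value). If the same reasoning carries over to the device, the equivalent call would pass d_img_step unchanged as the destination pitch to cudaMemcpy2D instead of my reduced d_subImg_step, but I have not verified that.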