2D Memcpy to Subarray / Image ROI

What I want to do is this:

I have an image on the device side, 1024x1024 ints, at address d_img with row step (pitch, in bytes) d_img_step. Now I want to copy a subimage of, say, 200x300 (width x height) into the area starting at point (x0, y0). I tried adjusting the pointers:

d_subImg = (int *)((char *) d_img + y0*d_img_step + sizeof(int)*x0);     // Adjust data pointer

d_subImg_step = d_img_step - sizeof(int)*x0;      // Reduce line step

and then call cudaMemcpy2D:

cudaMemcpy2D(d_subImg, d_subImg_step, src, 200*sizeof(int), 200*sizeof(int), 300, cudaMemcpyHostToDevice);

But all I get is garbled data that is obviously wrong. Where is the mistake? Is there perhaps a ready-made function for this somewhere (NPP, maybe)?
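A likely culprit is the destination pitch. Offsetting the base pointer moves where the ROI starts, but consecutive rows of the ROI are still d_img_step bytes apart in the big image. Passing the reduced step d_img_step - sizeof(int)*x0 makes each copied row land x0 ints further left than the previous one, which produces exactly this kind of sheared, messy result. The sketch below models this on the host. copy2D and roiPtr are hypothetical helpers, not CUDA API: copy2D only mimics cudaMemcpy2D's argument order (dst, dpitch, src, spitch, width in bytes, height) so the pitch arithmetic can be checked without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical host-side model of cudaMemcpy2D's argument semantics (NOT the
// CUDA API): copies `height` rows of `widthBytes` bytes each. Row r is read
// from src + r*spitch and written to dst + r*dpitch.
void copy2D(void* dst, std::size_t dpitch, const void* src, std::size_t spitch,
            std::size_t widthBytes, std::size_t height) {
    // cudaMemcpy2D likewise rejects pitches smaller than the row width.
    assert(dpitch >= widthBytes && spitch >= widthBytes);
    auto* d = static_cast<unsigned char*>(dst);
    auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row)
        std::memcpy(d + row * dpitch, s + row * spitch, widthBytes);
}

// Hypothetical helper: pointer to pixel (x0, y0) of an int image whose rows
// are stepBytes apart. Same arithmetic as the d_subImg line in the question.
int* roiPtr(int* img, std::size_t stepBytes, std::size_t x0, std::size_t y0) {
    return reinterpret_cast<int*>(reinterpret_cast<unsigned char*>(img) +
                                  y0 * stepBytes) + x0;
}
```

Copying a 3x2 block into a 10-int-wide image at (2, 1) with the unchanged step puts both rows at column 2; with the reduced step, the second row starts at column 0 instead. Under that reading, the fix in the question is to keep the pointer adjustment and pass the full image step as the destination pitch:

cudaMemcpy2D(d_subImg, d_img_step, src, 200*sizeof(int), 200*sizeof(int), 300, cudaMemcpyHostToDevice);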

Is it a typo that you call the image step d_step in one place, and d_img_step in another?

Yes, sorry.

Edited it.