CUDA library equivalent of interp2(): use nppiRotate with a 0 angle?

I am trying to shift an image in both the x and y directions by a subpixel amount using bilinear interpolation. I believe the NPP function for this is nppiRemap.

In MATLAB:
% dx - number of columns to shift the image
% dy - number of rows to shift the image
% r, c - number of rows and columns in img

[X,Y] = meshgrid(1:c,1:r);
[X1,Y1] = meshgrid((1-dx):1:(c-dx),(1+dy):1:(r+dy));
shifted_img = interp2(X,Y,img,X1,Y1);
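For reference, the MATLAB snippet samples each output pixel (i, j) at x = j - dx, y = i + dy. The following is a minimal CPU sketch in C++ of that computation. It derives the sample coordinates directly from dx and dy, so no X1/Y1 buffers are needed. The function name shiftBilinear, the row-major double layout and the 0-based indexing are assumptions for illustration. This is not an NPP API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference for the MATLAB snippet: out(r, c) = img sampled
// at (x, y) = (c - dx, r + dy) with bilinear interpolation.
// Row-major storage, 0-based indices (MATLAB is 1-based, but a uniform shift
// gives the same result). Samples outside the image are NaN, mirroring
// interp2's default extrapolation value.
std::vector<double> shiftBilinear(const std::vector<double>& img,
                                  int rows, int cols, double dx, double dy) {
    assert(rows > 0 && cols > 0);
    assert(img.size() == static_cast<std::size_t>(rows) * cols);
    const double nan = std::numeric_limits<double>::quiet_NaN();
    std::vector<double> out(img.size(), nan);
    auto at = [&](int r, int c) { return img[static_cast<std::size_t>(r) * cols + c]; };

    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            // Source coordinates come straight from dx, dy: no X1/Y1 buffers.
            const double x = c - dx;
            const double y = r + dy;
            if (x < 0.0 || y < 0.0 || x > cols - 1 || y > rows - 1) continue;

            const int x0 = static_cast<int>(std::floor(x));
            const int y0 = static_cast<int>(std::floor(y));
            const int x1 = std::min(x0 + 1, cols - 1);  // clamp at the right edge
            const int y1 = std::min(y0 + 1, rows - 1);  // clamp at the bottom edge
            const double ax = x - x0;  // fractional weights in [0, 1)
            const double ay = y - y0;

            const double top = (1.0 - ax) * at(y0, x0) + ax * at(y0, x1);
            const double bottom = (1.0 - ax) * at(y1, x0) + ax * at(y1, x1);
            out[static_cast<std::size_t>(r) * cols + c] = (1.0 - ay) * top + ay * bottom;
        }
    }
    return out;
}
```

With dx = dy = 0 this returns the input unchanged. With dx = -0.5, each output pixel is the average of two horizontal neighbours, and the last column falls outside the image and becomes NaN.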

Is there a way to do this without creating and populating two additional buffers the same size as the input image (the x and y maps that nppiRemap expects)? Since dx and dy are constant across the entire image, it seems there should be a way to use a simple offset instead.
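Because dx and dy are the same for every pixel, the bilinear weights are also the same for every pixel. Write the source offsets as $s_x = -d_x$ and $s_y = d_y$ (following the MATLAB grids above) and split them into integer and fractional parts. Each interior output pixel $(r, c)$ (0-based here) is then a fixed 2×2 weighted sum of input pixels $I$ at a constant integer offset:

$$
\begin{aligned}
s_x &= -d_x = k_x + a_x, \qquad k_x = \lfloor s_x \rfloor,\; a_x \in [0,1) \\
s_y &= d_y = k_y + a_y, \qquad k_y = \lfloor s_y \rfloor,\; a_y \in [0,1) \\
\mathrm{out}(r,c) &= (1-a_y)(1-a_x)\, I(r+k_y,\, c+k_x) + (1-a_y)\, a_x\, I(r+k_y,\, c+k_x+1) \\
&\quad + a_y (1-a_x)\, I(r+k_y+1,\, c+k_x) + a_y\, a_x\, I(r+k_y+1,\, c+k_x+1)
\end{aligned}
$$

So the per-pixel work is four loads and a few multiply-adds with constant weights. In effect, this is a fixed 2×2 filter plus an integer offset. Pixels whose 2×2 neighbourhood leaves the image still need separate edge handling (clamping, or NaN as interp2 does).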

I know I could write a custom kernel for this, but it seems like a pretty standard operation, and I find it hard to believe it isn't already available. I'd hate to reinvent the wheel.

Any suggestions?

After some more research: I really don't want to write a kernel just to build the x and y maps, since that would add 2 * img_size global memory writes (and nppiRemap would then have to read both maps back).

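However, I noticed that nppiRotate performs a rotation followed by a shift, taking the shift as just two scalar values (x and y). With a rotation angle of 0, that should reduce to a pure subpixel translation.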

How inefficient would that be? Or am I better off refreshing my tiling knowledge and writing my own bilinear interpolation kernel?
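If you do write your own kernel, the uniform shift keeps tiling simple. Take output pixel (r, c), which samples the input at (c - dx, r + dy), and let kx = floor(-dx) and ky = floor(dy). An output tile of TW×TH pixels starting at (ox, oy) then reads at most the (TW+1)×(TH+1) input window starting at (ox + kx, oy + ky). One possible design is to stage that window in shared memory once per block. The helper below is a hypothetical C++ sketch of that bookkeeping only. The names Window and sourceWindow are made up, and it does not clip the window to the image bounds, which a real kernel would also need.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical: input region needed by one output tile (0-based, row-major image).
struct Window {
    int x0, y0;  // top-left input pixel
    int w, h;    // width and height in pixels
};

// Output pixel (r, c) samples the input at (c - dx, r + dy), as in the
// MATLAB snippet. Because the shift is uniform, floor(ox - dx) == ox + floor(-dx)
// for integer ox, so every tile uses the same integer offset.
Window sourceWindow(int ox, int oy, int tw, int th, double dx, double dy) {
    assert(tw > 0 && th > 0);
    const int kx = static_cast<int>(std::floor(-dx));
    const int ky = static_cast<int>(std::floor(dy));
    // One extra column and row for the right/bottom bilinear neighbours.
    // Not clipped to the image bounds.
    return Window{ox + kx, oy + ky, tw + 1, th + 1};
}
```

For example, a 16×16 tile at (32, 16) with dx = 0.25 and dy = 1.5 needs the 17×17 window starting at (31, 17). Whether staging this in shared memory beats simply relying on the hardware caches for these overlapping reads is something to measure rather than assume.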