I’m trying to make a panning map kind of thing (like Google Maps) to learn memory management in CUDA. I have an algorithm in mind that involves shifting images around and wrapping them, so that the pixel at (y, -1) equals the pixel at (y, width - 1). What would be an efficient way to do this? Right now I have three versions:
[codebox]
// Version 1: shift with a normalized texture and hardware wrapping
__global__ void copyShiftTexNorm(unsigned int *out, int pitch, int width, int height, int dx, int dy)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    out[pitch * y + x] = rgbaFloatToInt(tex2D(tex, (x + dx) / (float)width, (y + dy) / (float)height));
}

// Version 2: shift with an unnormalized texture (unnormalized coordinates
// don't support hardware wrap addressing, so I wrap manually with %)
__global__ void copyShiftTex(unsigned int *out, int pitch, int width, int height, int dx, int dy)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int tx = (width + x + dx) % width;
    int ty = (height + y + dy) % height;
    out[pitch * y + x] = rgbaFloatToInt(tex2D(tex, tx, ty));
}

// Version 3: shift without textures, reading straight from global memory
__global__ void copyShift(unsigned int *out, const unsigned int *in, int pitch, int width, int height, int dx, int dy)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int tx = (width + x + dx) % width;
    int ty = (height + y + dy) % height;
    out[pitch * y + x] = in[pitch * ty + tx];
}
[/codebox]
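In case the texture setup matters: for version 1 I bind the texture roughly like this (linear filtering, wrap addressing, normalized coordinates; the exact array/channel handling is just a sketch of what I’m doing, not the full code):

[codebox]
// Rough sketch of the texture setup for version 1 (normalized coords + wrap).
// 'tex' is the same texture reference the kernels above sample from.
texture<float4, 2, cudaReadModeElementType> tex;

void bindWrappedTexture(cudaArray *array)
{
    tex.normalized = true;                    // coordinates in [0, 1)
    tex.filterMode = cudaFilterModeLinear;    // bilinear interpolation
    tex.addressMode[0] = cudaAddressModeWrap; // wrap in x...
    tex.addressMode[1] = cudaAddressModeWrap; // ...and in y
    cudaBindTextureToArray(tex, array);
}
[/codebox]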
The first version doesn’t work correctly; it kind of blurs the image. I don’t know why, maybe it’s some sort of floating-point precision thing. However, it is vastly faster than the other two (about a 40% FPS increase). The other two are roughly equal, though the texture version gives slightly lower but more stable frame rates. Any idea how to make version 1 work, or how to make the others as efficient? Any help is much appreciated :)
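One thing I wondered about for version 1: with linear filtering, integer pixel coordinates fall on texel corners rather than texel centers, so every fetch averages four neighboring texels, which would look exactly like a blur. Maybe offsetting by half a texel fixes it? Something like this (an untested guess, not something I’ve verified):

[codebox]
// Version 1 with a half-texel offset, so samples land on texel centers
// instead of texel corners when linear filtering is enabled (untested).
__global__ void copyShiftTexNormCentered(unsigned int *out, int pitch, int width, int height, int dx, int dy)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    float u = (x + dx + 0.5f) / (float)width;  // +0.5f targets the texel center
    float v = (y + dy + 0.5f) / (float)height;
    out[pitch * y + x] = rgbaFloatToInt(tex2D(tex, u, v));
}
[/codebox]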