Appropriate

This is a question about whether this type of code would benefit from CUDA/GPU computing.

I am working on a project for 3D reconstruction: http://rp181.fortscribe.com/?cat=108

Most of the time spent analyzing each picture goes into computing the Sum of Absolute Differences (SAD) of differently sized blocks. This is simply:

int totalDiff = 0;

for (int i = 0; i < x; i++) {
    for (int j = 0; j < y; j++) {
        // r1, g1, b1 and r2, g2, b2 are the channels of the current pixel in each block
        int rdiff = Math.abs(r2 - r1);
        int gdiff = Math.abs(g2 - g1);
        int bdiff = Math.abs(b2 - b1);
        totalDiff = totalDiff + rdiff + gdiff + bdiff;
    }
}

(simplified; the real code is in Java)
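For reference, here is a self-contained Java sketch of the block SAD described above. It assumes (an assumption, not taken from the original project) that each image is an int[] of packed 0xRRGGBB pixels in row-major order, like the array filled by BufferedImage.getRGB, and that blocks are square with side `size`. The class and method names are hypothetical.

```java
// Hypothetical helper: SAD between two equally sized square RGB blocks.
class BlockSad {

    // Assumption: pixels packed as 0xRRGGBB in row-major order; any alpha bits are masked off.
    static int red(int rgb)   { return (rgb >> 16) & 0xFF; }
    static int green(int rgb) { return (rgb >> 8) & 0xFF; }
    static int blue(int rgb)  { return rgb & 0xFF; }

    // SAD between the size x size block at (x1, y1) in img1 and the block
    // at (x2, y2) in img2; both images share the same row width.
    static int blockSad(int[] img1, int x1, int y1,
                        int[] img2, int x2, int y2,
                        int width, int size) {
        int totalDiff = 0;
        for (int dy = 0; dy < size; dy++) {
            for (int dx = 0; dx < size; dx++) {
                int p1 = img1[(y1 + dy) * width + (x1 + dx)];
                int p2 = img2[(y2 + dy) * width + (x2 + dx)];
                // Sum of per-channel absolute differences for this pixel
                totalDiff += Math.abs(red(p2) - red(p1))
                           + Math.abs(green(p2) - green(p1))
                           + Math.abs(blue(p2) - blue(p1));
            }
        }
        return totalDiff;
    }

    public static void main(String[] args) {
        // 2x2 images that differ only in the first pixel (by 1, 2 and 3 per channel)
        int[] a = {0x000000, 0x0A0A0A, 0xFFFFFF, 0x102030};
        int[] b = {0x010203, 0x0A0A0A, 0xFFFFFF, 0x102030};
        System.out.println(blockSad(a, 0, 0, b, 0, 0, 2, 2)); // prints 6
    }
}
```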

This is called many times per image, (imageWidth/resolution) * (imageHeight/resolution) times in total, where x and y are both equal to resolution.
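To put that call count in numbers, the sketch below uses a hypothetical 640x480 image with 8x8 blocks and assumes the blocks tile the image without overlapping. Since each call visits resolution × resolution pixels, one pass over the image works out to about imageWidth × imageHeight × 3 absolute differences, whatever the block size. The class and method names are made up for illustration.

```java
// Hypothetical arithmetic helper for the call count described above.
class CallCount {

    // SAD calls per image, assuming non-overlapping resolution x resolution
    // blocks; integer division drops partial blocks at the right/bottom edges.
    static long sadCalls(int imageWidth, int imageHeight, int resolution) {
        return (long) (imageWidth / resolution) * (imageHeight / resolution);
    }

    // Per-channel absolute differences for one pass:
    // calls * (resolution * resolution pixels) * 3 channels.
    static long absDiffs(int imageWidth, int imageHeight, int resolution) {
        return sadCalls(imageWidth, imageHeight, resolution)
                * resolution * resolution * 3L;
    }

    public static void main(String[] args) {
        System.out.println(sadCalls(640, 480, 8)); // 4800
        System.out.println(absDiffs(640, 480, 8)); // 921600, i.e. 640 * 480 * 3
    }
}
```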

I’ve never used CUDA before. Is this an appropriate problem for it, and would it give a sizable speedup? I only have a 9500 GT (1 GB, DDR2); would it beat a dual-core 2 GHz computer? And what happens when VRAM runs out: does it swap to normal system RAM?

Thanks, rp181

I don’t know the specifics of your algorithm, but judging by the pseudocode it could benefit greatly from being parallelised. I’d probably map one thread to one pixel.
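The one-thread-per-pixel mapping can be sketched on the CPU with Java parallel streams. This is a stand-in chosen for illustration, not CUDA code. In the first pass each index computes one pixel's absolute difference independently, which is the work one CUDA thread per pixel would do. A second pass then sums each block, playing the role of a per-block reduction. The 0xRRGGBB pixel layout and all names are assumptions.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical sketch: per-pixel map followed by a per-block sum.
class PerPixelSad {

    // Assumption: 0xRRGGBB packed pixels, row-major, both images the same size.
    static int absDiffRgb(int p1, int p2) {
        return Math.abs(((p2 >> 16) & 0xFF) - ((p1 >> 16) & 0xFF))
             + Math.abs(((p2 >> 8) & 0xFF) - ((p1 >> 8) & 0xFF))
             + Math.abs((p2 & 0xFF) - (p1 & 0xFF));
    }

    static int[] blockSads(int[] img1, int[] img2,
                           int width, int height, int resolution) {
        // Pass 1: one independent work item per pixel (like one GPU thread per pixel).
        int[] diff = new int[img1.length];
        IntStream.range(0, img1.length).parallel()
                 .forEach(i -> diff[i] = absDiffRgb(img1[i], img2[i]));

        // Pass 2: one work item per block, summing that block's per-pixel differences.
        int blocksX = width / resolution;
        int blocksY = height / resolution;
        int[] sads = new int[blocksX * blocksY];
        IntStream.range(0, sads.length).parallel().forEach(blk -> {
            int bx = (blk % blocksX) * resolution;
            int by = (blk / blocksX) * resolution;
            int sum = 0;
            for (int y = by; y < by + resolution; y++) {
                for (int x = bx; x < bx + resolution; x++) {
                    sum += diff[y * width + x];
                }
            }
            sads[blk] = sum;
        });
        return sads;
    }

    public static void main(String[] args) {
        int[] a = new int[8];   // 4x2 all-black image
        int[] b = new int[8];
        b[0] = 0x010101;        // falls in block 0, difference 3
        b[7] = 0x000005;        // falls in block 1, difference 5
        System.out.println(Arrays.toString(blockSads(a, b, 4, 2, 2))); // [3, 5]
    }
}
```

On a GPU the second pass would typically be a parallel reduction inside each thread block rather than a serial loop per block.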

Also, as far as I know, host and device memory don’t exchange data unless you explicitly ask them to (for example, by copying buffers between them).