Hello all, my first post.

CUDA seems very interesting and I’d like to learn more by using it for my current project, but I’m unsure if the algorithm is suitable for CUDA implementation. Here’s what I’m doing:

- I grab a frame from a firewire camera.
- I compute the contrast in a subimage of the image (typically 16x16 pixels).
- I draw a square in Direct3D with a color according to the computed contrast and alpha blend it on top of the grayscale image from the camera.
- I move over to the next subimage and repeat.
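
To make the steps above concrete, here is a minimal scalar sketch of the per-subimage loop. It is only an illustration under some assumptions: the image is 8-bit grayscale in row-major order, the contrast measure is Michelson contrast, (max - min) / (max + min), which is a placeholder since the actual measure isn't stated here, and the names `tileContrast` and `contrastMap` are made up for this example. The Direct3D drawing step is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// ASSUMPTION: Michelson contrast, (max - min) / (max + min), in [0, 1].
// Swap in whatever contrast measure is really being used.
float tileContrast(const std::uint8_t* image, int stride,
                   int x0, int y0, int tile)
{
    std::uint8_t lo = 255, hi = 0;
    for (int y = y0; y < y0 + tile; ++y) {
        const std::uint8_t* row = image + static_cast<std::size_t>(y) * stride;
        for (int x = x0; x < x0 + tile; ++x) {
            lo = std::min(lo, row[x]);
            hi = std::max(hi, row[x]);
        }
    }
    const int sum = int(hi) + int(lo);
    // An all-black tile has no defined ratio; report zero contrast.
    return sum == 0 ? 0.0f : float(int(hi) - int(lo)) / float(sum);
}

// Returns one contrast value per tile, in row-major tile order.
// Each iteration reads only its own tile and writes only its own slot.
std::vector<float> contrastMap(const std::vector<std::uint8_t>& image,
                               int width, int height, int tile = 16)
{
    const int tilesX = width / tile;   // partial tiles at the edges are skipped
    const int tilesY = height / tile;
    std::vector<float> out(static_cast<std::size_t>(tilesX) * tilesY);
    for (int ty = 0; ty < tilesY; ++ty) {
        for (int tx = 0; tx < tilesX; ++tx) {
            out[static_cast<std::size_t>(ty) * tilesX + tx] =
                tileContrast(image.data(), width, tx * tile, ty * tile, tile);
        }
    }
    return out;
}
```

Since each output value depends only on its own 16x16 block, the two outer loops are the part I would expect to turn into a grid of thread blocks in a CUDA version, with one block per subimage.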

This problem is extremely parallel: the contrast result for one subimage is independent of every other subimage, so they can all be computed concurrently. My current implementation is threaded across multiple cores, and each thread uses SSE/SSE3 instructions. However, I can’t get the frame rate I need. Is this something well suited to CUDA?

Thanks in advance for any input.