CUDA is well suited to image processing (because of its parallel nature), so it’s a good idea to look around here too. But CUDA is essentially just a way of making things run faster, in exchange for a little more programming work. (Yes, it’s usually more complicated to do something on the GPU than it would be on one CPU core.)
I think what you eventually want is best described as a form of computer vision. There are nice libraries for that (OpenCV is one of the better-known ones) and some books too (the O’Reilly “Learning OpenCV”, for example; as usual, Google is your friend). There are a lot of unsolved problems in the field though, depending on what objects you want to recognize. (You can certainly recognize a big white square on a black background; telling a cat and a dog apart is more in the “still unsolved” category.)
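To give an idea of how easy the "white square on a black background" case is, here is a minimal sketch in plain Python. It is not OpenCV code: the image is assumed to be a list of rows of grayscale values (0 to 255), and the names find_bright_box and make_test_image, as well as the threshold of 128, are made up for this example.

```python
def find_bright_box(image, threshold=128):
    """Return (top, left, bottom, right) of all pixels >= threshold, or None."""
    # Rows that contain at least one bright pixel.
    rows = [y for y, row in enumerate(image) if any(p >= threshold for p in row)]
    if not rows:
        return None
    # Columns that contain at least one bright pixel.
    cols = [x for x in range(len(image[0]))
            if any(row[x] >= threshold for row in image)]
    return rows[0], cols[0], rows[-1], cols[-1]


def make_test_image(height, width, top, left, size):
    """Build a black image with one white square (hypothetical test data)."""
    image = [[0] * width for _ in range(height)]
    for y in range(top, top + size):
        for x in range(left, left + size):
            image[y][x] = 255
    return image


image = make_test_image(64, 64, top=10, left=20, size=16)
print(find_bright_box(image))  # (10, 20, 25, 35)
```

This only works because the scene is trivial: one bright object, no noise, no other shapes. Anything closer to real camera footage needs much more than a threshold.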
And when you’ve got something that works but awfully slowly… that’s the point where CUDA helps. But only if you can divide your computation into hundreds of small parts that can be done independently of each other… let’s say operating on 32x32 pixel image snippets, or things like that.
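The "divide it into small independent parts" idea can be sketched on the CPU first. This is plain Python, not CUDA, and everything in it is an assumption made for illustration: the function names, the per-tile work (a simple mean brightness), and the thread pool, which is only there to show that no tile needs data from any other tile. Python threads will not actually speed this up; on a GPU each tile would roughly correspond to one block of threads.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 32  # tile edge length in pixels, matching the 32x32 snippets above


def split_into_tiles(image, tile=TILE):
    """Cut a 2D list of pixels into ((top, left), block) pairs."""
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            block = [row[left:left + tile] for row in image[top:top + tile]]
            tiles.append(((top, left), block))
    return tiles


def tile_mean(item):
    """Work on one tile only: return its origin and mean pixel value."""
    origin, block = item
    total = sum(sum(row) for row in block)
    count = sum(len(row) for row in block)
    return origin, total / count


def mean_per_tile(image, tile=TILE):
    """Process every tile independently and collect the results by origin."""
    tiles = split_into_tiles(image, tile)
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(tile_mean, tiles))


# Hypothetical 64x64 gradient image, which gives four 32x32 tiles.
image = [[(x + y) % 256 for x in range(64)] for y in range(64)]
means = mean_per_tile(image)
print(len(means))  # 4
```

If your algorithm can be written so that the per-tile function never looks outside its own tile, it is a good candidate for the GPU. If every tile needs the results of its neighbours first, it will be a lot harder.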
(By the way, OpenCV has got nice tools for grabbing video from cameras. If you can get your camera working in, say, Skype, OpenCV should be able to use it too.)
I hope this helped a bit :)