Beginner To CUDA

Hi all,

I am an engineering student currently working on unmanned vehicles, which make extensive use of image and video processing. I am very new to CUDA; in fact, I just happened to come across it, and its features fascinated me. What I would like to clarify in my first post is how CUDA can help with video processing. My vehicle will do a lot of processing such as object tracking, detection, RGB filtering, etc., and I would like to know whether CUDA is actually helpful in accelerating it. Also, when will CUDA be compatible with VS2008?

Thank you


Ajay Nair

How do you currently do things like object tracking? If you do calculations on pixel data and base the recognition on that, then I am sure CUDA would be a good avenue to go down.

Do you have existing software which you would port or are you writing from scratch?

Thank you for your reply! I currently do it with AForge.NET, an image processing library written in C#, and it works pretty nicely!

There are standard functions that do RGB filtering and the like. As for object tracking, I have only ever tracked objects by pixel calculations and nothing else (do let me know if you know other methods). So will it help me? And how?

Hey all ! I expected a little more support from this forum for a newbie to CUDA !

In anticipation of support !!

Ajay Nair

It’s very hard to tell whether a given piece of software we know little about is easy to reimplement in CUDA.

First, you need to understand the algorithms in the program you’re currently using. You will have a hard time porting a piece of software to CUDA efficiently without in-depth knowledge of what it does and how you can translate that into massive parallelism.

Generally, GPUs do very well at image- and video-related work (they were made for it, after all), but to be sure you need to check your algorithms. Then read the Programming Guide. It’s not only a reference of instructions; it’s also a good way to grasp the concepts behind CUDA.

Hey, hi,

Thanks for replying. I am quite flexible about moving completely to a new domain if it offers better quality in my output. That’s why I am wondering whether CUDA actually works for, and enhances, the processing of video streams received from webcams. Also, in what way does it enhance them?



CUDA is very effective in such tasks if you can divide an algorithmic problem into very small parts that can be done in parallel. In this case, an example would be:

Let’s assume your webcam outputs frames at 320x240 resolution and your code has a loop over all those pixels (320*240 = 76,800 elements) that performs some set of operations on each of them. For example, for each pixel you want to extract the blue color component and copy it to another array (of ‘blue pixels’).

for (int x = 0; x < 320; x++)
    for (int y = 0; y < 240; y++)
        blue[x][y] = image[x][y].b;  // illustrative names: copy only the blue component

In this example, there are no sequential dependencies between iterations. You can extract the blue component of [5][25] and of [6][50] logically at the same time, right? So in CUDA you could do this part in parallel instead, exploiting your GPU’s ability to handle fine-grained, massive parallelism effectively.

The CUDA approach would be, for example, to spawn 76,800 threads, one for each pixel, and have each one process its own element of the image array. This problem is perfect for a GPU because there are plenty of threads (tens of thousands isn’t actually many for CUDA; your card will be more than happy to run millions) and each does a small part of the job (lightweight threading).
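To make the one-thread-per-pixel idea concrete, here is a minimal sketch of how it could look. The Pixel struct, the packed RGB layout, the 16x16 block size and the helper name extractBlueOnGpu are all assumptions for illustration; your capture library may deliver frames in a different format.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical packed RGB pixel; a real capture API may use BGR or padded rows.
struct Pixel { unsigned char r, g, b; };

// One thread per pixel: each thread copies the blue component of its own pixel.
__global__ void extractBlue(const Pixel* image, unsigned char* blue,
                            int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {   // guard: the grid may overhang the image
        int i = y * width + x;       // row-major index of this pixel
        blue[i] = image[i].b;
    }
}

// Host helper (illustrative): copy the frame in, launch, copy the result out.
// Returns false if the frame size is wrong or any CUDA call fails.
bool extractBlueOnGpu(const std::vector<Pixel>& frame,
                      std::vector<unsigned char>& blue, int width, int height)
{
    std::size_t n = static_cast<std::size_t>(width) * height;
    if (frame.size() != n) return false;
    blue.resize(n);

    Pixel* dImage = nullptr;
    unsigned char* dBlue = nullptr;
    bool ok = cudaMalloc(&dImage, n * sizeof(Pixel)) == cudaSuccess
           && cudaMalloc(&dBlue, n) == cudaSuccess
           && cudaMemcpy(dImage, frame.data(), n * sizeof(Pixel),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16);  // 256 threads per block (an assumed, common choice)
        dim3 grid((width + block.x - 1) / block.x,
                  (height + block.y - 1) / block.y);
        extractBlue<<<grid, block>>>(dImage, dBlue, width, height);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(blue.data(), dBlue, n,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dImage);  // freeing a null pointer is a no-op
    cudaFree(dBlue);
    return ok;
}
```

For a 320x240 frame this launches a 20x15 grid of 16x16 blocks, which is exactly 76,800 threads. In a real pipeline you would allocate the device buffers once and reuse them for every frame rather than calling cudaMalloc per frame.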

If you have many such algorithms in your computer vision app, you can get a manifold speedup (think 20x faster, though the actual figure depends on the algorithm and the hardware). However, you must first assess two issues:

  1. Are your algorithms “naturally” massively parallel on that scale? I presume so, since many image- and video-related programs are, but you have to check.
    Some examples of other naturally parallel problems that you can encounter include many kinds of image filtering, denoising, pre- and postprocessing and matrix and vector mathematical operations. If you have a lot of these, you’re likely to benefit from using CUDA.

  2. Take into account the overhead of copying your data (each video frame) to the GPU and copying the results back to the CPU. With a streaming feed you will do this for every frame. PCIe bandwidth is about 2GB/s (conservatively), so at 30fps you can move roughly 2GB/30, or about 66MB, per frame in total; allowing some headroom, plan on about 50MB per frame if you aim for real time.
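To put the second point into numbers, here is a rough sketch that computes the size of one frame and the per-frame transfer budget, and times one round trip over the bus. The 3 bytes per pixel and the function names are assumptions for illustration; the 2GB/s and 30fps figures come from the estimate above, and the timing you measure will depend on your own hardware.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Bytes in one uncompressed frame.
constexpr std::size_t frameBytes(int width, int height, int bytesPerPixel)
{
    return static_cast<std::size_t>(width) * height * bytesPerPixel;
}

// Bytes that can cross the bus per frame at a given bandwidth and frame rate.
constexpr double perFrameBudgetBytes(double bytesPerSecond, double fps)
{
    return bytesPerSecond / fps;
}

// Times one host -> device -> host round trip of `bytes` bytes, in milliseconds.
// Returns a negative value if setup or timing fails.
float roundTripMs(std::size_t bytes)
{
    std::vector<unsigned char> host(bytes, 0);
    unsigned char* dev = nullptr;
    cudaEvent_t start = nullptr, stop = nullptr;
    float ms = -1.0f;

    bool ok = cudaMalloc(&dev, bytes) == cudaSuccess
           && cudaEventCreate(&start) == cudaSuccess
           && cudaEventCreate(&stop) == cudaSuccess;
    if (ok) {
        cudaEventRecord(start);
        cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);  // wait until both copies have finished
        if (cudaEventElapsedTime(&ms, start, stop) != cudaSuccess) ms = -1.0f;
    }
    if (start) cudaEventDestroy(start);
    if (stop) cudaEventDestroy(stop);
    cudaFree(dev);
    return ms;
}
```

With these assumptions, frameBytes(320, 240, 3) is 230,400 bytes, while perFrameBudgetBytes(2.0e9, 30.0) is about 66.7 million bytes, so a single 320x240 RGB frame uses well under 1% of the bandwidth budget. For frames this small, the fixed cost of each copy is likely to matter more than raw bandwidth, which is what roundTripMs lets you check on your own card.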

There have been others who used CUDA for image and video processing, see

Thank you so much! I don’t think it can get better than this! Thanks a lot for the head start!