building a cucv? computer vision

I'm a PhD researcher in automotive computer vision. We have a car equipped with three cameras and a host computer with 4 CPU cores and … a GTS 8600 :-)

There are several projects porting CV functions to the GPU. Is there any interest in building a CUDA solution (to live in the namespace cucv)?

Or is there one already?

I think this would have a big impact on CUDA,
and I just want to start a discussion before going into details.

best regards,

Try this:
(not using CUDA yet)

Yes, I know the OpenVIDIA project.
There are several with Cg/GLSL code.
Another one tries to clone OpenCV with GLSL.

I think CUDA is the better architecture for that task :-)

Is there any interest?


Hi Moik,
Our backgrounds seem to be quite similar. I do research in computer vision for robotics and have previously implemented Itti & Koch’s model of attention in GLSL. I might be interested in a computer vision project using CUDA. However, I’ve not yet come to a conclusion on which computer vision operations benefit from implementations on GPUs. After all, a real-time computer vision system includes more than just filtering and feature extraction. One shouldn’t forget that you can get quite a bit of speed using MMX and SSE ops in assembly. How much do you benefit from CUDA and what is the cost in terms of development time? That remains to be investigated. My intention is to implement a SIFT-like feature extractor and draw conclusions from that effort.

FYI, there is already a Cg-based SIFT implementation on the GPU, which was demo’ed recently at CVPR. If you implement this in CUDA, it would be interesting to compare your performance against the Cg-based version.

@celebrandil: I fully agree with you.
You have to look at MMX and SSE too, and not focus on the GPU only.
On the other hand, we currently do see a speed benefit in our CV apps from using CUDA.
Development time is also shorter than with GLSL (which I used for about a year).
@sphyraena: Yes, that would be a good indicator.

We are working on a KLT-like implementation in CUDA.


Thank you very much. I will take a look at it.

I am already implementing SIFT using CUDA (and am about to finish the keypoint orientation stage). I can confirm some speedups for the first two stages using CUDA over ChangChang’s version (don’t want to sound too optimistic at the moment though, details to follow later).

Great! By comparing implementations we can learn the ins and outs of CUDA. I’ve come to the feature list generation stage and am currently considering whether to use device or texture memory for the remaining feature-wise operations. My first trials with texture memory were not that successful. Somehow the code got slower just by creating the texture, without actually using it. Something must have gone wrong. Good luck!

I didn't forget this thread.

If there is any news, I'll post.

Hope you will too :)

I’m doing an app in CUDA that runs on top of OpenCV. I know nothing about OpenCV and I’ve kind of ignored it for what I’m doing (converting some C++ to CUDA). Is there a way to just replace the OpenCV stuff with a GPU piece of code? I don’t have time to do anything complicated, but if it is just a case of replacing some files in a directory then please fill me in!



The SIFT implementation I’ve been working on is about to be finalized. What remains to be done is to verify its correctness through extensive matching experiments and possibly eliminate some unnecessary shuffling of data between host and device. The computational cost (on an 8800 GTS 320MB) for a VGA image is about 1.4 ms (image download) + 1.6 ms (first 2 levels) + 2.0 ms * num_dog_levels + 10 us * num_sifts. As an example, for num_dog_levels=3 and num_sifts=1000 the cost is approximately 19 ms, that is, a little more than 50 Hz.
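For anyone who wants to plug in their own numbers, the cost formula above can be written as a small C++ helper. This is only a sketch: the names SiftCost, estimateSiftMs and framesPerSecond are made up for illustration and are not part of the actual SIFT code, and the constants are simply the timings quoted above for one particular card and image size.

```cpp
#include <cassert>

// Hypothetical cost model; constants are the timings quoted in the post
// (8800 GTS, VGA image). They will differ on other hardware.
struct SiftCost {
    double download_ms = 1.4;       // image download to the device
    double first_levels_ms = 1.6;   // first two levels
    double per_dog_level_ms = 2.0;  // cost of each DoG level
    double per_sift_ms = 0.010;     // 10 us per extracted feature
};

// Estimated time in milliseconds for one frame.
double estimateSiftMs(int num_dog_levels, int num_sifts,
                      const SiftCost &c = SiftCost()) {
    assert(num_dog_levels >= 0 && num_sifts >= 0);
    return c.download_ms + c.first_levels_ms
         + c.per_dog_level_ms * num_dog_levels
         + c.per_sift_ms * num_sifts;
}

// Convert a per-frame time in milliseconds to a frame rate in Hz.
double framesPerSecond(double ms) {
    assert(ms > 0.0);
    return 1000.0 / ms;
}
```

With num_dog_levels=3 and num_sifts=1000 this gives 1.4 + 1.6 + 6.0 + 10.0 = 19 ms, or roughly 52 frames per second.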

That sounds great!

Would you provide the code?



Yes, that is my intention at least. The code is more or less finished. It takes about 15 ms to extract about 600 SIFTs and about 11 ms to match two sets of 600 SIFTs each (all against all). Currently it only works on one octave (with multiple DoG levels), but it ought to be easy to add a LowPassSubSample function for multiple octaves.
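To make "all against all" concrete: for two sets of 600 features, every descriptor in the first set is compared with every descriptor in the second, which is 600 x 600 = 360,000 descriptor comparisons. Below is a minimal CPU sketch of that idea, not the CUDA code discussed here. It assumes 128-float descriptors stored back to back and picks the nearest neighbour by squared Euclidean distance; the name matchAllAgainstAll and the constant kDim are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Assumed descriptor length (128 floats, as in standard SIFT).
const std::size_t kDim = 128;

// For each descriptor in `a`, return the index of the closest descriptor
// in `b` by squared Euclidean distance, or -1 if `b` is empty.
// Both inputs hold descriptors back to back, kDim floats each.
std::vector<int> matchAllAgainstAll(const std::vector<float> &a,
                                    const std::vector<float> &b) {
    assert(a.size() % kDim == 0 && b.size() % kDim == 0);
    const std::size_t na = a.size() / kDim;
    const std::size_t nb = b.size() / kDim;
    std::vector<int> best(na, -1);
    for (std::size_t i = 0; i < na; ++i) {
        float bestDist = std::numeric_limits<float>::max();
        for (std::size_t j = 0; j < nb; ++j) {
            float dist = 0.0f;
            for (std::size_t k = 0; k < kDim; ++k) {
                const float d = a[i * kDim + k] - b[j * kDim + k];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best[i] = static_cast<int>(j);
            }
        }
    }
    return best;
}
```

A practical matcher would usually add a threshold or ratio test to reject weak matches; that is left out here to keep the sketch short.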

Right now I seem to have a memory leak somewhere. If I run the code once or twice it works fine, but if I let it run for 1000 iterations it usually gets stuck after a while. Unfortunately, the whole machine freezes and since two others are working on the same machine, I haven’t had the opportunity to really isolate the bug. If you are interested in helping me out, I could put together an archive for you.

Can you reproduce the bug by running the same kernel on the same data over and over again without any memory allocations in between? If so, your issue may be related to one that I am having:

In my case, I can sometimes run up to 70,000 iterations, but the system usually crashes after 10,000. After one of these crashes, running ANY OpenGL program (like glxgears) hard locks the system. I managed to reproduce my issue in a simple test case and submitted it as a bug to NVIDIA. We’ll see what happens now.

Thank you Mr.Anderson (that Matrix-guy? scary…)!

I read the thread you mentioned and am convinced that we are experiencing the very same problem. My code might run fine for 10-20 seconds and then suddenly stumble for approx. 5 seconds. Usually, it continues after a short break, but eventually the whole machine freezes. When working remotely, it seems less likely to freeze, but it stumbles nevertheless.

I’ve completed my CUDA SIFT code and uploaded it. Unfortunately, there seems to be a memory problem somewhere. If you could find it, I would be happy indeed. For some reason, it’s more likely to fail if the cudaArrays are allocated before the linear buffers. I’ve spent numerous hours trying to locate it, but in vain. After a few iterations, I usually fail to allocate new buffers, even if earlier buffers seem to have been properly freed.

EDIT: After running the same code on another machine (T2500, 8600GT) without seeing any memory problems at all, I fail to see why I should experience freezes on my preferred machine (2xOpteron 285, 8800GTS).

I’m new to CUDA (and GPGPU) but very interested. I’m having trouble compiling Celebrandil’s code in Visual Studio 7.1. Has anyone ported this successfully? (It looks like it’s very well done, can’t wait to get it working!)

Another question: I’m a PhD in computer vision and want to do real-time applications with a FireWire camera. Does anyone have or know of a simple framework for acquiring images, processing, viewing, etc. for CUDA? I’ve seen this done in OpenGL/Cg but I’m hoping a CUDA solution would be clearer for those of us not experienced with OpenGL.

Anyway, this thread seems to be a good place to ask!

Hi sag,

Have you got it running in VS? I haven’t yet tested it myself, since I don’t have an 8x00 card on any Windows machine around here.

When it comes to FireWire, I believe you could try OpenCV. After all, if you are able to get an array of grey values, you could set the h_data pointer in my CudaImage class to that array.

float *imgdata = ...  // grey values from the FireWire API
CudaImage cudaImg;
AllocCudaImage(&cudaImg, wid, hei, false, true);  // only allocate device memory
cudaImg.h_data = imgdata;  // point the host side at the camera buffer
// Do some processing on cudaImg
cudaImg.h_data = NULL;  // detach the camera buffer again
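One detail the snippet above leaves open is where the float array comes from. Many cameras deliver 8-bit grey frames, so a conversion step may be needed first. Here is a minimal sketch, assuming the frame arrives as wid * hei unsigned chars in row-major order; the function name greyToFloat is hypothetical, and whether the SIFT code expects values in 0..255 or 0..1 is not stated in this thread, so the sketch keeps the 0..255 range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert an 8-bit grey frame to floats, keeping the 0..255 value range.
// Assumes `src` holds wid * hei pixels in row-major order (an assumption
// about the camera API, not something taken from the CudaImage code).
std::vector<float> greyToFloat(const unsigned char *src, int wid, int hei) {
    assert(src != nullptr && wid > 0 && hei > 0);
    const std::size_t n = static_cast<std::size_t>(wid) * hei;
    std::vector<float> dst(n);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = static_cast<float>(src[i]);
    }
    return dst;
}
```

The result could then be handed over with something like cudaImg.h_data = buf.data(), as long as the vector outlives the processing and h_data is reset to NULL afterwards.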


Good luck!


I’m interested in developing a CUDA-based computer vision library similar to OpenVIDIA, or some derivative of OpenCV. Is anybody working on an active project like that which I could contribute to?

I don’t have any experience developing open source apps/libraries and would like to contribute to a project somebody else is heading. If, however, no such effort exists, I guess the way to go is to start one. Are there people who’d be interested in contributing?