Call for ideas for CUDA Image Library

Hey, CUDA guys

My labmates and I have started a project to implement a set of image processing algorithms in CUDA.

We’ve implemented a few (convolution, Sobel, etc.) and got an average 10x speedup, which is a great start. Now we’re looking for more ideas about which algorithms to implement next. We prefer ones that are used frequently, take a lot of time, and have a good parallel model.
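Sobel is a good example of the "frequent, slow, parallel" category: every output pixel depends only on its 3x3 neighbourhood. Below is a minimal C reference sketch, not the project’s code; the names `sobel_at` and `sobel` are hypothetical, and the |Gx| + |Gy| magnitude approximation and zeroed borders are simplifying assumptions. In a CUDA port, the double loop would be replaced by a 2D grid with one thread per pixel calling the same per-pixel function.

```c
#include <stdlib.h>

/* Sobel gradient magnitude at interior pixel (x, y) of an 8-bit grayscale
   image, approximated as |Gx| + |Gy| and clamped to 255.
   (Hypothetical helper; one CUDA thread would compute this for one pixel.) */
static unsigned char sobel_at(const unsigned char *img, int width, int x, int y)
{
    /* 3x3 neighbourhood around (x, y); the centre pixel is not needed. */
    int p00 = img[(y - 1) * width + (x - 1)];
    int p01 = img[(y - 1) * width + x];
    int p02 = img[(y - 1) * width + (x + 1)];
    int p10 = img[y * width + (x - 1)];
    int p12 = img[y * width + (x + 1)];
    int p20 = img[(y + 1) * width + (x - 1)];
    int p21 = img[(y + 1) * width + x];
    int p22 = img[(y + 1) * width + (x + 1)];

    /* Horizontal and vertical Sobel responses. */
    int gx = (p02 + 2 * p12 + p22) - (p00 + 2 * p10 + p20);
    int gy = (p20 + 2 * p21 + p22) - (p00 + 2 * p01 + p02);
    int mag = abs(gx) + abs(gy);
    return (unsigned char)(mag > 255 ? 255 : mag);
}

/* CPU reference over the whole image; border pixels are set to 0. */
void sobel(const unsigned char *in, unsigned char *out, int width, int height)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            out[y * width + x] =
                (x == 0 || y == 0 || x == width - 1 || y == height - 1)
                    ? 0
                    : sobel_at(in, width, x, y);
}
```

Because each output pixel is independent, this maps directly onto threads; the main GPU-side work is usually staging the input tile (plus a 1-pixel apron) in shared memory.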

Does anybody have ideas? We will make it open source when we finish the 1.0 beta.


SVD-based image compression should work quite well. I know how to implement it in M, but I don’t know which file format uses it…

Okay, thanks… We can do this in C and CUDA.

But I’d like to know whether SVD-based compression has any advantages compared to DCT and wavelets.

We could implement an API for matrix SVD instead of the full compression and encoding pipeline… Do you think that’s a better approach?

Here’s an idea for a lossless, very fast compression algorithm: …l=shared+memory

Anyway, DCT and wavelets are really good too, but I personally prefer SVD because it is made of very simple steps.
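To make the SVD-vs-DCT/wavelet question concrete, here is the standard rank-k arithmetic for an m x n grayscale image A with singular values σ_1 ≥ σ_2 ≥ … ≥ σ_r. Keeping the top k singular triplets stores k(m + n + 1) values instead of mn, and (Eckart–Young) the truncated SVD is the best rank-k approximation in the Frobenius norm. The 512 x 512, k = 50 numbers are an illustrative assumption, and the count is in values, not bytes: the singular vectors are floating point, whereas pixels are often 8-bit.

```latex
A_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\mathsf{T}},
\qquad
\lVert A - A_k \rVert_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}

\text{stored values: } k(m + n + 1) \quad \text{vs.} \quad mn

m = n = 512,\ k = 50:\quad 50 \cdot 1025 = 51\,250 \ \text{vs.}\ 262\,144
\;\Rightarrow\; \text{ratio} \approx 5.1
```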

This could be an idea; it depends on what you want to do…


my wish list:

  • median filtering,
  • hough transform,
  • fast compression
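Median filtering fits the one-thread-per-pixel model well: it gathers the same 3x3 window a convolution does, but sorts the values instead of taking a weighted sum. A minimal C reference sketch, assuming 8-bit grayscale input and borders copied unchanged; `median9` and `median3x3` are hypothetical names.

```c
/* Sort the 9 window values in place (insertion sort) and return the median. */
static unsigned char median9(unsigned char v[9])
{
    for (int i = 1; i < 9; ++i) {
        unsigned char key = v[i];
        int j = i - 1;
        while (j >= 0 && v[j] > key) {
            v[j + 1] = v[j];
            --j;
        }
        v[j + 1] = key;
    }
    return v[4];
}

/* 3x3 median filter, CPU reference. Border pixels are copied unchanged. */
void median3x3(const unsigned char *in, unsigned char *out, int width, int height)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (x == 0 || y == 0 || x == width - 1 || y == height - 1) {
                out[y * width + x] = in[y * width + x];
                continue;
            }
            /* Gather the 3x3 neighbourhood, as a convolution would. */
            unsigned char win[9];
            int k = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    win[k++] = in[(y + dy) * width + (x + dx)];
            out[y * width + x] = median9(win);
        }
    }
}
```

On a GPU the per-pixel sort is small and branchy; for 3x3 windows a fixed sorting network is a common alternative to insertion sort.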



Some random ideas:

color space conversion YUV, YCrCb, RGB, sRGB

component subsampling (e.g. to generate a 4:2:0 or 4:2:2 YCrCb representation)

interpolation, decimation with various algorithms

image zooming without blurring (e.g. keeping edges sharp)

image vectorization

blob detection

feature and shape detection

generalized motion vector estimation

Another way to compute the SVD of a matrix is to find the QR factorization first, which can be done by computing Q as a combination of Givens rotations (there is already an example in cuBLAS for this).

AFAIK, you should be able to compute all of your Givens rotation matrices in parallel, then multiply them all together with some sort of reduction kernel. Then multiply the resulting matrix by your original to find R.

Then you can follow a few more steps to get the SVD. It’s not the most optimal algorithm, but I believe that most (all?) of the steps this way are parallelizable on a large scale, so it should work out to be quite fast on the GPU.

Thanks, these help a lot.

Thank you.

Median filtering uses the same sliding window as our convolution code (it sorts the window values instead of computing a weighted sum), so we already have most of what it needs.

For fast compression, we’re planning to implement JPEG 2000 in CUDA. For H.264 we don’t have many ideas yet, as it is too “huge”.

What’s the GPU, and what’s the CPU? That way we can put the 10x in context. Thanks.

GPU is 9600GT

CPU is a Q6600, running single-threaded code (no multithreading).

How about Image Comparison algorithms, like the Haar wavelet based ones…

Actually we’re implementing DWT in CUDA, which is more basic.
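Since Haar wavelets came up, here is one level of the simplest DWT, the (unnormalized) Haar transform, as a C sketch. It is only illustrative: JPEG 2000 itself specifies the 5/3 and 9/7 wavelet filters, not Haar. Each output pair depends on one input pair, so a GPU version could assign one thread per pair; for a 2D image the same step is applied to rows and then to columns. `haar_forward` and `haar_inverse` are hypothetical names, and n is assumed even.

```c
/* One level of the unnormalized 1D Haar DWT on n samples (n even).
   out[0 .. n/2-1] = approximation (pairwise averages)
   out[n/2 .. n-1] = detail (pairwise half-differences) */
void haar_forward(const float *in, float *out, int n)
{
    int h = n / 2;
    for (int i = 0; i < h; ++i) {
        float a = in[2 * i], b = in[2 * i + 1];
        out[i]     = 0.5f * (a + b);
        out[h + i] = 0.5f * (a - b);
    }
}

/* Inverse of haar_forward (exact up to floating-point rounding). */
void haar_inverse(const float *in, float *out, int n)
{
    int h = n / 2;
    for (int i = 0; i < h; ++i) {
        float avg = in[i], diff = in[h + i];
        out[2 * i]     = avg + diff;
        out[2 * i + 1] = avg - diff;
    }
}
```

For example, {4, 2, 6, 6} transforms to {3, 6, 1, 0}: the first half holds the averages and the second half the details.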


For those who have trouble interpreting the numbers.

9600GT specs (8800GTX in parentheses):

Stream processors: 64 (128)
Core clock: 650 MHz (575 MHz)
Shader clock: 1625 MHz (1350 MHz)
Memory clock: 900 MHz (900 MHz)
Memory amount: 512 MB (768 MB)
Memory interface: 256-bit (384-bit)
Memory bandwidth: 57.6 GB/s (86.4 GB/s)
Texture fill rate: 20.8 billion/s (36.8 billion/s)

Q6600: 2.40 GHz, 1066 MHz FSB, L1 cache 32 KB instruction + 32 KB data per core (x4 cores), L2 cache 2 x 4 MB

10x on a 9600GT would probably translate to around 20x or 30x on an 8800GTX, which in my opinion is still low. There’s probably more performance to be had.
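For reference, the bandwidth figures in the spec list follow from bus width and memory clock (GDDR3 transfers data twice per clock), and the same list gives raw hardware ratios between the two cards. These are simple ratios of the listed numbers; they ignore architectural differences and say nothing about the CPU baseline.

```latex
\text{BW} = \frac{\text{bus width}}{8} \times 2 \times f_{\text{mem}}

\text{9600GT: } \tfrac{256}{8} \times 2 \times 900\ \text{MHz} = 32\ \text{B} \times 1800\ \text{MT/s} = 57.6\ \text{GB/s}

\text{8800GTX: } \tfrac{384}{8} \times 2 \times 900\ \text{MHz} = 48\ \text{B} \times 1800\ \text{MT/s} = 86.4\ \text{GB/s}

\frac{86.4}{57.6} = 1.5,
\qquad
\frac{128 \times 1350}{64 \times 1625} = \frac{172\,800}{104\,000} \approx 1.66
```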

TunaCode is happy to announce the release of CUVI Lib v0.3 (beta), for 32- and 64-bit Windows systems only! You can download a copy from:

CUVI Lib (CUDA for Vision and Imaging Lib) is an add-on library for NPP (NVIDIA Performance Primitives) and includes several advanced computer vision and image processing functions presently not available in NPP.

In this version of CUVI Lib you will find:

  • Optical Flow (Horn & Schunck)
  • Optical Flow (Lucas & Kanade)
  • Discrete Wavelet Transform (Forward and Inverse)
  • Hough Transform
  • Hough Lines (Lines Detector)
  • Color Conversion (RGB-to-Gray and RGBA-to-Gray)

Several more advanced features will be added to CUVI Lib in upcoming releases. A detailed function reference can be downloaded from:

We look forward to hearing your feedback and guidance on our forums, and to making CUVI Lib a single, complete source of computer vision and image processing functions implemented on the GPU.

You should make a separate announcement in the forum for greater visibility…

There you go :)


I don’t have any ideas myself, but you can refer to image processing algorithms .net here. Hope you can find what you need.