lossless JPEG fast decompression on CUDA

Hi all,

I’d like to know whether any CUDA programmer has already tried to port jpeglib 6.0 (the most widely used JPEG library).

In other words, can one get very fast JPEG decompression on the GPU? I guess that the DCT part of decompression should fit the CUDA architecture quite well, and that the Huffman decoding part would be the bottleneck. Am I right? Do you have any insight on this?

Many thanks


In my experience porting codecs to CUDA (no, I haven’t done JPEG), you should split the work between the things the CPU is good at, like serial manipulation of bitstreams, and the things the GPU can do very fast, such as parallel transforms of data.

I suggest starting with the part you can obviously speed up (in this case, the DCT), then slowly moving more things to the GPU, benchmarking regularly to see whether it is still worth the trouble.
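To make the CPU/GPU split concrete, here is a rough plain-C sketch of the inverse DCT stage, not code from jpeglib or any CUDA port. It assumes the Huffman decoding and dequantization have already produced 64 float coefficients per 8x8 block on the CPU. The names idct8x8, idct_all_blocks and cos16 are made up for illustration. The point is only that every block is independent, so the outer loop over blocks is what would become the grid in a CUDA kernel, and the small cosine table is the kind of thing one would probably keep in constant memory.

```c
#include <assert.h>
#include <stddef.h>

/* cos(k*pi/16) for k = 0..8; the other angles follow by symmetry. */
static const double COS_TABLE[9] = {
    1.0,
    0.9807852804032304,
    0.9238795325112867,
    0.8314696123025452,
    0.7071067811865476,
    0.5555702330196022,
    0.3826834323650898,
    0.1950903220161283,
    0.0
};

/* Returns cos(k*pi/16) for any non-negative integer k. */
static double cos16(int k)
{
    k %= 32;                         /* period is 2*pi, i.e. 32 steps */
    if (k > 16) k = 32 - k;          /* cos(2*pi - a) = cos(a)        */
    if (k > 8) return -COS_TABLE[16 - k];  /* cos(pi - a) = -cos(a)   */
    return COS_TABLE[k];
}

/* Naive 2-D inverse DCT of one 8x8 block (the JPEG convention).
   coeff and out are row-major arrays of 64 floats. */
void idct8x8(const float *coeff, float *out)
{
    assert(coeff != NULL && out != NULL);
    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x) {
            double sum = 0.0;
            for (int v = 0; v < 8; ++v) {
                for (int u = 0; u < 8; ++u) {
                    /* Normalization: 1/sqrt(2) for the DC index, else 1. */
                    double cu = (u == 0) ? COS_TABLE[4] : 1.0;
                    double cv = (v == 0) ? COS_TABLE[4] : 1.0;
                    sum += cu * cv * coeff[v * 8 + u]
                         * cos16((2 * x + 1) * u)
                         * cos16((2 * y + 1) * v);
                }
            }
            out[y * 8 + x] = (float)(sum / 4.0);
        }
    }
}

/* No block reads another block's data, so in a GPU port this loop
   would disappear and each block would be handled by its own
   group of threads. */
void idct_all_blocks(const float *coeff, float *out, size_t num_blocks)
{
    for (size_t b = 0; b < num_blocks; ++b)
        idct8x8(coeff + b * 64, out + b * 64);
}
```

A real implementation would use a separable, factored IDCT rather than this O(64) per-pixel version, but the naive form is a handy reference to benchmark and validate a kernel against.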

Any chance you have looked at PNG compression? If so what sort of performance did you obtain?

I thought a long time ago there was some sort of JPEG acceleration built into graphics cards; if anyone finds a way to access it, please post it here! Thanks!

Curious about this: anyone know of PNG compression leveraging CUDA?

You can try the nvCUVID library. It is meant to decode video streams, but if you feed it a single JPEG frame (starting at the 0xFFD8 marker), it will decode it and invoke your callback with the resulting YUV_NV12 image.
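Since the data handed to the decoder has to start at 0xFFD8, a small helper for locating that marker in a buffer may be useful. This is only a sketch in plain C and does not call nvCUVID at all. The function names find_jpeg_soi and nv12_tight_size are hypothetical. The size helper assumes even dimensions and no row padding, which a real decoder surface may well not satisfy (it usually has a pitch larger than the width).

```c
#include <assert.h>
#include <stddef.h>

/* Returns the byte offset of the first JPEG start-of-image marker
   (0xFF 0xD8) in buf, or -1 if the marker is not present. */
long find_jpeg_soi(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 1 < len; ++i) {
        if (buf[i] == 0xFF && buf[i + 1] == 0xD8)
            return (long)i;
    }
    return -1;
}

/* Size in bytes of a tightly packed NV12 image: a full-resolution
   Y plane followed by one interleaved UV plane at half resolution
   in both directions. Assumes even width/height and no padding. */
size_t nv12_tight_size(size_t width, size_t height)
{
    return width * height + (width * height) / 2;
}
```

For a 640x480 frame the tightly packed NV12 output would be 460800 bytes: 307200 for Y plus 153600 for the interleaved UV plane.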

I can’t find it in the documentation, but line 62 of cuviddec.h (CUDA 4.2) reveals support for the JPEG codec:

typedef enum cudaVideoCodec_enum {
    // ...
    cudaVideoCodec_JPEG,  // <----- Line 62: JPEG codec defined in enumerated type 'cudaVideoCodec'
    // ...
    // Uncompressed YUV
    cudaVideoCodec_YUV420 = (('I'<<24)|('Y'<<16)|('U'<<8)|('V')),   // Y,U,V (4:2:0)
    cudaVideoCodec_YV12   = (('Y'<<24)|('V'<<16)|('1'<<8)|('2')),   // Y,V,U (4:2:0)
    cudaVideoCodec_NV12   = (('N'<<24)|('V'<<16)|('1'<<8)|('2')),   // Y,UV  (4:2:0)
    cudaVideoCodec_YUYV   = (('Y'<<24)|('U'<<16)|('Y'<<8)|('V')),   // YUYV/YUY2 (4:2:2)
    cudaVideoCodec_UYVY   = (('U'<<24)|('Y'<<16)|('V'<<8)|('Y')),   // UYVY (4:2:2)
} cudaVideoCodec;

I don’t know about lossless JPEG, but there is a talk on JPEG from GTC 2012: “Fast JPEG Coding on the GPU” - http://tinyurl.com/87hr9hh

wumpus, do you still work on reverse engineering CUDA binaries? Did you look into Kepler? I can’t get through the encoding of the dependency information :(