I'd like to know whether any CUDA programmer has already tried to port jpeglib 6.0 (the most widely used JPEG library).
In other words, can one get very fast JPEG decompression on the GPU? I guess that the DCT part of decompression should fit the CUDA architecture quite well, and that the Huffman decoding should be the bottleneck. Am I right? Does anyone have any pointers?
In my experience with porting codecs to CUDA (no, I haven’t done JPEG), you should split the work between the things the CPU is good at, like serial manipulation of bitstreams, and the things the GPU can do very fast, such as parallel transforms of data.
I suggest starting with the part you can obviously speed up (in this case, the DCT), then slowly moving more things to the GPU, benchmarking regularly to see whether it is still worth the trouble.
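To make the "DCT first" suggestion concrete, here is a minimal CPU reference sketch of the 8x8 inverse DCT used by baseline JPEG, written in plain Python. It is not taken from jpeglib and is not a CUDA kernel; the names `idct_8x8` and `idct_blocks` are made up for illustration. The point is only that every block is transformed independently of the others, which is why this stage maps well onto a GPU, and that a slow reference like this is handy for checking a GPU port against.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def _c(u):
    # Normalisation factor from the DCT definition: 1/sqrt(2) for the DC term.
    return 1.0 / math.sqrt(2.0) if u == 0 else 1.0


# BASIS[x][u] = C(u)/2 * cos((2x+1) * u * pi / 16), shared by rows and columns.
BASIS = [
    [0.5 * _c(u) * math.cos((2 * x + 1) * u * math.pi / 16.0) for u in range(N)]
    for x in range(N)
]


def idct_8x8(coeffs):
    """Inverse DCT of one block: coeffs[v][u] -> samples[y][x] (separable form)."""
    # Row pass: for each frequency row v, transform along u.
    tmp = [
        [sum(BASIS[x][u] * coeffs[v][u] for u in range(N)) for x in range(N)]
        for v in range(N)
    ]
    # Column pass: transform along v.
    return [
        [sum(BASIS[y][v] * tmp[v][x] for v in range(N)) for x in range(N)]
        for y in range(N)
    ]


def idct_blocks(blocks):
    # No block depends on any other, so on a GPU this loop would become
    # the parallel dimension (for example, one block of coefficients per
    # thread block). Here it is just a serial loop.
    return [idct_8x8(b) for b in blocks]
```

A block that contains only a DC coefficient should come out flat, which makes a quick sanity check: with `coeffs[0][0] = 8.0` and everything else zero, every output sample is 1.0 up to rounding.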
I thought a long time ago there was some sort of JPEG acceleration built into graphics cards; if anyone finds a way to access it, please post it here! Thanks!
You can try the nvCUVID library. It is meant for decoding video streams, but if you feed it a single JPEG frame (starting at the 0xFFD8 marker), it will decode it and call you back with the resulting YUV_NV12 image.
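For the "starting at 0xFFD8" part, a small sketch of how one might trim a buffer so that it begins at the JPEG start-of-image marker before handing it to a decoder. This is plain Python and does not call nvCUVID at all; `trim_to_soi` is a hypothetical helper name, and I am assuming the frame is already fully in memory.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker


def trim_to_soi(buf):
    """Return buf starting at the first SOI marker, or None if there is none."""
    start = buf.find(SOI)
    if start < 0:
        return None
    return buf[start:]
```

For example, `trim_to_soi(b"junk" + b"\xff\xd8\xff\xe0")` drops the four leading bytes and returns the data from the marker onwards.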
I don’t know about lossless JPEG, but there was a talk on JPEG at GTC 2012: “Fast JPEG Coding on the GPU” - http://tinyurl.com/87hr9hh
wumpus, do you still work on reverse engineering CUDA binaries? Did you look into Kepler? I can’t get through the encoding of the dependency information :(