JPEG scaling efficiency?


Our company develops software that currently relies on CPU power, using Linux and jpeglib to read, scale, and write JPEG images. Depending on the situation, dozens of JPEG images per second may need to be scaled to different sizes.

Is there any estimate of how much speedup a single NVIDIA Tesla C870 card would provide compared to jpeglib on, say, a 2× dual-core Xeon system?

A 2× dual-core Xeon 2.1 GHz system can handle up to 80 JPEG read/scale/write operations per second.

  • Spot

I suppose JPEG is better handled on the CPU, since it is fast (a small number of operations per pixel), and the host would otherwise sit idle while the device works.

Yes, I agree that reading the JPEG file and verifying that it really is a JPEG file is better done on the CPU.

But my guess is that using a Tesla for hundreds of scalings per second is more efficient than using jpeglib and raw CPU power.

Nobody has yet written a JPEG implementation using CUDA as far as I’m aware, but it would certainly be possible.

The DCT/IDCT, subsampling and color conversion steps are easily parallelizable and could generate large speedups with CUDA. The entropy compression/decompression stages would be more difficult.