I'm currently trying to add support for Digital Cinema Packages (DCPs) to one of my video playback tools. The DCP image data is a JPEG 2000 codestream at a resolution of 2048x1080 with 3 colour channels and 12-bit precision per channel.
Decoding on the CPU (Intel Xeon X5570) using the OpenJPEG library takes about 700 ms per frame, and I need to get that down to 40 ms or less per frame.
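For reference, the per-frame timing comes from a minimal decode loop along these lines. This is a sketch against the OpenJPEG 2.x API (opj_read_header/opj_decode), so the exact calls may differ from the OpenJPEG version you have installed:

```c
#include <stdio.h>
#include <time.h>
#include <openjpeg.h>   /* OpenJPEG 2.x; the include path may differ per install */

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s frame.j2c\n", argv[0]);
        return 1;
    }

    opj_dparameters_t params;
    opj_set_default_decoder_parameters(&params);

    /* DCP frames are raw J2K codestreams once extracted from the MXF */
    opj_codec_t  *codec  = opj_create_decompress(OPJ_CODEC_J2K);
    opj_stream_t *stream = opj_stream_create_default_file_stream(argv[1], OPJ_TRUE);
    opj_image_t  *image  = NULL;

    if (!codec || !stream || !opj_setup_decoder(codec, &params)) {
        fprintf(stderr, "decoder setup failed\n");
        return 1;
    }

    clock_t t0 = clock();
    if (!opj_read_header(stream, codec, &image) ||
        !opj_decode(codec, stream, image) ||
        !opj_end_decompress(codec, stream)) {
        fprintf(stderr, "decode failed\n");
        return 1;
    }
    clock_t t1 = clock();

    printf("decoded %ux%u, %u components, in %.1f ms\n",
           image->x1 - image->x0, image->y1 - image->y0, image->numcomps,
           1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC);

    opj_image_destroy(image);
    opj_stream_destroy(stream);
    opj_destroy_codec(codec);
    return 0;
}
```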
While there are quite a few papers and open-source projects out there that focus on GPU-accelerated JPEG 2000 encoding, I couldn't find anything on decoding it. Even the commercial Kakadu JPEG 2000 SDK doesn't seem to support GPU acceleration.
It also seems that (at least when using the OpenJPEG library) most of the time is NOT spent on the inverse discrete wavelet transform (IDWT) but on decoding the entropy-coded bit stream (the EBCOT Tier-1 stage, which uses the MQ arithmetic coder).
Before I embark on trying to implement something myself, does anyone know of a reason why this hasn’t been done before? (And if it has, could someone please point me in the right direction?)
Or is there some inherent property of the JPEG 2000 arithmetic coder that prevents parallelisation? (Maybe encoding was just the sexier research project.)
I am not too familiar with JPEG 2000; however, I've done some research on draft versions of H.264 in the past (including its arithmetic coding).
In general, the state of an arithmetic coder (the current interval and the adaptive probability estimates) changes with each decoded symbol, so decoding is inherently serial within a single coded stream. There might be a chance to parallelize decoding across independently coded elements of the picture, if there are any such independently decodable streams.
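To make that dependency concrete, here is a toy adaptive binary range decoder in C. It is deliberately not the MQ coder from the JPEG 2000 spec (the names, probability model, and renormalization constants are all illustrative), but it has the same structure: the interval state and the context's probability estimate are mutated by every symbol, so symbol N+1 cannot be decoded until symbol N is finished.

```c
#include <stdio.h>
#include <stdint.h>

/* Toy carry-less binary range decoder -- illustrative only, NOT the
 * JPEG 2000 MQ coder. Every call to decode_bit() reads and rewrites
 * range, code, and p_zero, which is the serial dependency in question. */
typedef struct {
    const uint8_t *buf;
    size_t pos, len;
    uint32_t range, code;  /* interval state, shared by all symbols */
    uint16_t p_zero;       /* adaptive P(bit == 0), Q16 fixed point  */
} rdec;

static uint8_t next_byte(rdec *d) { return d->pos < d->len ? d->buf[d->pos++] : 0; }

static void rdec_init(rdec *d, const uint8_t *buf, size_t len)
{
    d->buf = buf; d->pos = 0; d->len = len;
    d->range = 0xFFFFFFFFu; d->code = 0; d->p_zero = 0x8000;
    for (int i = 0; i < 4; i++) d->code = (d->code << 8) | next_byte(d);
}

static int decode_bit(rdec *d)
{
    /* split the current interval according to the adaptive estimate */
    uint32_t split = (uint32_t)(((uint64_t)d->range * d->p_zero) >> 16);
    int bit;
    if (d->code < split) {                        /* symbol is a 0 */
        bit = 0; d->range = split;
        d->p_zero += (0xFFFF - d->p_zero) >> 5;   /* adapt toward 0 */
    } else {                                      /* symbol is a 1 */
        bit = 1; d->code -= split; d->range -= split;
        d->p_zero -= d->p_zero >> 5;              /* adapt toward 1 */
    }
    while (d->range < (1u << 24)) {               /* renormalize    */
        d->range <<= 8;
        d->code = (d->code << 8) | next_byte(d);
    }
    return bit;
}

int main(void)
{
    const uint8_t stream[] = { 0x5A, 0x3C, 0x91, 0xE7, 0x12, 0x34 };
    rdec d;
    rdec_init(&d, stream, sizeof stream);
    for (int i = 0; i < 16; i++) putchar('0' + decode_bit(&d));
    putchar('\n');
    return 0;
}
```

For what it's worth, JPEG 2000 does appear to provide such independent elements: as far as I can tell, the entropy coder works on small code-blocks that are coded independently of one another, so a GPU decoder could in principle assign code-blocks to threads.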
Maybe parallelizing across multiple frames on one GPU could also be an option. Why decode frame by frame?
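As a sketch of that idea, here is the CPU version using OpenMP (standing in for per-frame GPU streams); decode_frame() is a placeholder for a real per-frame decoder, not an actual API:

```c
#include <stdio.h>

/* Stand-in for a real per-frame decode (e.g. the OpenJPEG snippet
 * above); replace with the actual decoder. */
static void decode_frame(const char *path)
{
    printf("decoding %s\n", path);
}

int main(void)
{
    const char *paths[] = { "f0001.j2c", "f0002.j2c", "f0003.j2c", "f0004.j2c" };
    int nframes = 4;

    /* Frames in a DCP are independent codestreams, so they can be
     * decoded concurrently: per-frame latency is unchanged (~700 ms),
     * but throughput scales with the worker count.
     * Compile with: cc -fopenmp */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < nframes; i++)
        decode_frame(paths[i]);   /* no cross-frame dependency */

    return 0;
}
```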