Ana Balevic currently specializes in parallel programming. You might find her online resources about parallel implementation of data compression algorithms highly interesting. I found this link on the thrust-users mailing list and decided to share it with you.
Modern entropy coding, as in JPEG2000 or H.264, is done with arithmetic coding algorithms. In the Resources, only a very simple arithmetic encoder is presented; it is only partly parallelized, and there is no decoder.
In the JPEG2000 implementation CUJ2K, the major bottleneck is the entropy coder.
But the biggest problem in parallel implementations of data compression is adaptivity. The greatest gains in compression performance are achieved by context-adaptive algorithms, where the probability model used for each symbol depends on the symbols coded before it. It is not possible to “share” this adaptivity between parallel threads.
So you have to group your data into blocks. The blocks should be as big as possible to achieve a good coding gain. In highly parallel scenarios you therefore need many big blocks, so efficient parallel data compression requires a large amount of data.
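To make the block-size trade-off concrete, here is a minimal sketch in Python. It does not implement an arithmetic coder; it only computes the ideal code length that an adaptive order-0 byte model would reach, assuming every block starts from a fresh model with all counts set to 1. The function names and the synthetic input are my own assumptions for illustration and have nothing to do with CUJ2K or the linked resources.

```python
import math


def adaptive_cost_bits(block: bytes) -> float:
    """Ideal code length in bits for one block under an adaptive
    order-0 model that starts with a count of 1 for every byte value."""
    counts = [1] * 256
    total = 256
    bits = 0.0
    for symbol in block:
        # The cost of this symbol depends on everything seen so far in the
        # block; this is the sequential dependency that threads cannot share.
        bits -= math.log2(counts[symbol] / total)
        counts[symbol] += 1
        total += 1
    return bits


def blocked_cost_bits(data: bytes, block_size: int) -> float:
    """Total cost when the data is cut into independent blocks,
    each coded with its own freshly initialised model."""
    return sum(
        adaptive_cost_bits(data[start:start + block_size])
        for start in range(0, len(data), block_size)
    )


# Hypothetical skewed input: three 0-bytes followed by one 1-byte, repeated.
data = bytes([0, 0, 0, 1] * 4096)

if __name__ == "__main__":
    for block_size in (64, 1024, len(data)):
        n_blocks = math.ceil(len(data) / block_size)
        bits_per_symbol = blocked_cost_bits(data, block_size) / len(data)
        print(f"{n_blocks:4d} blocks of {block_size:5d} bytes: "
              f"{bits_per_symbol:.3f} bits/symbol")
```

Since the cost of a block depends only on that block, each block could be handed to its own thread. The price is that every block has to relearn the statistics from scratch, so on this input the bits per symbol go up as the blocks get smaller.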