Ultimate Data Compression Algorithm PAQ on CUDA

The PAQ compression algorithm is by far the best in terms of compression ratio. However, it is also one of the slowest, taking well over 10 hours to compress 1 GB even on the best CPUs. Can this algorithm be sped up using CUDA? (It is open source, BTW.)

Side question: Will there ever be a Java compiler (or an API/SDK) that creates CUDA-compatible executables (.jar files are OK as long as they work with CUDA)?

I think this is how it works:

  1. You fully understand the algorithm that you want to implement on CUDA hardware
  2. You know (or learn) about the CUDA programming model and hardware

and from (1) and (2) you can determine whether or not you can expect any speedup from using CUDA.
So learn about PAQ and also about CUDA.

My take is that general binary data compression algorithms map poorly to CUDA, because this kind of compression is fundamentally a very sequential operation with a lot of data dependencies. Also, the on-chip shared memory is definitely too small to work with large data sets efficiently.
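
To make the data dependency concrete, here is a heavily simplified sketch (a toy order-0 model, not PAQ's actual model or coder) of the bit-by-bit loop a context-mixing compressor runs: the prediction for bit i uses model state built from bits 0..i-1, so the iterations cannot simply be spread across CUDA threads.

    #include <cstddef>

    // Toy stand-in for PAQ's predictor: an adaptive order-0 bit model.
    // The point is the dependency chain, not the model quality.
    struct ToyModel {
        unsigned ones = 1, zeros = 1;                       // adaptive bit counts
        float predict() const { return (float)ones / (ones + zeros); }
        void  update(int bit) { bit ? ++ones : ++zeros; }
    };

    // 'emit' stands in for the arithmetic coder, which is also sequential.
    template <typename Emit>
    void compress_bits(const unsigned char* data, size_t nbits, Emit emit)
    {
        ToyModel model;
        for (size_t i = 0; i < nbits; ++i) {
            int bit = (data[i >> 3] >> (7 - (i & 7))) & 1;
            float p = model.predict();   // depends on all earlier bits
            emit(bit, p);                // coder consumes (bit, prediction)
            model.update(bit);           // state for bit i+1 depends on bit i
        }
    }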

Video and image compression/decompression benefit from CUDA because the required transforms can be accelerated greatly (for JPEG, JPEG2000, etc.), as can things like motion vector search and in-loop filtering (for H.264 and MPEG-4, for example). For decompression there is also plenty of dedicated hardware on the GPUs.

The CUDA acceleration for image and video compression standards benefits from the fact that the source data is partitioned into smaller blocks for transformation and processing (e.g. macroblocks). These can be shoved into shared memory and be processed on several multiprocessors in parallel.
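
As a rough illustration (a hypothetical kernel, not taken from any particular codec), this is the kind of mapping that makes block-based processing a good fit: one CUDA thread block stages one independent data block in shared memory and transforms it without needing results from any other block.

    // Sketch: one thread block processes one independent 256-byte data block,
    // staged in shared memory. The "transform" is just a placeholder byte-wise
    // operation; in a real codec it would be a DCT, quantizer, etc.
    #define BLOCK_BYTES 256

    __global__ void process_blocks(const unsigned char* in, unsigned char* out,
                                   size_t num_blocks)
    {
        __shared__ unsigned char tile[BLOCK_BYTES];

        size_t block = blockIdx.x;
        if (block >= num_blocks) return;

        int t = threadIdx.x;                     // launched with BLOCK_BYTES threads
        tile[t] = in[block * BLOCK_BYTES + t];   // coalesced load into shared memory
        __syncthreads();

        tile[t] = tile[t] ^ 0x5A;                // placeholder "transform" on the tile

        __syncthreads();
        out[block * BLOCK_BYTES + t] = tile[t];  // write the block back
    }

    // Launch example:
    // process_blocks<<<num_blocks, BLOCK_BYTES>>>(d_in, d_out, num_blocks);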

Check if PAQ also does some blockwise operations.

Christian

The C++ source code for PAQ 8P is included in this zip file. Please look over it and tell me if it could be improved using CUDA.

That's funny: I suggest that you study the algorithm, and your response is to ask me to do the hard work for you.

The problem is, I have a day job already. Maybe someone else will help, but it’s definitely not going to be me.

Christian

StartBP,

I don't know anything about compression…

But can this compression technique be broken down into a number of sub-compressions which can be stitched together later?

If so, it can be parallelized…

If not, you need to find a way to parallelize it. Certain sequential algorithms just cannot be parallelized…

For example:

    for (int i = 1; i < 100; i++)
        a[i] = a[i] + a[i-1];   // cumulative addition: iteration i depends on i-1

So read the algorithm yourself and see if it can be parallelized. Good luck!

Actually, there is a parallel solution for what you wrote, a very good one: have a look at the scan example in the SDK and at CUDPP. I use these techniques in a number of places. It's true that the GPU isn't good for every kind of problem, and in most cases I've found that it can be hard, take more time, and require a rewrite of the algorithm, but in the end you will get a performance boost, maybe not 100x but it can definitely be 10x. So good luck!!! :)
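
For reference, here is a minimal sketch of the idea behind the scan example (a Hillis-Steele inclusive scan for a single block that fits in shared memory; the real SDK sample and CUDPP handle arbitrary lengths and avoid bank conflicts):

    // Minimal Hillis-Steele inclusive scan for one block of n elements,
    // launched with exactly n threads (n a power of two that fits in a block).
    // It shows that the "cumulative addition" loop above can be parallelized.
    __global__ void inclusive_scan(const int* in, int* out, int n)
    {
        extern __shared__ int temp[];          // n ints, size passed at launch
        int tid = threadIdx.x;

        temp[tid] = in[tid];
        __syncthreads();

        for (int offset = 1; offset < n; offset *= 2) {
            int val = (tid >= offset) ? temp[tid - offset] : 0;  // read old values
            __syncthreads();
            temp[tid] += val;                                    // then update
            __syncthreads();
        }
        out[tid] = temp[tid];
    }

    // Launch example:
    // inclusive_scan<<<1, 128, 128 * sizeof(int)>>>(d_in, d_out, 128);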

Thank you. The reason that I asked about this instead of doing it myself is that I'm more of a Java programmer, and I'm just beginning at that (Hello, World! [OK, a little more than that, but not much; I'm still in the Sun Tutorial]). Before you go all "YOU DON'T EVEN PROGRAM?!?!?!" on me, you should look at my greatest accomplishment. It was made in Excel and can be found at starcalc.110mb.com. I know it's a Pokémon calculator, but it's a long overdue one. It's also the reason that I'm learning a real programming language, so I can remake it in Java sometime in 2009 or 2010.

Is C++ similar enough to Java that I wouldn't shoot myself in the foot if I tried to modify the code for parallelism after I have mastered Java (around the end of next year)? Also, is NVIDIA going to come out with a Java SDK and IDE for CUDA (or will Sun come out with a CUDA extension)? I would really like to try my hand at CUDA, and Java is what I do best (as of 2010).

You can call C code from Java, so you can call CUDA from Java. I think there is about 0% chance that Java code can ever be used in kernels though.
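
For what it's worth, here is a rough sketch of the native side of a JNI bridge to CUDA (the GpuScaler class, the method, and the kernel are all made up for illustration; the Java side just declares a matching native method and loads this library with System.loadLibrary):

    // Native side of a hypothetical JNI bridge, compiled as a .cu file.
    // The matching Java declaration would be:
    //   public class GpuScaler { public static native void scale(float[] data, float factor); }
    #include <jni.h>
    #include <cuda_runtime.h>

    __global__ void scale_kernel(float* d, int n, float factor)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= factor;
    }

    extern "C" JNIEXPORT void JNICALL
    Java_GpuScaler_scale(JNIEnv* env, jclass, jfloatArray arr, jfloat factor)
    {
        jsize n = env->GetArrayLength(arr);
        jfloat* host = env->GetFloatArrayElements(arr, nullptr);

        float* dev = nullptr;
        cudaMalloc(&dev, n * sizeof(float));
        cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

        scale_kernel<<<(n + 255) / 256, 256>>>(dev, n, factor);
        cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dev);

        env->ReleaseFloatArrayElements(arr, host, 0);  // copy results back to the Java array
    }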

I don't want to discourage you; there are a few big hurdles to cross, but if you have the perseverance then go for it. I enjoy my daily work with CUDA very much.