Ultimate Data Compression Algorithm PAQ on CUDA

The PAQ compression algorithm is by far the best in terms of compression ratio. However, it is also one of the slowest, taking well over 10 hours to compress 1 GB even on the best CPUs. Can this algorithm be sped up using CUDA? (It is open source, BTW.)

Side question: Will there ever be a Java compiler (or an API/SDK) that creates CUDA-compatible executables (.jar files are OK as long as they work with CUDA)?

I think this is how it works:

  1. You fully understand the algorithm that you want to implement on CUDA hardware
  2. You know (or learn) about the CUDA programming model and hardware

and from (1) and (2) you can determine whether or not you can expect any speedup from using CUDA.
So learn about PAQ and also about CUDA.

My take is that general binary data compression algorithms map poorly to CUDA, because this kind of compression is fundamentally a very sequential operation with a lot of data dependencies. Also, the on-chip shared memory is definitely too small to work with large data sets efficiently.
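
To make the data dependency concrete, here is a heavily simplified sketch (a toy order-0 model, not PAQ's actual model or coder) of the bit-by-bit loop a context-mixing compressor runs: the prediction for bit i uses model state built from bits 0..i-1, so the iterations cannot simply be spread across CUDA threads.

    #include <cstddef>

    // Toy stand-in for PAQ's predictor: an adaptive order-0 bit model.
    // The point is the dependency chain, not the model quality.
    struct ToyModel {
        unsigned ones = 1, zeros = 1;                       // adaptive bit counts
        float predict() const { return (float)ones / (ones + zeros); }
        void  update(int bit) { bit ? ++ones : ++zeros; }
    };

    // 'emit' stands in for the arithmetic coder, which is also sequential.
    template <typename Emit>
    void compress_bits(const unsigned char* data, size_t nbits, Emit emit)
    {
        ToyModel model;
        for (size_t i = 0; i < nbits; ++i) {
            int bit = (data[i >> 3] >> (7 - (i & 7))) & 1;
            float p = model.predict();   // depends on all earlier bits
            emit(bit, p);                // coder consumes (bit, prediction)
            model.update(bit);           // state for bit i+1 depends on bit i
        }
    }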

Video and image compression/decompression benefit from CUDA because the required transforms can be accelerated greatly (for JPEG, JPEG2000, etc.), as can things like motion vector search and in-loop filtering (for H.264 and MPEG-4, for example). For decompression there is also plenty of dedicated hardware on the GPUs.

The CUDA acceleration for image and video compression standards benefits from the fact that the source data is partitioned into smaller blocks for transformation and processing (e.g. macroblocks). These can be shoved into shared memory and be processed on several multiprocessors in parallel.
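
As a rough illustration (a hypothetical kernel, not taken from any particular codec), this is the kind of mapping that makes block-based processing a good fit: one CUDA thread block stages one independent data block in shared memory and transforms it without needing results from any other block.

    // Sketch: one thread block processes one independent 256-byte data block,
    // staged in shared memory. The "transform" is just a placeholder byte-wise
    // operation; in a real codec it would be a DCT, quantizer, etc.
    #define BLOCK_BYTES 256

    __global__ void process_blocks(const unsigned char* in, unsigned char* out,
                                   size_t num_blocks)
    {
        __shared__ unsigned char tile[BLOCK_BYTES];

        size_t block = blockIdx.x;
        if (block >= num_blocks) return;

        int t = threadIdx.x;                     // launched with BLOCK_BYTES threads
        tile[t] = in[block * BLOCK_BYTES + t];   // coalesced load into shared memory
        __syncthreads();

        tile[t] = tile[t] ^ 0x5A;                // placeholder "transform" on the tile

        __syncthreads();
        out[block * BLOCK_BYTES + t] = tile[t];  // write the block back
    }

    // Launch example:
    // process_blocks<<<num_blocks, BLOCK_BYTES>>>(d_in, d_out, num_blocks);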

Check if PAQ also does some blockwise operations.

Christian

The C++ source code for PAQ 8P is included in this zip file. Please look over it and tell me if it could be improved using CUDA.

That's funny: I suggest that you study the algorithm, and your response is to ask me to do the hard work for you.

The problem is, I have a day job already. Maybe someone else will help, but it’s definitely not going to be me.

Christian

StartBP,

I don't know anything about compression…

But can this compression technique be broken down into a number of sub-compressions which can be stitched together later?

If so, it can be parallelized…

If not, you need to find a way to parallelize it. Certain sequential algorithms just cannot be parallelized…

For example:

    for (int i = 1; i < 100; i++)
        a[i] = a[i] + a[i-1];   // cumulative addition: iteration i depends on i-1

So read the algorithm yourself and see if it can be parallelized. Good luck!

Actually, there is a parallel solution for what you wrote, a very good one: have a look at the scan example in the SDK and at CUDPP. I use these techniques in a number of places. It's true that the GPU isn't good for every kind of problem, and in most cases I've found that it can be hard, take more time, and require a rewrite of the algorithm, but in the end you will get a performance boost, maybe not 100x but it can definitely be 10x. So good luck!!! :)
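
For reference, here is a minimal sketch of the idea behind the scan example (a Hillis-Steele inclusive scan for a single block that fits in shared memory; the real SDK sample and CUDPP handle arbitrary lengths and avoid bank conflicts):

    // Minimal Hillis-Steele inclusive scan for one block of n elements,
    // launched with exactly n threads (n a power of two that fits in a block).
    // It shows that the "cumulative addition" loop above can be parallelized.
    __global__ void inclusive_scan(const int* in, int* out, int n)
    {
        extern __shared__ int temp[];          // n ints, size passed at launch
        int tid = threadIdx.x;

        temp[tid] = in[tid];
        __syncthreads();

        for (int offset = 1; offset < n; offset *= 2) {
            int val = (tid >= offset) ? temp[tid - offset] : 0;  // read old values
            __syncthreads();
            temp[tid] += val;                                    // then update
            __syncthreads();
        }
        out[tid] = temp[tid];
    }

    // Launch example:
    // inclusive_scan<<<1, 128, 128 * sizeof(int)>>>(d_in, d_out, 128);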

Thank you. The reason that I asked about this instead of doing it myself is that I'm more of a Java programmer, and I'm just beginning at that (Hello, World! [OK, a little more than that, but not much; I'm still in the Sun Tutorial]). Before you go all "YOU DON'T EVEN PROGRAM?!?!?!" on me, you should look at my greatest accomplishment. It was made in Excel and can be found at starcalc.110mb.com. I know it's a Pokémon calculator, but it's a long overdue one. It's also the reason that I'm learning a real programming language, so I can remake it in Java sometime in 2009 or 2010.

Is C++ similar enough to Java that I wouldn't shoot myself in the foot if I tried to modify the code for parallelism after I have mastered Java (around the end of next year)? Also, is NVIDIA going to come out with a Java SDK and IDE for CUDA (or will Sun come out with a CUDA extension)? I would really like to try my hand at CUDA, and Java is what I do best (as of 2010).

You can call C code from Java, so you can call CUDA from Java. I think there is about 0% chance that Java code can ever be used in kernels though.
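
For what it's worth, here is a rough sketch of the native side of a JNI bridge to CUDA (the GpuScaler class, the method, and the kernel are all made up for illustration; the Java side just declares a matching native method and loads this library with System.loadLibrary):

    // Native side of a hypothetical JNI bridge, compiled as a .cu file.
    // The matching Java declaration would be:
    //   public class GpuScaler { public static native void scale(float[] data, float factor); }
    #include <jni.h>
    #include <cuda_runtime.h>

    __global__ void scale_kernel(float* d, int n, float factor)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= factor;
    }

    extern "C" JNIEXPORT void JNICALL
    Java_GpuScaler_scale(JNIEnv* env, jclass, jfloatArray arr, jfloat factor)
    {
        jsize n = env->GetArrayLength(arr);
        jfloat* host = env->GetFloatArrayElements(arr, nullptr);

        float* dev = nullptr;
        cudaMalloc(&dev, n * sizeof(float));
        cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

        scale_kernel<<<(n + 255) / 256, 256>>>(dev, n, factor);
        cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dev);

        env->ReleaseFloatArrayElements(arr, host, 0);  // copy results back to the Java array
    }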

I don't want to discourage you; there are a few big hurdles to cross, but if you have the perseverance then go for it. I enjoy my daily work with CUDA very much.