LZW Data compression on CUDA

shyni · April 2, 2013, 6:15am

I am doing mtech in computer science and engineering.As part of my course a thesis work have to be done.I decided to analyse the performance of LZW data compression on GPU and CPU architecture.I installed GPU ocelot in ubuntu 11.04 successfully and doing further steps.Any body can help me to give any idea about this work?Or any suggestion?.Please reply soon.Thank you.

njuffa · April 2, 2013, 6:39am

What kind of analysis do you plan to perform? What role does Ocelot play in it, as opposed to running on an actual GPU, or are you planning to do both? Have you performed a literature search (e.g. via Google Scholar) to see what is already available in terms of LZW implementations on GPUs, multicore CPUs or other parallel devices?

vvolkov · April 2, 2013, 6:51am

I recall a student of Prof. John Owens (UC Davis) was giving a talk on data compression on GPU at GTC last year.

shyni · April 2, 2013, 7:06am

Thank you.I have done a literature survey.Several pattern matching algorithms are implemented in CUDA Architecture,Dictionary based algorithms are not implemented yet.I think LZW will take more memory than pattern matching but the performance will be better.can I proceed with this?If so is there any guidance for me?

njuffa · April 2, 2013, 9:50am

It’s not clear what kind of guidance you are looking for. The purpose of a thesis is typically that the candidate demonstrates the capability of working on a scientific topic independently, with the school providing a thesis adviser who helps in defining the deliverables and provides some general guidance.

I am not sure I agree with your premise that “Dictionary based algorithms are not implemented yet”. While I admittedly know very little about compression I easily found a paper “CULZSS: LZSS Lossless Data Compression on CUDA”, which appears to describe a dictionary-based compression mechanism.

shyni · April 2, 2013, 5:29pm

Thank you for your reply…
sir,I have selected CULZSS paper for my seminar,this is interesting paper…but LZSS is a pattern matching algorithm or sliding window algorithm.I just need to know that is there any difficulties to implement this paper on CUDA,how can analyse the performance,how to view etc…

vacaloca · April 2, 2013, 8:57pm

That’s all up to you… find the sort of operations you can do in parallel from the original code/algorithm. There are plenty of CUDA tutorials online that will help explain all the features, not to mention the CUDA literature included with the SDK direct from NVIDIA itself. You can analyze performance via benchmarking against CPU code as a very simple test. As far as performance analysis NSight for Eclipse (Unix/Mac) or NSight for Visual Studio (Windows) will give you statistics on how your code performs on GPUs.

In general, I agree with njuffa – the purpose of graduate work is to show you are able to (pretty much) independently solve whatever problem… with that said, you need to do a bit more homework on your own rather than expecting answers.

Gregory_Diamos · April 3, 2013, 5:11pm

A word of caution here. Compression has been one of the most difficult problems to port to CUDA. There have been a lot of failed attempts to do this by very well qualified people.

Start here, AFAIK this is the best published work so far: “Parallel Suffix Array and Least Common Prefix for the GPU”, by Mrinal Deo (Advanced Micro Devices), Sean Keely (Advanced Micro Devices). PPoPP 2013. Take it as a starting point rather than a final answer. There has been some more recent work on different algorithms that beat this, but nothing published yet.

Start with a comprehensive literature survey. There is a huge body of work on compression, with significant improvements being made over the last 30 years. Make sure that you know the best published algorithms on each of the following topics (variable length coding, suffix arrays and longest common prefix arrays, run-length encoding).

 More recent work like LZMA achieves better compression ratios than older algorithms like BWT+MTF or LZW/Deflate.  However, GPU compression work has been most successful at BWT+MTF and similar algorithms because they lean heavily on sorting and there has been significant success at implementing parallel sorting algorithms on GPUs.  LZMA-like algorithms are very hard to parallelize because of the large amount of state used, and I am not aware of any successful GPU implementations.  People seem to neglect dictionary algorithms like LZW because BWT+MTF outperforms them.</li>

Be very careful about the algorithm that you pick, and be willing to adapt aspects of it. Many CPU compression libraries (e.g. bzip2) bake in assumptions that certain operations are performed sequentially. In many cases these assumptions can be relaxed in a way that does not impact compression ratio, but breaks compatibility with existing libraries. Feel free to break these assumptions without any loss in the impact of your work. People will mainly be interested in the compression ratio and speed of a new implementation; it doesn't matter whether you use a well-known approach or something new.
Be careful about your choice of algorithm for variable length coding. Huffman coding is notoriously difficult to do in parallel. There has been some success with arithmetic coding, but this is still an open problem.
Make sure to compare against a production CPU implementation. The amount of time that goes into the development and optimization of a production compression library is far more than you will spend on a masters thesis. Do NOT write it yourself. Writing it yourself completely invalidates your performance results.

Topic		Replies	Views
Starting a compression algorithm powered by CUDA any help is appreciated CUDA Programming and Performance	9	11041	April 9, 2010
Data Compression on GeForce CUDA Programming and Performance	9	9107	May 5, 2012
Run CUDA on CPU while GPU is present CUDA Programming and Performance	5	2961	April 25, 2011
Ocelot 1.0 Alpha Release High Performance GPU and Multi-core CPU targets CUDA Programming and Performance	27	59840	January 1, 2010
LZO realtime compression LZO realtime compression. CUDA Programming and Performance	4	7573	October 27, 2008
Ultimate Data Compression Algorithm PAQ on CUDA CUDA Programming and Performance	8	5274	December 9, 2008
Low Level CUDA C Programming Education CUDA Programming and Performance cuda	2	722	December 13, 2021
bzip on CUDA Is it enough memory? CUDA Programming and Performance	11	8078	March 2, 2010
Cuda encrypted filesystem CUDA Programming and Performance	8	11358	July 3, 2009
Cuda/OpenCL Optimization Find a compromise between time needed to optimization and performance CUDA Programming and Performance	3	4042	May 4, 2010

LZW Data compression on CUDA

Related Topics