LZW Data compression on CUDA

I am doing mtech in computer science and engineering.As part of my course a thesis work have to be done.I decided to analyse the performance of LZW data compression on GPU and CPU architecture.I installed GPU ocelot in ubuntu 11.04 successfully and doing further steps.Any body can help me to give any idea about this work?Or any suggestion?.Please reply soon.Thank you.

What kind of analysis do you plan to perform? What role does Ocelot play in it, as opposed to running on an actual GPU, or are you planning to do both? Have you performed a literature search (e.g. via Google Scholar) to see what is already available in terms of LZW implementations on GPUs, multicore CPUs or other parallel devices?

I recall a student of Prof. John Owens (UC Davis) was giving a talk on data compression on GPU at GTC last year.

Thank you.I have done a literature survey.Several pattern matching algorithms are implemented in CUDA Architecture,Dictionary based algorithms are not implemented yet.I think LZW will take more memory than pattern matching but the performance will be better.can I proceed with this?If so is there any guidance for me?

It’s not clear what kind of guidance you are looking for. The purpose of a thesis is typically that the candidate demonstrates the capability of working on a scientific topic independently, with the school providing a thesis adviser who helps in defining the deliverables and provides some general guidance.

I am not sure I agree with your premise that “Dictionary based algorithms are not implemented yet”. While I admittedly know very little about compression I easily found a paper “CULZSS: LZSS Lossless Data Compression on CUDA”, which appears to describe a dictionary-based compression mechanism.

Thank you for your reply…
sir,I have selected CULZSS paper for my seminar,this is interesting paper…but LZSS is a pattern matching algorithm or sliding window algorithm.I just need to know that is there any difficulties to implement this paper on CUDA,how can analyse the performance,how to view etc…

That’s all up to you… find the sort of operations you can do in parallel from the original code/algorithm. There are plenty of CUDA tutorials online that will help explain all the features, not to mention the CUDA literature included with the SDK direct from NVIDIA itself. You can analyze performance via benchmarking against CPU code as a very simple test. As far as performance analysis NSight for Eclipse (Unix/Mac) or NSight for Visual Studio (Windows) will give you statistics on how your code performs on GPUs.

In general, I agree with njuffa – the purpose of graduate work is to show you are able to (pretty much) independently solve whatever problem… with that said, you need to do a bit more homework on your own rather than expecting answers.

A word of caution here. Compression has been one of the most difficult problems to port to CUDA. There have been a lot of failed attempts to do this by very well qualified people.

Start here, AFAIK this is the best published work so far: “Parallel Suffix Array and Least Common Prefix for the GPU”, by Mrinal Deo (Advanced Micro Devices), Sean Keely (Advanced Micro Devices). PPoPP 2013. Take it as a starting point rather than a final answer. There has been some more recent work on different algorithms that beat this, but nothing published yet.

    Here are some notes from prior attempts:
  1. Start with a comprehensive literature survey. There is a huge body of work on compression, with significant improvements being made over the last 30 years. Make sure that you know the best published algorithms on each of the following topics (variable length coding, suffix arrays and longest common prefix arrays, run-length encoding).
     More recent work like LZMA achieves better compression ratios than older algorithms like BWT+MTF or LZW/Deflate.  However, GPU compression work has been most successful at BWT+MTF and similar algorithms because they lean heavily on sorting and there has been significant success at implementing parallel sorting algorithms on GPUs.  LZMA-like algorithms are very hard to parallelize because of the large amount of state used, and I am not aware of any successful GPU implementations.  People seem to neglect dictionary algorithms like LZW because BWT+MTF outperforms them.</li>
  2. Be very careful about the algorithm that you pick, and be willing to adapt aspects of it. Many CPU compression libraries (e.g. bzip2) bake in assumptions that certain operations are performed sequentially. In many cases these assumptions can be relaxed in a way that does not impact compression ratio, but breaks compatibility with existing libraries. Feel free to break these assumptions without any loss in the impact of your work. People will mainly be interested in the compression ratio and speed of a new implementation; it doesn't matter whether you use a well-known approach or something new.
  3. Be careful about your choice of algorithm for variable length coding. Huffman coding is notoriously difficult to do in parallel. There has been some success with arithmetic coding, but this is still an open problem.
  4. Make sure to compare against a production CPU implementation. The amount of time that goes into the development and optimization of a production compression library is far more than you will spend on a masters thesis. Do NOT write it yourself. Writing it yourself completely invalidates your performance results.