using compressed data to minimize PCI bus transfer time

nuliknol · June 21, 2009, 1:32am

Hi,
Has anyone tried compressing the data before sending it from the CPU to the GPU, and then decompressing it on the GPU before the application processes it? Maybe this way we can reduce the time the data spends traveling over the PCI bus.

Thanks for any comments.

Simon_Green · June 22, 2009, 12:48pm

This depends on what kind of data you’re transferring; not everything is compressible. Doing entropy compression/decompression (e.g. zip) on the GPU is difficult, but people have successfully implemented various image and video compression algorithms on the GPU, for example:

http://www.cs.rug.nl/~wladimir/sc-cuda/

You could certainly compress data by using smaller data types and packing bits.

nuliknol · June 22, 2009, 4:16pm

Yes, I figured out that it has to be very application-specific and use a very simple type of compression. I have already achieved around a 40% reduction in data by packing bits.
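
For readers wondering what "packing bits" can look like in practice, here is a minimal host-side sketch in C++. It is not the code from this post: the helper names `pack_bits` and `unpack_bits` are hypothetical, and it assumes every value fits in a fixed number of bits known in advance. It loops bit by bit for clarity rather than speed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pack the low `bits` bits of each value into a
// contiguous bit stream (least significant bit first).
// Assumes 1 <= bits <= 32 and that every value fits in `bits` bits.
std::vector<std::uint8_t> pack_bits(const std::vector<std::uint32_t>& values,
                                    unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    std::vector<std::uint8_t> out((values.size() * bits + 7) / 8, 0);
    std::size_t bit_pos = 0;
    for (std::uint32_t v : values) {
        for (unsigned b = 0; b < bits; ++b, ++bit_pos) {
            if ((v >> b) & 1u) {
                out[bit_pos / 8] |= static_cast<std::uint8_t>(1u << (bit_pos % 8));
            }
        }
    }
    return out;
}

// Hypothetical inverse of pack_bits: recover `count` values of `bits` bits each.
// This is a CPU reference; a GPU version would do the same work in a kernel.
std::vector<std::uint32_t> unpack_bits(const std::vector<std::uint8_t>& packed,
                                       std::size_t count, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    assert(packed.size() * 8 >= count * bits);
    std::vector<std::uint32_t> values(count, 0);
    std::size_t bit_pos = 0;
    for (std::size_t i = 0; i < count; ++i) {
        for (unsigned b = 0; b < bits; ++b, ++bit_pos) {
            if ((packed[bit_pos / 8] >> (bit_pos % 8)) & 1u) {
                values[i] |= (1u << b);
            }
        }
    }
    return values;
}
```

With values that fit in 10 bits but are stored as 16-bit integers, for example, this kind of packing would cut the transfer volume by 37.5%. The actual saving depends entirely on the value range of the application's data.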

CUDALIKE · May 29, 2011, 8:29pm

Have you had a look at this?

http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter33.html

Is this algorithm also suitable for compressing other types of data, such as text?

Or are there any other compression algorithms that have been implemented on the GPU?

cbuchner1 · May 29, 2011, 8:36pm

In our application (radio simulation) we can quantize some of the data we send to the graphics chip. We found that 8-bit quantization is sufficient for what we do. Going from float to unsigned char saves us a factor of 4 in transfer volume.

Dequantization on the graphics chip is essentially a free operation for us, as our application is limited by memory bandwidth, so a few additional FLOPs in between memory transfers don’t add any extra time.
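
As a rough illustration of the float to unsigned char idea, here is a minimal C++ sketch. It assumes a simple linear quantization over a value range `[lo, hi]` known in advance. That scheme and the names `quantize` and `dequantize` are assumptions made for the example, not necessarily what the radio simulation uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side step: map floats in [lo, hi] to 8-bit codes 0..255.
// Values outside the assumed range are clamped before rounding.
std::vector<std::uint8_t> quantize(const std::vector<float>& x, float lo, float hi) {
    assert(hi > lo);
    const float scale = 255.0f / (hi - lo);
    std::vector<std::uint8_t> q(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const float clamped = std::min(std::max(x[i], lo), hi);
        q[i] = static_cast<std::uint8_t>(std::lround((clamped - lo) * scale));
    }
    return q;
}

// Hypothetical inverse: one multiply and one add per element. On the GPU this
// would sit at the start of the kernel, right after loading the byte.
inline float dequantize(std::uint8_t q, float lo, float hi) {
    return lo + static_cast<float>(q) * ((hi - lo) / 255.0f);
}
```

The buffer sent over the bus is then one byte per element instead of four. With this linear scheme the reconstruction error for in-range values is at most about half a quantization step, i.e. `(hi - lo) / 510`, so whether 8 bits are enough depends on the precision the application needs.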

njuffa · May 29, 2011, 8:47pm

It is not clear whether your use case requires optimization of the latency or the throughput of the PCIe transfers. If it’s primarily a throughput issue, I would first look into optimizing the transfers per se, for example by making use of the dual DMA engines in a Tesla card for maximum overlap between copies and kernels as well as up-copy / down-copy. In addition, I would look at a double-buffering scheme. If this proves to be insufficient, these papers may provide some inspiration on how to go about data compression on the GPU:

Molly A. O’Neil and Martin Burtscher
Floating-point data compression at 75 Gb/s on a GPU
Fourth Workshop on General Purpose Processing on Graphics Processing Units. March 2011

Wenbin Fang, Bingsheng He, Qiong Luo
Database compression on graphics processors
Proceedings of the VLDB Endowment (PVLDB), vol. 3, no. 1, pp. 670-680, 2010