Accelerating PNG-Compression using CUDA possible?

Calab · November 9, 2007, 3:59pm

Does anyone have experience with this topic yet? We’re developing a web server application in which we already accelerate the rendering with 7900 GTX cards. However, about 70% of the graphic generation time is currently spent in the PNG compression routines of libpng/zlib.

Unfortunately I haven’t found the time to start experimenting with CUDA yet, but from what I’ve read so far, CUDA promises a very big performance increase for simple functions such as transformations.

So my question:
Is it possible to write a CUDA-based PNG compression algorithm that reduces our current time of about 5-10 ms (depending on complexity) for a 256x256 image to, let’s say, 1 ms?

Greetings
Calab

asadafag · November 10, 2007, 2:37am

There may be a small possibility. However, you may need to spend a lot of time to make it happen.
If you can afford to use other compression schemes, I guess JPEG or bzip2 may be easier.

Calab · November 12, 2007, 7:30am

Thanks for your fast response. As you suggested, I also believe there is big potential at least for JPEG, since the compression works blockwise in the first step and uses lots of floating-point operations as well. Unfortunately JPEG would blur our text too much, see Map.

And the compressed image has to be IE6 compatible, so a 32-Bit PNG with alpha channel.

I can imagine there is a little potential in the line-comparison part of the PNG compression, but I think the part that hurts far more is the deflate step, which I guess offers very few opportunities for parallelization.

Does anyone have an idea?

Greetings
Calab

wumpus · November 12, 2007, 2:06pm

So PNG uses nothing like a DCT, wavelet transform or other parallelizable full-image transform? It just pushes the image through zlib?

If compressing one image is not sufficiently parallelizable, one idea might be (if applicable in your case) to use the hardware to compress multiple images at once.
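A CPU-side analogue of this "many images at once" idea can be sketched in a few lines. This is not CUDA code: it assumes Python's standard `zlib` module and a thread pool, and the names `compress_batch` and `tiles` are made up for illustration. It only shows the shape of the approach, where each image is an independent compression job.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_batch(images, level=6, workers=4):
    """Compress several independent raw image buffers concurrently.

    Every buffer gets its own zlib stream, so the jobs share no state.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda raw: zlib.compress(raw, level), images))

# Eight hypothetical 256x256 RGBA tiles filled with placeholder bytes.
tiles = [bytes([i]) * (256 * 256 * 4) for i in range(8)]
compressed = compress_batch(tiles)
```

Whether this helps depends on having enough images queued up at the same time; it improves throughput, not the latency of a single image.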

seibert · November 12, 2007, 3:08pm

A potentially limiting factor here will be how much scratch space zlib requires per compression stream. The default memory footprint (according to http://www.zlib.net/zlib_tech.html) is 256 kB, which won’t even fit in the shared memory. The zlib parameters can be tuned to reduce the temporary memory required, or the scratch space could be stored in global memory. Since memory access to the scratch space is unlikely to be in any coalesced pattern, the performance hit for using global memory could be pretty big.
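The 256 kB figure can be reproduced from the formula given in the zlib documentation for deflate, `(1 << (windowBits + 2)) + (1 << (memLevel + 9))` bytes, plus a few kilobytes for small objects that are ignored here. The sketch below assumes Python's standard `zlib` module; the helper name `deflate_scratch_bytes` is made up for illustration.

```python
import zlib

def deflate_scratch_bytes(window_bits=15, mem_level=8):
    """Approximate deflate memory use, per the formula in the zlib docs."""
    return (1 << (window_bits + 2)) + (1 << (mem_level + 9))

default_bytes = deflate_scratch_bytes()      # zlib defaults: windowBits=15, memLevel=8
small_bytes = deflate_scratch_bytes(9, 1)    # smallest settings used below

# A compressor tuned for a small footprint; the output is still a normal zlib stream.
raw = bytes(range(256)) * 64
comp = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=9, memLevel=1)
packed = comp.compress(raw) + comp.flush()
```

With the defaults the formula gives 262144 bytes (256 kB); with `wbits=9` and `memLevel=1` it gives 3072 bytes. The smaller window limits how far back matches can reach, so the compression ratio will usually suffer.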

It really does sound like CUDA is a poor fit for PNG compression.

asadafag · November 13, 2007, 2:33am

Well, I think any improvement would only be achievable via a completely redesigned parallel algorithm for DEFLATE, not just implementing zlib.
For IE6, maybe you could write a bzip decompressor in JavaScript or something?

wumpus · November 13, 2007, 3:18pm

But zlib is already incredibly cheap on the CPU… is that really the bottleneck?

Calab · November 18, 2007, 11:13am

Yes, unfortunately it is. Regarding your question “It just pushes the image through zlib?”:

No, it first simplifies the data by comparing the lines to each other, line by line from top to bottom, using a couple of relatively simple algorithms. This simplified data is then compressed using deflate. For the same image, the required time ranges from, let’s say, 5 to 15 ms depending on the compression level you set in zlib.
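The two stages described here (line filtering, then deflate) can be sketched as follows. This is a minimal illustration, not libpng's code: it applies only PNG filter type 2 ("Up", each byte minus the byte directly above it) to every row, whereas a real encoder may pick a different filter per line. The function names and the synthetic test image are assumptions made for the example.

```python
import zlib

def filter_up(raw, width, height, bpp=4):
    """Apply PNG filter type 2 (Up) to every scanline of a raw image."""
    stride = width * bpp
    prev = bytes(stride)                  # the row above the first row counts as zeros
    out = bytearray()
    for y in range(height):
        row = raw[y * stride:(y + 1) * stride]
        out.append(2)                     # filter-type byte that precedes each scanline
        out.extend((c - p) & 0xFF for c, p in zip(row, prev))
        prev = row
    return bytes(out)

def unfilter_up(filtered, width, height, bpp=4):
    """Invert filter_up; used here only to check the round trip."""
    stride = width * bpp
    prev = bytes(stride)
    out = bytearray()
    for y in range(height):
        start = y * (stride + 1) + 1      # skip the filter-type byte
        diff = filtered[start:start + stride]
        row = bytes((d + p) & 0xFF for d, p in zip(diff, prev))
        out.extend(row)
        prev = row
    return bytes(out)

# Synthetic 256x256 RGBA image (a diagonal gradient) standing in for a real tile.
width = height = 256
raw = bytes((x + y) & 0xFF for y in range(height) for x in range(width) for _ in range(4))
filtered = filter_up(raw, width, height)

# The same filtered data pushed through deflate at three compression levels.
sizes = {level: len(zlib.compress(filtered, level)) for level in (1, 6, 9)}
```

Wrapping each `zlib.compress` call in `time.perf_counter()` measurements is one way to check how much of the total time depends on the level for a given image.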

Because of this I think that most of the time is spent in the function that searches for pixel repetitions, i.e. looking things up in the “dictionary”. The bigger the dictionary, the more time is spent, but the better the compression, because there are more hits.

So one way I can imagine would be to cut the image into sub-images, let’s say sixteen 256x16 strips, and to do this in parallel for all 4 channels, so using 64 threads to build 64 independent dictionaries, with CUDA searching for these repetitions. At the end you merge the dictionaries for each channel, build the Huffman tree, etc.
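A simpler variant of the strip idea can be tried on the CPU before writing any CUDA code. Note that this is a swapped-in technique, not the dictionary-merging scheme described above: each strip is deflated on its own with an empty history, non-final strips are ended with `Z_SYNC_FLUSH` so that they stop on a byte boundary, and the pieces are concatenated into one raw deflate stream. The sketch assumes Python's standard `zlib` module, non-empty input, and data that has already been filtered; the function names are made up for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def deflate_strip(job):
    """Deflate one strip independently (no history from earlier strips)."""
    data, is_last = job
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)   # negative wbits = raw deflate
    body = comp.compress(data)
    # Z_SYNC_FLUSH pads to a byte boundary without ending the stream;
    # only the last strip writes the final block.
    tail = comp.flush(zlib.Z_FINISH if is_last else zlib.Z_SYNC_FLUSH)
    return body + tail

def deflate_in_strips(raw, strips=16):
    """Split raw (non-empty) into strips, deflate them in parallel, concatenate."""
    size = -(-len(raw) // strips)                    # ceiling division
    chunks = [raw[i:i + size] for i in range(0, len(raw), size)]
    jobs = [(chunk, i == len(chunks) - 1) for i, chunk in enumerate(chunks)]
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(deflate_strip, jobs))

# Placeholder data the size of a 256x256 RGBA image.
raw = bytes((i * 7) & 0xFF for i in range(256 * 256 * 4))
stream = deflate_in_strips(raw)
restored = zlib.decompress(stream, -15)              # decodes as one ordinary stream
```

The price is compression ratio: matches cannot reach across strip boundaries, and every strip carries its own block overhead. To use the result inside a PNG, the raw stream would still need the zlib header and the Adler-32 checksum of the whole input.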

I guess the time until Christmas will be a bit quieter, so there will be time to play with zlib a bit. I’ll let you know when I have found a way.

Greetings

Calab
