"Genetic" image compression using transparent polygons a hot candidate for an implementati

Found this on boingboing today:


Roger Alsing re-created the Mona Lisa from 50 transparent polygons using what he claims is a genetic algorithm. It took about a million iterations. Some contest this and say it’s just a hill-climbing algorithm - nevertheless, the result is outstanding.

So far the drawing has been implemented in GDI+; the fitness function that controls the evolutionary selection is based on a pixel-by-pixel comparison with the source image. It took 3 hours on his laptop to compute. I think this algorithm is a hot candidate for an implementation in CUDA - or possibly a combination of CUDA and DirectX or OpenGL.

The author has published his C# source code; see the next posting.

A few good samples from his gallery:

The entire gallery can be found at this URL:


Sounds neat…I’ll have to take a look at it when he publishes the source code.

Source code is now published.


hack away!


A portable C++ implementation of the same algorithm is here. This might be a better starting point for a CUDA port. It has no GUI for parameters and uses SDL for rendering polygons. It compiled fine on my Mac Mini (Intel) using the SDL libraries installed through Fink.


see the blog post here for more info:


So how many polygons do the resulting images contain? Can they be saved as SVG or another vector format?

What kind of compression ratio are we achieving here? (Must be something extreme)

We’re talking about 50 to 150 polygons.

These tools have no SVG export function yet, but they can save their current progress so computation can be resumed later.

The compression factor is also determined by how you store the resulting vector coordinates - quantized or floating point - and by the amount of entropy coding applied to them (delta coding, Huffman coding, arithmetic coding - you name it).

Don’t expect too much compression efficiency. 100 polygons with, say, 8 vertices on average already make up 800 coordinate pairs that have to be stored. Add an RGB color and alpha per polygon and you end up with about as much data as a similar-quality JPEG of the same byte size ;)

If everything is output as XML (SVG) you get a lot of extra overhead (no good compression efficiency really). The coolness of this is the resolution independence: vector data scales nicely.

One problem that I see with this general approach is that the resulting image depends on the drawing order of the polygons. During drawing the RGB pixels can saturate, which is a nonlinear operation. So the polygons cannot really be treated as independent contributors that each reduce the error function. If we had strict linearity, I could think of some optimizations that would speed up the generation of the polygon coordinates. One could, for example, optimize each polygon one by one - reshaping/recoloring it such that it reduces the error that remains after adding up all the other existing polygons.
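A tiny sketch of that saturation point (the helper names and values are mine, purely for illustration): with clamped 8-bit addition, the result depends on the order in which contributions are applied, so the blend is not linear.

```cpp
#include <algorithm>

// Once a channel saturates at 255 the contributions no longer add
// independently: clamp(a + b + c) generally differs from applying the
// contributions in another order through intermediate clamps.
int clamp8(int v) { return std::min(std::max(v, 0), 255); }

int blend_saturating(int base, int add1, int add2)
{
    // the order-dependent, saturating path a renderer actually takes
    return clamp8(clamp8(base + add1) + add2);
}
```

For example, starting from 200 and applying +100 then -100 clips at 255 first and lands on 155, while applying -100 then +100 returns to 200 - same contributions, different result.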


Well, if we represent each vertex as a vect2 of 10-bit ints, you get about 2 KB per image. Add in compression, and maybe 1 KB? Here are the above pictures saved as JPEG at the lowest settings to get a 1-2 KB size. I wish I could test JPEG 2000, but Paint.NET doesn’t support it.
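A quick back-of-envelope check of these size estimates (the parameters are illustrative assumptions, not a real file format): raw bits for the quantized vertex coordinates plus an RGBA color per polygon, before any entropy coding.

```cpp
// Rough raw-size estimate for the polygon data, before entropy coding.
// 100 polygons x 8 vertices x (x,y) x 10 bits, plus 32 bits RGBA per
// polygon, comes out to about 2.4 KB - in line with the ~2 KB guess above.
int raw_polygon_bytes(int polygons, int avg_vertices, int bits_per_coord)
{
    int coord_bits = polygons * avg_vertices * 2 * bits_per_coord; // x and y
    int color_bits = polygons * 4 * 8;                             // RGBA
    return (coord_bits + color_bits + 7) / 8;                      // round up
}
```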

In general, it seems patently obvious that a genetic algorithm - whether based on polygons or Gaussians or wavelets or whatnot - will be the next step in image compression. An awfully computationally intensive step, but a step nevertheless.

Here are the vector images resized like the JPEGs. Difference is stunning.

Interesting! This certainly produces nice images, but these are quite small, and it is already slow. I can see applications for vectorization in tools like Inkscape, but for normal image compression I can think of very few cases in which the extra time investment will be worth it. How would it perform on multi-megapixel images, or video streams?

With CUDA this could get into performance regions that are acceptable.

Without CUDA it will take several hours of computation to get a decent polygon vector image. This is because the “evolution” of the image starts with random polygons that are mutated randomly and selected according to their similarity with the original image. This takes hundreds of thousands of iterations (generations) and for each iteration the CPU then computes the fitness function.
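The mutate-and-select loop described above boils down to something like this skeleton (a sketch of the general hill-climbing scheme, not anyone’s actual code - `Genome`, `mutate`, and `fitness` stand in for the polygon set, a random polygon mutation, and the expensive image comparison):

```cpp
// Generic mutate-and-select ("hill climbing") loop: mutate a copy, keep it
// only if the fitness improved. In the image case every fitness() call means
// a full render plus pixel comparison, hence the hours of CPU time.
template <typename Genome, typename MutateFn, typename FitnessFn>
Genome hill_climb(Genome current, MutateFn mutate, FitnessFn fitness,
                  int iterations)
{
    double best = fitness(current);
    for (int i = 0; i < iterations; ++i) {
        Genome candidate = current;
        mutate(candidate);                  // random change to one polygon
        double score = fitness(candidate);  // expensive: render + compare
        if (score < best) {                 // keep only improvements
            best = score;
            current = candidate;
        }
    }
    return current;
}
```

Note that nothing here parallelizes by itself - the win from CUDA is making each `fitness()` evaluation (the render and comparison) fast.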

One complex problem is the drawing of the polygons. Non-convex polygons and polygons with intersecting edges have to be tessellated (broken down into a triangle mesh) in order to be drawn by the GPU. This tessellation is usually done on the CPU and may be quite slow for complex polygons - especially when it has to be done over and over in each iteration. If the algorithm mutated a triangle mesh directly instead of arbitrarily shaped polygons, it would be better suited to the GPU hardware.

I am thinking of modifying the algorithm for a CUDA implementation. Maybe using simpler primitives (e.g. individual triangles) or even just lines that separate two distinct color regions. This could then be rendered directly by a CUDA kernel and the error function (“fitness”) might be determined already during the rendering pass. One could even determine that the most recent mutation was “bad” before rendering finishes.


How about we make a contest out of it?

Ha! Nice idea. The question is just how exactly to lay out the rules here.


In terms of algorithm, there are two ways to go here: stick to something uniform (like the original polygon + pseudo-evolution, or some agreed variation like using triangles) and make it more about optimization, or give complete freedom (any primitive function, any evolution algorithm) and make it more about genetic compression.

To judge, I think we should pick several (resolution, compression_ratio, time) tuples as contest rounds. (E.g., contestants must compress a 2 MP image to 30 KB in 60 seconds, plus or minus a few KB or seconds.) The winner is whoever takes the plurality of scenarios (but anyone who wins a round can brag). Judging could be objective via an error function, or subjective via voting.

I’d say make the first phase of the contest constrained to the uniform scenario, where everyone gets exactly the same number of degrees of freedom (200 triangles, each with uniform color and alpha, for example).

Once people have gotten familiar with the problem, and winners declared, phase two should be the freestyle round. Use any primitives you like to produce an output representation of the image in fixed size and time. Contestants will have probably learned enough in phase 1 (especially checking out the winning solutions) to make phase 2 really interesting…

Here is some CUDA code (work in progress).

Contrary to Roger Alsing’s approach, I am not using alpha blending - but rather strict linear superposition of the polygons. This means each polygon adds to the resulting image without affecting previous polygon contributions. This linearity will allow for a better optimization of each individual polygon (while leaving the other polygons fixed).

Rendering is done in a CUDA kernel, currently in a 512x512 RGB matrix. Rendering works on convex polygons, each having a signed 8 bit RGB color - meaning a polygon can either add to the color output, or subtract from it. Superposition of polygon RGB colors is currently done in the floating point domain, before an 8 bit RGB clamping is applied for final output. I found working in floating point to be a little faster than working with ints. I intend to compute the error function as part of the rendering process before the RGB clamping. So I can abort rendering early when the error function goes out of bounds (i.e. the mutation was bad).
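A CPU sketch of that linear-superposition idea (not the actual CUDA kernel - the types and names here are mine): each polygon contributes a signed color to every pixel it covers, sums stay in floating point, and the 8-bit clamp happens only once at the very end, so the order of the polygons no longer matters.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One "layer" per polygon: a signed per-pixel contribution (may subtract).
struct Contribution { float r, g, b; };

uint8_t clamp_to_u8(float v)
{
    return (uint8_t)std::min(std::max(v, 0.0f), 255.0f);
}

// Sum all contributions in floating point, clamp to 8-bit RGB at the end.
// Because the accumulation is pure addition, layer order is irrelevant.
std::vector<uint8_t> compose(const std::vector<std::vector<Contribution>>& layers,
                             size_t pixels)
{
    std::vector<float> accR(pixels, 0), accG(pixels, 0), accB(pixels, 0);
    for (const auto& layer : layers)
        for (size_t i = 0; i < pixels; ++i) {
            accR[i] += layer[i].r;
            accG[i] += layer[i].g;
            accB[i] += layer[i].b;
        }
    std::vector<uint8_t> out(pixels * 3);
    for (size_t i = 0; i < pixels; ++i) {
        out[3 * i]     = clamp_to_u8(accR[i]);
        out[3 * i + 1] = clamp_to_u8(accG[i]);
        out[3 * i + 2] = clamp_to_u8(accB[i]);
    }
    return out;
}
```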

The current code sample renders 3 triangles (the middle one subtracting from the final output by using negative colors). The sample is based on the “Box Filter” SDK sample and uses OpenGL. It should also compile on Linux if you define TARGET_LINUX instead of TARGET_WIN32 and apply a modified “BoxFilter” Makefile from the CUDA SDK 2.0.

Now my problem is that this kernel does “only” ~900 FPS on my NVIDIA 9600 GSO when rendering just three polygons. :no: I need some advice on how to make this rasterization faster. :fear:

You can still enable the box filter with the +/- and [/] keys. It brings the FPS down even more but softens the edges.

UPDATE: The most recent addition to my renderer is that it computes an error metric during rendering (which can be easily converted into a PSNR). The FPS values reported are now much more accurate.


Whoa whoa whoa

I never heard the starting gun fire :mellow:

well, that’s why you get my source code ;)

I don’t want your source. I’m not gonna be abetted by the enemy.

Well, consider Roger Alsing your strongest “opponent” in this unstarted contest. He claims his current code version now takes around 10 minutes to reach an image quality that his published C# version needed 3 hours to crunch.

Ok, silly me. With my current block and grid layout I am only using 2 SMs out of 12 on the card.
(EDIT: this has now been fixed in above posted code)

Also I found some very redundant computations that could be cached in shared memory.

And maybe the expensive floating-point division can even be eliminated by using a different criterion for “pixel is inside a polygon”. I hear the cross product can be used to achieve the same goal.