"Genetic" image compression using transparent polygons a hot candidate for an implementati

Depending on how well your polygon data fits into the constant cache, you might also try this layout in memory:

x1, y1, (y2 - y1)/(x2 - x1), x2, y2, (y3-y2)/(x3-x2), …

If everything fits in the 6-8 kB constant cache, this will be much faster. Otherwise, hitting global memory to read the slope might be slower than computing it every time.
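A rough illustration of what that interleaved layout could look like (just a sketch: c_poly, MAX_VERTICES and edge_y_at are made-up names, wrap-around of the last edge and vertical edges are not handled):

[code]
// sketch: x, y and the precomputed slope stored interleaved in constant memory
#define MAX_VERTICES 512                      // 512 * 3 * 4 B = 6 kB, fits the cache
__constant__ float c_poly[3 * MAX_VERTICES];  // x0, y0, slope0, x1, y1, slope1, ...

__device__ float edge_y_at(int v, float px)
{
    float x0    = c_poly[3 * v + 0];
    float y0    = c_poly[3 * v + 1];
    float slope = c_poly[3 * v + 2];          // read instead of recomputed
    return y0 + slope * (px - x0);            // y of edge v at screen column px
}
[/code]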

Another option would be to start the kernel by having the threads in the block cooperatively populate a shared memory array just for the slope values (again, assuming they will all fit).
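Something along these lines (a minimal sketch with invented names, assuming the vertices of a single polygon sit in constant memory as separate x/y arrays):

[code]
// sketch: the block cooperatively fills a shared-memory slope table once
#define MAX_VERTICES 512
__constant__ float c_vx[MAX_VERTICES], c_vy[MAX_VERTICES];

__global__ void render(int n_vertices)
{
    __shared__ float s_slope[MAX_VERTICES];

    for (int v = threadIdx.x; v < n_vertices; v += blockDim.x) {
        int w = (v + 1) % n_vertices;   // next vertex (single polygon assumed)
        s_slope[v] = (c_vy[w] - c_vy[v]) / (c_vx[w] - c_vx[v]);  // vertical edges not handled
    }
    __syncthreads();   // all slopes are ready before any pixel test reads them

    // ... per-pixel edge tests use s_slope[v] instead of recomputing ...
}
[/code]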

A third option would be to switch to triangles rather than convex polygons, in which case, all of this becomes much simpler. :)

I’ve just tried something similar. I am now storing the polygon edges in the form of axis intercept and slope. After a lot of experimentation, I conclude that this method provides a reasonable speed boost (1450 FPS vs. 1070 FPS for three polygons).

Going to high polygon counts produced horrible frame rates: only 90 FPS for rendering 129 triangles, because each pixel had to be checked against 387 edges. Ouch!

So I had to extend this into a tile-based rendering scheme, where I form subsets of polygons and edges that affect the individual tiles the output image is composed of. This brings the number of edge checks per screen pixel down to reasonable levels. I tried this with the same 129 polygons and got close to 400 FPS - that’s good enough for me. By the way, computing and storing these edge subsets turned out to be slower than the actual rendering, with the additional constraint that everything has to fit into the 64 kB of constant memory.
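The binning step itself is basically just a bounding-box test per triangle. Not my actual code, but roughly this idea (host-side sketch with made-up names; in my case the resulting per-tile lists then have to be squeezed into constant memory):

[code]
#include <algorithm>
#include <vector>

const int TILE = 32, W = 512, H = 512;
const int TILES_X = W / TILE, TILES_Y = H / TILE;

struct Tri { float x[3], y[3]; };   // colors etc. omitted

// append each triangle's index to every tile its bounding box overlaps
void bin_triangles(const std::vector<Tri>& tris,
                   std::vector<int> tiles[TILES_Y][TILES_X])
{
    for (int i = 0; i < (int)tris.size(); ++i) {
        const Tri& t = tris[i];
        int tx0 = std::max(0.f, std::min({t.x[0], t.x[1], t.x[2]})) / TILE;
        int tx1 = std::min(float(W - 1), std::max({t.x[0], t.x[1], t.x[2]})) / TILE;
        int ty0 = std::max(0.f, std::min({t.y[0], t.y[1], t.y[2]})) / TILE;
        int ty1 = std::min(float(H - 1), std::max({t.y[0], t.y[1], t.y[2]})) / TILE;
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                tiles[ty][tx].push_back(i);
    }
}
[/code]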

I will now start with the evolutionary code. I don’t think I will be posting source code from now on - this is sort of a competition after all. ;)

Christian

Hehe, I’ve now got a fully working MPI cluster version ;-)

The current speed record is 3 min 12 sec for the same quality as the last image in the Mona Lisa series.

That was using two cores on my laptop.

A friend of mine is putting up a real MPI cluster right now for us to test on.

I’m going to pew pew you 3D guys so hard ;-)

And it’s really nice to see all the spin-off effects that have come out of this.

Regarding using solid polygons:

You will never get the same image quality using solid polygons.

The semi-transparency enables the evolution to exploit the overlay effects to create gradients and the appearance of more polygons.

So you will never be able to reach the same fitness level with the same amount of polygons as you can with semi-transparent ones.
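For example, with plain "over" blending (just to illustrate the principle, not necessarily my exact blend mode), two overlapping 50% alpha polygons already produce a third color that neither of them has on its own:

[code]
#include <cuda_runtime.h>

// "over" blending: out = alpha * src + (1 - alpha) * dst
float3 blend_over(float3 dst, float3 src, float alpha)
{
    return make_float3(alpha * src.x + (1.f - alpha) * dst.x,
                       alpha * src.y + (1.f - alpha) * dst.y,
                       alpha * src.z + (1.f - alpha) * dst.z);
}
// white (1,1,1) -> 50% red (1,0,0) gives (1.0, 0.5, 0.5);
// then 50% blue (0,0,1) on top gives (0.5, 0.25, 0.75),
// an in-between shade that no single polygon carries by itself
[/code]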

//Roger

Wait until we start clustering our GPUs.

Nice that you found your way here ;)

My CUDA code uses linear superposition, comparable to shining multiple polygon-shaped light sources onto a screen. Overlap a red, a green and a blue source and you get white. One special thing in what I did is that I allow negative color intensities as well - almost like a flashlight that shines a black light cone in a sunlit environment. For solid polygons you'd have to look elsewhere - not in my code. hehe.

BTW I chose the linear approach because I am a linear systems guy (communications engineering).
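In general terms (this is not my actual kernel, just the principle with invented names), the per-pixel math is a plain signed sum that only gets clamped at the very end:

[code]
#include <cuda_runtime.h>

// linear superposition at one pixel: every covering polygon adds its
// (possibly negative) RGB contribution; clamping happens only at the end
__device__ float3 superpose(int n, const float* pr, const float* pg,
                            const float* pb, const bool* covers)
{
    float r = 0.f, g = 0.f, b = 0.f;
    for (int p = 0; p < n; ++p)
        if (covers[p]) { r += pr[p]; g += pg[p]; b += pb[p]; }
    return make_float3(fminf(fmaxf(r, 0.f), 255.f),
                       fminf(fmaxf(g, 0.f), 255.f),
                       fminf(fmaxf(b, 0.f), 255.f));
}
[/code]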

Christian

as the discussion is heating up a little bit, i think it’s time for me to present my last few days of work on this:
i’ve decided to abandon the polygons and instead go with triangles (as also suggested by some of the posters here). but as you would need lots of semi-transparent triangles to achieve any quality, i decided to put a little more data into each triangle. (i won’t tell you exactly what at the moment, as this data still changes quite often, but if you look at the image, you should be able to see in what direction this is going ;-))
at the moment, i’m optimizing the algorithm for maximum image quality with minimum size and don’t care that much for computation time. (that’s why i won’t talk about that either, let’s just say, at the moment it’s much too long on a cpu :-p)

now for the results:
my first test is on a compression factor of about 170:1, that is 707 Bytes of data for mona lisa.
first, just for the laughs, this is what jpeg will achieve with this compression: (i had to downsample it before the compression to even get so far down)
[image: JPEG at ~170:1]
that’s an MSE (mean squared error) of 789.9 or a PSNR (peak signal-to-noise ratio) of 19.15 dB.
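(for reference, the psnr values here are just the usual 8-bit formula applied to the mse; a tiny sketch:)

[code]
#include <math.h>

// mean squared error between two 8-bit grayscale buffers of n pixels
double mse(const unsigned char* a, const unsigned char* b, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i) { double d = (double)a[i] - b[i]; sum += d * d; }
    return sum / n;
}

// PSNR for 8-bit data: 10 * log10(255^2 / MSE), e.g. MSE 789.9 -> 19.15 dB
double psnr(double mse_val) { return 10.0 * log10(255.0 * 255.0 / mse_val); }
[/code]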

now to the real competitor, JPEG2000:
with the same size as the above jpeg, you’ll get the following.
[image: JPEG2000 at the same size]
MSE: 171.4, PSNR: 25.79 dB

if you choose to only use triangles in roger’s algorithm and limit them to 44, you get 704 bytes of raw data, if you save the rgba value as 4 x 1 = 4 bytes and each coordinate pair (x, y) as 2 x 2 = 4 bytes, i.e. 16 bytes per triangle.
the result looks like the following:
[image: 44 triangles, 704 bytes]
MSE: 482.8, PSNR: 21.29 dB
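(as a packed struct, the 16 bytes per triangle i assumed would look something like this; my own format carries a few extra parameters on top:)

[code]
// 16 bytes per triangle: 4 x 1 byte RGBA + 3 x (2 + 2) byte coordinates
#pragma pack(push, 1)
struct PackedTriangle {
    unsigned char  r, g, b, a;   //  4 bytes
    unsigned short x[3], y[3];   // 12 bytes, 16-bit coordinates
};
#pragma pack(pop)
// sizeof(PackedTriangle) == 16, so 44 triangles == 704 bytes of raw data
[/code]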

now here is roger’s algorithm with 18 polygons and 9 coordinates per polygon, which is, using the same data format as above, 720 bytes of raw data.
[image: roger’s algorithm, 18 polygons, 720 bytes]
MSE: 488.1, PSNR: 21.25 dB

well, as you can see, roger’s approach is still quite a bit away from being comparable to jpeg2000, but there definitely is potential.

so, here is what i get at the moment with 27 triangles and 702 bytes of raw data (similar data format as the above, just some additional parameters for the triangles):
[image: 27 triangles, 702 bytes]
MSE: 203.7, PSNR: 25.04 dB

as you can see, there is still some way to go to reach jpeg2000’s quality, but you also see the advantages of such an approach: you have vector data and no blocking or ringing artifacts, which makes upscaling a lot easier.

most likely, my next step will be to use a different quality measure as fitness function, because PSNR or MSE isn’t that great in this respect. i’m thinking of something like SSIM (http://www.ece.uwaterloo.ca/~z70wang/research/ssim/), but if anyone has a better idea, let me know. ;-)
also, if there are some better performing compressions than jp2k out there, i’d love to hear about them. :-)

that’s all from me for now - no cuda this time, but that will come once the algorithm itself is settled.

I applaud your efforts. Very impressive images so far.

One thing we could also attempt is a variable-length coding of the data. For example, one could encode polygon coordinates with low precision (taking up just a few bits per coordinate).

This allows us to balance the improvement in PSNR against the bit rate it takes to encode a polygon. This is also called “rate-distortion optimization”.

Especially the first few polygons could be encoded very coarsely because they are mostly space fillers, providing background brightness and coarse outlines.
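A small sketch of what I mean by coarse coordinates (made-up helpers, not from any existing code): quantize each coordinate to n bits, decode it at the bin center, and spend more bits only where the PSNR gain justifies it.

[code]
// quantize a coordinate in [0, size) to nbits, and reconstruct it again
unsigned int quantize(float x, float size, int nbits)
{
    unsigned int levels = 1u << nbits;
    unsigned int q = (unsigned int)(x / size * levels);
    return q < levels ? q : levels - 1;    // clamp to the top bin
}

float dequantize(unsigned int q, float size, int nbits)
{
    unsigned int levels = 1u << nbits;
    return (q + 0.5f) * size / levels;     // reconstruct at bin center
}
// e.g. 6 bits per coordinate on a 512-pixel axis -> 8-pixel resolution,
// probably good enough for the first "background" polygons
[/code]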

Christian

yes, you definitely can get some additional compression from the coordinates. at this image size you’ll mostly only use 8 of the 16 bits i calculated for each x and y, which would save another 162 bytes, and you could pack 35 triangles into 700 bytes with my approach, which would result in an mse/psnr comparable to jp2k.

You should throw some static tree huffman compression onto that data too, it might get the size down a bit.

We need the best fitness function possible. SSIM looks interesting, but I’m not convinced. In the Albert Einstein examples, it seems not to care much about the stark JPEG artifacts (vs. the blurred version) that a human would find objectionable. (Of course, it’s still miles better than MSE.) Anyone have other suggestions?

And then, of course, we’ll need to port SSIM to CUDA. I’m wondering if that should be part of the contest, or if we should cooperate on that one. (Frankly, I don’t know the Matlab programming language it’s written in.)

take a look at this code: http://mehdi.rabah.free.fr/SSIM/SSIM.cpp
that seems to be pretty easy.
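the core of it really is only this per window (uniform windows, i.e. without the gaussian weighting of the original paper; the full-image index is then just the mean over all windows). a minimal sketch:

[code]
// SSIM of one window from its means, variances and covariance (8-bit data);
// constants as in the original paper: C1 = (0.01*255)^2, C2 = (0.03*255)^2
float ssim_window(float mu_x, float mu_y, float var_x, float var_y, float cov_xy)
{
    const float C1 = 6.5025f;
    const float C2 = 58.5225f;
    return ((2.f * mu_x * mu_y + C1) * (2.f * cov_xy + C2)) /
           ((mu_x * mu_x + mu_y * mu_y + C1) * (var_x + var_y + C2));
}
[/code]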

another possibility for a fitness function is this: http://citeseerx.ist.psu.edu/viewdoc/summa…=10.1.1.49.6079 but i haven’t looked too deep into that one yet.

i have done a first test with SSIM…
when looking just at the numbers, i thought i would see some pretty nice results.
my algorithm reached a similarity of 58.47 % with 702 bytes of data, whereas a jpeg2000 with 624 bytes of data reached 58.36 %.
i’ve added both pics below. what you should see, according to these numbers, is a slightly higher visual quality in my triangle version (left) than in the jpeg2000 version (right).
[images: triangle version (left), jpeg2000 version (right)]
but, as you can see, the triangle-version isn’t even remotely similar to mona lisa. :-(

this can have several reasons:

  • my implementation (based on Mehdi Rabah’s but without OpenCV) is wrong
  • Mehdi Rabah’s implementation has an error greater than the 1 % stated in the source
  • I just discovered a weakness of SSIM

i have already tested my algorithm on the einstein examples given on the ssim page and got plausible results (within the 1 % error described in Mehdi’s source).
as i don’t have matlab available here, it would be very nice, if someone could double-check my results with the original matlab code. my “original” mona lisa can be found here.
thanks in advance.

Too bad the experiments they did to test their scheme were bogus, just basic sanity checks. They spent all this work thinking up their algorithm and writing a paper, but have no idea how well it works.

We’ll have to do some testing ourselves.

The genetic algorithms really hit these error-assessment algorithms in the worst possible way. They zero in on the exact shortcomings and biases.

Yes, this was my experience several years ago when I was doing genetic optimization of a model with many parameters. The algorithm always gave me what I asked for, which was usually the problem. :) I ended up spending all my time trying to find a good similarity metric.

yeah, it’s common to find such things with evolutionary algorithms…
btw: it seems the algorithm doesn’t always find the worst possible representation of the image, i.e. the quality varies between “hmm, could be mona lisa with a little bit of a modern touch” and a painting by van gogh after he had too much absinthe. ;-)
i think i’ll stick with ssim for now and try to avoid the “bad” ones…

Here is some CUDA based eye candy for you. This 1 MB download might be worth checking out. WIN32 binary only, no source code. UPDATE: the missing cudart.dll has been added.

The output is quite beautiful. Use the +/- keys and the [/] keys to add the box filter if needed. Smoke some weed or drop some LSD while watching this - it’s quite psychedelic.

It is a tile based renderer (32x32 pixel tiles) on a 512x512 pixel bitmap, showing 129 mutant triangles. Polygon edge sorting (into tiles) and rendering is done in a CUDA kernel. I am sure rendering in pure OpenGL or DirectX using a shading language would have been way faster, but I am more familiar with CUDA - so I stick with it.

Currently no meaningful evolution of the image takes place. What you see is a random modification of 129 triangles (vertices and colors) using normally distributed random numbers. On an nVidia 9600 GSO this results in close to 600 frames per second, of which every tenth frame is actually rendered to screen. I am quite satisfied with the performance of my CUDA based polygon renderer now. Keep in mind that the renderer already evaluates the fitness function during rendering.

One benefit of my tiled approach is that I could evaluate the fitness of a mutation by rendering just the affected tiles. Or I could perform simultaneous mutations of polygons that affect several distinct tiles, still allowing me to evaluate and select the mutations separately.
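The bookkeeping for that is simple (just the idea, not actual code from my renderer):

[code]
// keep one error value per 32x32 tile; after a mutation, only the tiles the
// mutated triangle touches (old and new position) are re-rendered and have
// their tile_err entry recomputed on the GPU
float total_error(const float* tile_err, int n_tiles)
{
    float sum = 0.f;
    for (int t = 0; t < n_tiles; ++t)
        sum += tile_err[t];
    return sum;   // accept the mutation if this went down
}
[/code]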

This is going to pew Roger Alsing’s MPI multicore Mona Lisa hard, once it’s finished ;)

Christian

Suggested compromise: how about you evolve according to the PSNR metric until the SSIM metric meets a given threshold?
Then you get the best of both worlds.

Christian

i don’t think that this will work, as psnr and ssim are completely independent…
i also thought about combining psnr with gradients…

btw: which version of cuda are you using? cudart.dll is missing from the files and the exe doesn’t want to use mine :-p

I use the 32 bit CUDA 2.0 runtime. I fixed that download archive by adding the required DLL.

uh, pretty wobbly ;-)

unfortunately, i’m only getting 180 fps with my 8800gt 1gb on vista x64. i’ll try a gtx260 tomorrow…

Strange… I’d have expected some FPS in excess of 600.

Maybe there’s just something wrong with the way this code measures the FPS.

I am on Windows XP 64 bit by the way.

Christian