With a GTX 260 on XP x64 I get 235 FPS.
Could it be that there are lots of host-to-device memcpys? That’s the only thing I can think of that would cause such a difference… (Neither of my PCIe buses is that fast: old workstation hardware, 1.5 GB/s.)
I am doing a lot of Mapping/Unmapping of OpenGL Pixel Buffer objects, which is known to have performance issues. My mainboard only has PCIE 4x support by the way so I am not the fastest one either.
UPDATE: to curb this bottleneck, I am now only mapping the PBO when the texture is going to be rendered to screen (i.e. every 10th frame). In the other frames I am not writing an output texture in my CUDA kernel, but I am still generating an error metric that could be used to select good mutations. I am now getting around 680 FPS.
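The frame-skipping strategy described above can be sketched in plain Python (the interval constant and helper name are illustrative, not taken from the original code):

```python
DISPLAY_INTERVAL = 10  # map the PBO only on every 10th frame

def should_display(frame):
    """True when this frame should pay the expensive PBO map/unmap cost.

    On all other frames the CUDA kernel skips the output texture but
    still computes the error metric used to select good mutations.
    """
    return frame % DISPLAY_INTERVAL == 0

# Over 100 frames, only 10 hit the costly display path.
displayed = sum(1 for f in range(100) if should_display(f))
```

The evolution loop keeps running at full speed; only the visualization is throttled.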
Please let me know if performance is now significantly better for you.
Christian
I updated the code in the parent post to have an OpenGL based render path as well.
Use the ‘r’ key to switch renderers. Use ‘m’ to stop mutation in order to compare the output of both renderers. They are not pixel-exact (yet), but that hasn’t been the design goal.
OpenGL rendering seems to be about 3 times as fast as the CUDA counterpart initially, but the frame rate drops as the mutant polygons grow bigger; we seem to hit fill rate limitations soon. It has been fun to figure out how to do this properly in OpenGL. Fortunately, disabling all RGB color range clamping was possible with the ARB_color_buffer_float extension. Before I found out about this, I had to do ping-pong rendering between two floating point buffers with fragment shaders, which was horribly slow.
When enabling the Box Filter on the OpenGL render path, I have to copy the rendered contents of the 32 bit floating point RGBA frame buffer object into a pixel buffer object that CUDA can access. This conversion seems slow, and the frame rate drops to about 400 FPS for me (even slower than the CUDA renderer with filtering).
The main drawback of the OpenGL renderer is that it doesn’t give me an error metric for the mutation yet. How do you do a reduction in a pixel shader? I don’t know. And transferring the image to CUDA for further analysis hits the performance bottleneck stated above.
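For readers unfamiliar with the reduction being discussed: turning per-pixel errors into a single fitness value uses a pairwise (tree) reduction, the same scheme a CUDA reduction kernel or a ping-pong shader pass implements. A minimal pure-Python sketch:

```python
def reduce_sum(values):
    """Pairwise (tree) reduction: each pass halves the array,
    mirroring what a GPU reduction kernel does within a block."""
    data = list(values)
    while len(data) > 1:
        if len(data) % 2:              # pad odd lengths with the identity
            data.append(0.0)
        data = [data[i] + data[i + 1] for i in range(0, len(data), 2)]
    return data[0]

# Mean-square-error fitness over one row of rendered vs. target pixels
rendered = [0.1, 0.5, 0.9, 0.3]
target   = [0.0, 0.5, 1.0, 0.5]
sq_err = [(r - t) ** 2 for r, t in zip(rendered, target)]
mse = reduce_sum(sq_err) / len(sq_err)
```

On the GPU each pass is one kernel launch (or one fullscreen quad into a half-size render target); the halving is what keeps the total work at O(n).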
Christian
Here is a variation of this technique that uses transparent circular “blobs” rather than polygons. The initial goal was to emulate an “oil painting” style.
[url=“http://www.m3xbox.com/index.php?page=p_gpupainting”]http://www.m3xbox.com/index.php?page=p_gpupainting[/url]
Just follow the link. This implementation uses GPGPU techniques (not CUDA). The site has screenshots.
Christian
I was thinking about this contest today, and had another idea you might possibly add as a ‘judging factor’:
The original fitness function did a pixel-by-pixel comparison with the generated polygon image; however, this does not account for one of the chief benefits of vector drawing – infinite scaling without loss of resolution. You could add this test to the fitness function by doing some sort of interpolation/scaling on the original image (say, to increase it to 200% size), then multiply your polygons by a scaling vector/matrix to increase their size as well. Then, do the same pixel by pixel comparison. With a few iterations of this on some given scales (for the contest, e.g. 33%, 50%, 125%, 200%, 400%), you could determine a sort of ‘error derivative’ – a function that could tell how good/bad your code matches the scaled picture. Lower values = better.
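A minimal sketch of that multi-scale fitness idea in Python. Note the shortcuts: images are plain 2D lists of grayscale values, scaling is nearest-neighbour, and the candidate raster is simply rescaled; in the real test you would re-render the polygons at each scale, which is the whole point of vector drawing:

```python
def scale_image(img, factor):
    """Nearest-neighbour scaling of a 2D list of grayscale pixels."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, round(h * factor)), max(1, round(w * factor))
    return [[img[min(h - 1, int(y / factor))][min(w - 1, int(x / factor))]
             for x in range(nw)] for y in range(nh)]

def pixel_error(a, b):
    """Sum of squared per-pixel differences (images must match in size)."""
    return sum((pa - pb) ** 2
               for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def multiscale_fitness(target, candidate, scales=(0.33, 0.5, 1.0, 2.0)):
    """Average the normalized per-pixel error over several scales.

    Lower is better; a candidate that only matches at 100% scale is
    penalized at the other scales.
    """
    total = 0.0
    for s in scales:
        t = scale_image(target, s)
        c = scale_image(candidate, s)
        total += pixel_error(t, c) / (len(t) * len(t[0]))
    return total / len(scales)
```

The per-scale errors could also be kept separate to estimate the “error derivative” across scales that the post describes.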
Another way to do this without upscaling is to take some large (multi-megapixel) pictures, where the original represents the largest size, then scale them down with Photoshop/GIMP/etc. into the corresponding sizes… this avoids the weird results that might occur from upscaling a very small picture.
More results on “Screaming Duck’s” Blog. He added a blur factor to the polygon and also created a binary format to investigate the kind of compression he could achieve. No use of CUDA though. This is quite an in-depth article tracing his steps.
Some Stuff - Screaming Duck Software
Not much going on here… Could Mr. Alsing PLEASE publish his source code, or at least a Windows .exe binary? I am a student who may be doing an investigation on evolutionary algorithms soon.
Google Code Archive - Long-term storage for Google Code Project Hosting.
This, however, is a quite old release (it takes several hours to achieve a good image). Roger Alsing’s blog reports on a multiprocessor release that spreads the workload across two CPUs and achieves the same quality in about 5 minutes. I think that’s what the previous poster wanted access to. Hint: the right person to ask would be Roger Alsing, not this forum ;)
I added all of Dan Bystrom’s (http://danbystrom.se) speed suggestions, then I improved it further. http://starcalc.110mb.com/EvoLisa.zip is where you can find it.
[url=“http://digg.com/d1riHp”]http://digg.com/d1riHp[/url]
Here’s an example of someone using a related technique to encode an image into 140 characters allowed by Twitter messages.
It’s just that the result looks anything but convincing yet ;)
I have attached a port of Roger’s program to Direct3D. It achieves 4,000 Generations per Second with 50 5-sided polygons on a GeForce 8600 using the default Mona Lisa image. It is most likely CPU-limited on the same GPU, so 10K+/sec may very well be possible with a modest Quad-Core system (I tested on a single core of such a system). I haven’t implemented the Fitness Function yet, but it will be done soon (in HLSL using Nvidia’s FX Composer). Right now, it simply mutates x random polygons, where x is the number you specify. Setting the “Image Scale” above 1.0 results in slight cosmetic issues on some computers (it pixelates when it is supposed to blur).
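For readers joining the thread here: the accept/reject loop all these ports share (mutate a few polygons, re-render, keep the change only if the error drops) can be sketched in Python with toy stand-ins; the genome below is just a list of numbers and “rendering” is the identity, so the structure, not the graphics, is what’s shown:

```python
import random

def evolve(target, genome, render, fitness, mutate,
           generations=2000, seed=0):
    """Hill-climbing core of EvoLisa-style programs:
    mutate, re-render, and keep the mutation only if fitness improves."""
    rng = random.Random(seed)
    best_err = fitness(render(genome), target)
    for _ in range(generations):
        candidate = mutate(genome, rng)
        err = fitness(render(candidate), target)
        if err < best_err:               # accept only improvements
            genome, best_err = candidate, err
    return genome, best_err

def mutate(g, rng):
    """Perturb one randomly chosen gene by a small uniform step."""
    i = rng.randrange(len(g))
    out = list(g)
    out[i] += rng.uniform(-0.5, 0.5)
    return out

target = [3.0, -1.0, 2.0]
render = lambda g: g                     # stand-in for polygon rasterization
fitness = lambda img, tgt: sum((a - b) ** 2 for a, b in zip(img, tgt))

best_genome, best_err = evolve(target, [0.0, 0.0, 0.0],
                               render, fitness, mutate)
```

In the GPU ports, `render` and `fitness` are where the work goes: rasterization on the device and a parallel reduction for the error sum.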
Hmm, that beats my 600 iterations/second for 127 triangles in CUDA (although I generate the fitness value during rendering).
Just curious, how can you perform a parallel reduction in HLSL? Or are you intending to just generate the difference image (mean square error metric) per pixel and let the CPU generate the final sum?
Christian
Patently obvious? How did you come to this wild conclusion?
Genetic algorithms are simply one class of algorithms for minimizing residual error, one of MANY such classes. In fact, GAs are among the most unfriendly to program and often quite inefficient compared to other methods for the majority of problems, and the theory of how best to choose mutation operators is very underdeveloped, leading to a great deal of uncertainty about the best way to use them.
So I’m quite curious how you came to the conclusion that GAs are better than other algorithms that can perform the same task, such as simulated annealing, belief propagation, graph cuts, Monte Carlo sampling, Levenberg-Marquardt, branch and bound, mean field annealing, particle swarm optimization, evolutionary search, etc.
IT IS FINISHED!!! (Well, at least the alpha version is; there is still no file format for it.) It gets around 1,200 FPS on a GeForce 8600 GT. If there is any way to get rid of all the state changes at every stage of the reduction (according to PIX, it even re-initializes samplers that aren’t used in that section of the code), that would be appreciated. I have attached the source (there is a binary in the bin/Release folder). It is released under the GNU General Public License, meaning that any application using any part of this source code must publish its source code as well.
A nice video of the “Evolisa” algorithm in action is found here:
[url=“http://www.brianlow.com/index.php/2009/01/26/evolisa-video/”]http://www.brianlow.com/index.php/2009/01/26/evolisa-video/[/url]
And here is the Sydney opera house rendered in SVG, as polygons generated by this algorithm. Worked fine in Firefox. [url=“http://www.conceptdevelopment.net/Wpf/EvoLisaViewer/operahouse_day.svg.xml”]http://www.conceptdevelopment.net/Wpf/EvoL...use_day.svg.xml[/url]
Now can we get this thread to over 30000 views or what.
Christian
Student project:
Evolisa done in PyCuda
http://cs264.org/projects/web/Ding_Yiyang/ding-robb/index.html
There’s a big thread going on at StackOverflow about this algorithm: unicode - Twitter image encoding challenge - Stack Overflow