Just a little call for ideas. For fun, I am trying to accelerate fractal rendering with the GPU. The fractals are generated by a random descent into an iterated function system (IFS), i.e. by playing “the chaos game”.
This rendering consists of three phases:
1) Point generation: based on an iterative random process, I build a list of screen coordinates and their color contributions. Each x/y coordinate is assigned a “bin” identifier (which simply represents a pixel). Because of the random way they are generated, these screen coordinates are badly scattered.
2) Sorting: I sort the color contributions by bin (pixel) identifier, using the fast radix sort from the SmokeParticles demo.
3) Color accumulation: for each screen pixel, I go through all bin entries that belong to that pixel and add up their color contributions. To find a pixel's first entry, I use a fast binary search. Then I write the result into the frame buffer.
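As a point of reference, the three phases can be sketched on the host in plain C++. This is not the poster's CUDA code: the Sierpinski-triangle IFS, the single-channel `Splat` layout, and all function names are hypothetical, and `std::sort` stands in for the GPU radix sort. Phase 3) uses `std::lower_bound` as the per-pixel binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// One entry of the "pixel log": bin (pixel) id plus color contribution.
// Hypothetical layout; a single float channel stands in for RGB.
struct Splat {
    uint32_t bin;
    float value;
};

static bool by_bin(const Splat& a, const Splat& b) { return a.bin < b.bin; }

// Phase 1: chaos game on a hypothetical Sierpinski-triangle IFS.
// Each step applies one of three maps p' = (p + v_k) / 2, chosen at random.
std::vector<Splat> generate_splats(int width, int height, int count, uint32_t seed) {
    const float v[3][2] = {{0.0f, 0.0f}, {1.0f, 0.0f}, {0.5f, 1.0f}};
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, 2);
    float x = 0.25f, y = 0.25f;
    std::vector<Splat> entries;
    entries.reserve(count);
    for (int i = 0; i < count; ++i) {
        const float* t = v[pick(rng)];
        x = 0.5f * (x + t[0]);
        y = 0.5f * (y + t[1]);
        // x, y stay in [0, 1]; clamp so that 1.0 lands on the last pixel.
        int px = std::min(static_cast<int>(x * width), width - 1);
        int py = std::min(static_cast<int>(y * height), height - 1);
        entries.push_back({static_cast<uint32_t>(py * width + px), 1.0f});
    }
    return entries;
}

// Phase 2: sort by bin id (std::sort stands in for the GPU radix sort).
void sort_by_bin(std::vector<Splat>& entries) {
    std::sort(entries.begin(), entries.end(), by_bin);
}

// Phase 3: per pixel, binary-search the first entry, then sum the run.
std::vector<float> accumulate(const std::vector<Splat>& sorted, int width, int height) {
    assert(std::is_sorted(sorted.begin(), sorted.end(), by_bin));  // precondition
    std::vector<float> frame(static_cast<size_t>(width) * height, 0.0f);
    for (uint32_t pixel = 0; pixel < frame.size(); ++pixel) {
        auto it = std::lower_bound(sorted.begin(), sorted.end(), pixel,
                                   [](const Splat& s, uint32_t key) { return s.bin < key; });
        float sum = 0.0f;
        for (; it != sorted.end() && it->bin == pixel; ++it) sum += it->value;
        frame[pixel] = sum;
    }
    return frame;
}
```

Every pixel performs its own binary search over the whole log, so phase 3) costs roughly O(P log N) for P pixels and N entries, on top of the O(N) summation itself.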
Phase 1) is fast and writes the pixel log using fully coalesced memory access.
Phase 2) seems to be the major bottleneck: it consumes 90% of the total time.
For Phase 3) to be fast, I need the color contributions to be sorted by screen pixel, so that I can accumulate them in shared memory and use coalesced writes for the results. Without sorting, I would have major performance issues: the writes would be scattered all over the frame buffer, and I would have to use atomic writes to prevent collisions.
Can you think of any other way to do some kind of pre-binning of the pixels already in Phase 1), without breaking memory coalescing during the writes? I would really like to cut down the huge sorting time of Phase 2).