I’m building a water particle system using smoothed particle hydrodynamics (SPH) for a game I’m making for a school project. I currently have about 8192 particles running at around 1300 FPS on a single GTX 580, with a basic metaball pixel shader for a blobbing effect.
I’m using the grid method from Simon Green’s “Particle Simulation using CUDA” to optimize the particle neighbor search, to great effect. However, my biggest bottleneck is the sort it requires.
Right now I copy the cell data to the CPU and run a sequential radix sort there. To increase performance even further, I’m looking at doing the sort on the GPU.
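To make the question concrete, here is a minimal C++ sketch of a sequential LSD radix sort over (cell hash, particle index) pairs. It is an illustration under assumptions, not the project's actual code: `CellEntry`, `radixSortByCell`, the 32-bit hash, and the 8-bit digit width are all hypothetical choices.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record: one entry per particle.
struct CellEntry {
    std::uint32_t cellHash;      // grid cell the particle falls in
    std::uint32_t particleIndex; // index into the particle arrays
};

// Sequential LSD radix sort on cellHash, 8 bits per pass (4 passes for 32 bits).
// Each pass is a stable counting sort, so entries with equal hashes keep their order.
void radixSortByCell(std::vector<CellEntry>& entries) {
    std::vector<CellEntry> scratch(entries.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::array<std::size_t, 257> offsets{}; // zero-initialised
        // Histogram: count how many keys have each digit value.
        for (const CellEntry& e : entries)
            ++offsets[((e.cellHash >> shift) & 0xFFu) + 1];
        // Prefix sum turns the counts into start offsets per digit.
        for (std::size_t d = 0; d < 256; ++d)
            offsets[d + 1] += offsets[d];
        // Scatter: place each entry at the next free slot for its digit.
        for (const CellEntry& e : entries)
            scratch[offsets[(e.cellHash >> shift) & 0xFFu]++] = e;
        entries.swap(scratch);
    }
}
```

The three inner loops (histogram, prefix sum, scatter) are the pieces a GPU radix sort would have to parallelize, and the prefix sum is usually the least obvious of them to port.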
I noticed that the June 2010 DirectX SDK has a very similar fluid simulation sample. I took a look and saw that it uses a GPU bitonic sort with a matrix transpose. I also found “Designing Efficient Sorting Algorithms for Manycore GPUs”, which suggests a radix sort on the GPU.
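For comparison, a bitonic sort is a fixed network of compare-exchange steps, which is why it maps easily onto GPU threads. The sketch below is a plain CPU version in C++ and is not the DirectX sample's code; `bitonicSort` is a hypothetical name, and it assumes the key count is a power of two (8192 is).

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Bitonic sorting network, ascending order.
// Assumption: keys.size() is a power of two.
void bitonicSort(std::vector<std::uint32_t>& keys) {
    const std::size_t n = keys.size();
    for (std::size_t k = 2; k <= n; k <<= 1) {         // size of the runs being merged
        for (std::size_t j = k >> 1; j > 0; j >>= 1) { // distance between compared elements
            // On a GPU each i would be one thread, with a barrier
            // (or a new dispatch) between successive (k, j) stages.
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) { // handle each pair exactly once
                    const bool ascending = (i & k) == 0;
                    if ((keys[i] > keys[partner]) == ascending)
                        std::swap(keys[i], keys[partner]);
                }
            }
        }
    }
}
```

With n = 2^13 = 8192 keys, the two outer loops run 13 × 14 / 2 = 91 compare-exchange stages, against 4 passes for the radix sort with 8-bit digits. Each bitonic stage is much simpler than a radix pass, so the stage count alone does not settle which one is faster.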
Does anyone know whether a radix sort has been done in a compute shader? I have found some good resources on how to do it in CUDA, but I’m wondering how much differs between the two.
Also, is it worth trying to figure out? Or would the bitonic sort from the sample be just as good or better for a lot less effort, given that there is both code I can look at and a nice explanation on Microsoft’s website?