I’m fairly new to CUDA but gained some experience by rewriting a particle-based simulation model in CUDA. However, the performance gain is quite small (about 10x compared to the CPU). The basic problem is sorting the particles into a uniform grid, which is the major time consumer in MY simulation because the collision part is quite simple (as opposed to other particle simulations, where the interaction part is where the GPU gains the most).
I tried both sorting algorithms from Simon Green’s particles simulation (SDK examples), but they are quite slow due to the scattered reads/writes (the atomic algorithm) or the sorting itself (the sorting-based algorithm). I will also try to use texture fetches, but I don’t expect a huge performance gain from that.
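To make clear what I mean by the sorting-based grid build, here is a rough serial C++ sketch of the idea. This is not the SDK code: the names `Grid`, `cellIndex`, `SortedGrid` and `buildGrid` are made up for illustration, and it assumes the domain starts at the origin. On the GPU, the sort would be a parallel sort keyed on the cell index.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical types for illustration only.
struct Particle { float x, y, z; };
struct Grid { float cellSize; int nx, ny, nz; };

// Map a position to a linear cell index; positions outside the grid are clamped.
inline std::uint32_t cellIndex(const Grid& g, const Particle& p) {
    auto clampCell = [](float v, float size, int n) {
        int c = static_cast<int>(v / size);
        return std::min(std::max(c, 0), n - 1);
    };
    int cx = clampCell(p.x, g.cellSize, g.nx);
    int cy = clampCell(p.y, g.cellSize, g.ny);
    int cz = clampCell(p.z, g.cellSize, g.nz);
    return static_cast<std::uint32_t>((cz * g.ny + cy) * g.nx + cx);
}

struct SortedGrid {
    std::vector<std::uint32_t> order;     // particle indices, sorted by cell
    std::vector<std::uint32_t> cellStart; // numCells + 1 offsets into 'order'
};

SortedGrid buildGrid(const Grid& g, const std::vector<Particle>& ps) {
    const std::size_t numCells = static_cast<std::size_t>(g.nx) * g.ny * g.nz;

    // Step 1: compute the cell index ("hash") of every particle.
    std::vector<std::uint32_t> hash(ps.size());
    for (std::size_t i = 0; i < ps.size(); ++i) hash[i] = cellIndex(g, ps[i]);

    // Step 2: sort particle indices by cell index (this is the costly part).
    SortedGrid out;
    out.order.resize(ps.size());
    std::iota(out.order.begin(), out.order.end(), 0u);
    std::stable_sort(out.order.begin(), out.order.end(),
                     [&](std::uint32_t a, std::uint32_t b) { return hash[a] < hash[b]; });

    // Step 3: count particles per cell (shifted by one slot), then prefix-sum
    // so that cell c owns order[cellStart[c]] .. order[cellStart[c + 1] - 1].
    out.cellStart.assign(numCells + 1, 0u);
    for (std::uint32_t h : hash) ++out.cellStart[h + 1];
    std::partial_sum(out.cellStart.begin(), out.cellStart.end(), out.cellStart.begin());
    return out;
}
```

After this, the neighbours of a particle are found by walking the `cellStart` ranges of the surrounding cells, so the collision step itself only reads contiguous ranges.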
Any ideas for improvement?