After increasing numParticles to 262144, the particles simply disappear within a few frames. It looks like a problem with RadixSort.
Does anybody know what needs to be modified for it to work properly with a large number of particles?
Any idea?
EDIT:
If the USE_SORT compiler variable (declared in particles_kernel.cuh) is set to 0, RadixSort is not run and the particles do not disappear. That leads to the conclusion that the problem is not related to memory or GL resources but has something to do with RadixSort. Of course, in that case inter-particle collision forces are not applied, only forces from the 3D cell grid. Since RadixSort is necessary for that calculation, what should be modified so that it works correctly for numParticles > 128000?
The hash is performed in two arrays (one for position and one for velocity), and both are arrays of uint2, meaning that the values are 32 bits wide and should handle even larger indexes.
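To put numbers on the 32-bit argument, here is a minimal host-side sketch that counts how many bits the largest cell hash or particle index needs. The function names are hypothetical and not part of the SDK sample, and it assumes a cubic grid whose cell hashes run from 0 to gridDim^3 - 1.

```cpp
#include <cstdint>

// Hypothetical helper (not from the SDK sample): number of bits needed
// to represent maxValue. Returns 0 for maxValue == 0.
unsigned bitsFor(std::uint64_t maxValue) {
    unsigned bits = 0;
    while (maxValue != 0) {
        ++bits;
        maxValue >>= 1;
    }
    return bits;
}

// Assumption: cubic grid, cell hashes in [0, cellsPerAxis^3 - 1].
unsigned bitsForCellHash(std::uint32_t cellsPerAxis) {
    const std::uint64_t cells =
        static_cast<std::uint64_t>(cellsPerAxis) * cellsPerAxis * cellsPerAxis;
    return cells == 0 ? 0 : bitsFor(cells - 1);
}

// True if the value fits in one 32-bit component of a uint2.
bool fitsIn32Bits(std::uint64_t maxValue) {
    return bitsFor(maxValue) <= 32;
}
```

For example, a grid of 64 cells per axis needs 18 bits for the cell hash and a grid of 256 needs 24, so the key width alone would not explain the failure. Whatever limit is being hit would have to come from somewhere else in the sort.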
Of course, if you increase numParticles to a large value you also need to increase gridDim accordingly so that all particles fit in the cube (both are initialized in particles.cpp inside the main function). The program is written to dynamically adjust the particle radius depending on those two values. I have spent many hours learning CUDA from this example, but now I am stuck on modifying RadixSort to work correctly for a larger number of particles.
The Tesla C870 doesn’t support atomic operations, so I must run the particles example using sort. But it seems the implemented RadixSort algorithm has some limitation, and I cannot find what should be modified to make it work correctly for a larger number of particles.
I ran the example without sort just to be sure that the problem of disappearing particles is not related to any other function in the code. Of course, in that case the program does not produce a correct simulation (as expected), but all particles are visible and none disappear. From that I conclude that the disappearing is related to RadixSort, but I cannot find what to modify in it to make it work correctly for a larger number of particles.
What grid size are you using? If numParticles/TotalNumberOfcells is greater than the maximum occupancy of a cell, then I guess that would cause problems.
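A quick way to test that ratio on the host is sketched below. It assumes a cubic grid with cellsPerAxis cells along each side (standing in for gridDim), and maxParticlesPerCell is a hypothetical parameter for whatever per-cell limit the code uses. None of these names come from the SDK sample.

```cpp
#include <cassert>
#include <cstdint>

// Total number of cells in a cubic grid (assumption: same size on all axes).
std::uint64_t totalCells(std::uint32_t cellsPerAxis) {
    return static_cast<std::uint64_t>(cellsPerAxis) * cellsPerAxis * cellsPerAxis;
}

// Average number of particles per cell: numParticles / TotalNumberOfCells.
double averageParticlesPerCell(std::uint64_t numParticles,
                               std::uint32_t cellsPerAxis) {
    assert(cellsPerAxis > 0);  // avoid dividing by zero cells
    return static_cast<double>(numParticles) /
           static_cast<double>(totalCells(cellsPerAxis));
}

// True if even the average load is above the per-cell limit.
// maxParticlesPerCell is a hypothetical limit, not a value from the sample.
bool averageExceedsCapacity(std::uint64_t numParticles,
                            std::uint32_t cellsPerAxis,
                            std::uint32_t maxParticlesPerCell) {
    return averageParticlesPerCell(numParticles, cellsPerAxis) >
           static_cast<double>(maxParticlesPerCell);
}
```

With 262144 particles, a grid of 64 per axis gives exactly one particle per cell on average, while a grid of 32 gives eight. The average is only a lower bound on the worst case, since particles that pile up in one corner can overfill a cell even when the average looks safe.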
I suggest you alter the code to print out the sorted cell ids and check they really are sorted correctly (or incorrectly).
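Something along these lines could do that check once the sorted array has been copied back to the host. This is only a sketch: HashEntry is a hypothetical host-side mirror of one uint2 element, and it assumes the first component holds the cell id and the second the particle index.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side mirror of one uint2 hash entry.
// Assumption: first component is the cell id, second the particle index.
struct HashEntry {
    std::uint32_t cell;
    std::uint32_t particle;
};

// Returns the index of the first entry whose cell id is smaller than its
// predecessor's, or entries.size() if the cell ids are in ascending order.
std::size_t firstOutOfOrder(const std::vector<HashEntry>& entries) {
    for (std::size_t i = 1; i < entries.size(); ++i) {
        if (entries[i].cell < entries[i - 1].cell) {
            return i;
        }
    }
    return entries.size();
}

// Returns true if every particle index 0..n-1 appears exactly once,
// i.e. the sort neither dropped nor duplicated any particle.
bool coversAllParticles(const std::vector<HashEntry>& entries) {
    std::vector<bool> seen(entries.size(), false);
    for (const HashEntry& e : entries) {
        if (e.particle >= entries.size() || seen[e.particle]) {
            return false;
        }
        seen[e.particle] = true;
    }
    return true;
}
```

The second check is worth running as well as the first. A sort that loses or duplicates entries can still return keys in ascending order, and missing particle indexes would be consistent with particles vanishing from the display.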
Now it works perfectly: 8M particles in a grid of dimension 256 gives 0.8 FPS.
Besides the Tesla C870 I have a 9800GX2 in the same machine, and devicequery reports it as core version 1.1. So when the same particle.exe runs on the 9800GX2 (it doesn’t matter which GPU) it works fine with USE_SORT. But if sort is turned off it doesn’t work. It actually behaves the same as the Tesla, as if it were core version 1.0. Why?