Particles example with 256K or more particles: why do they disappear?

After increasing numParticles to 262144, the particles simply disappear within a few frames. It looks like a problem with RadixSort.

Does anybody know what needs to be modified for it to work properly with a large number of particles?

Any idea?

If the USE_SORT compiler variable (declared in particles_kernel.cuh) is set to 0, RadixSort is not run and the particles do not disappear. That suggests the problem is not related to memory or GL resources but has something to do with RadixSort. Of course, in that case the inter-particle collision forces are not applied, only the forces from the 3D cell grid. Since RadixSort is needed for that calculation, what should be modified so that it works correctly for numParticles > 128000?
The hash is stored in two arrays (one for position and one for velocity), and both are arrays of uint2, meaning the values are 32 bits and should handle even larger indices.

Of course, if you increase numParticles to a large value, you also need to increase gridDim accordingly so that all particles fit in the cube (both are initialized in particles.cpp inside the main function). The program dynamically adjusts the particle radius depending on those two values. I have spent many hours learning CUDA from this example, but now I am stuck on modifying RadixSort to work correctly for a larger number of particles.

I cannot believe nobody has tried it.

It seemed to work fine for >2,000,000 particles on our 8800 GTX 768 MB. Which card are you using?

Non-sort-based particle binning will only work if your card supports atomic updates (compute capability 1.1).

Thanks for the answer.

The Tesla C870 doesn't support atomic operations, so I must run the particles example using sort. But it seems the implemented RadixSort algorithm has some limitation, and I cannot find what should be modified to make it work correctly for a larger number of particles.

I ran the example without sort just to be sure that the disappearing particles are not caused by any other function in the code. Of course, in that case the program does not produce a correct simulation (as expected), but all particles are visible and nothing disappears. From that I conclude the problem is related to RadixSort, but I cannot find what to modify in it.

What grid size are you using? If numParticles/TotalNumberOfcells is greater than the maximum occupancy of a cell, then I guess that would cause problems.
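To make that ratio concrete, here is a minimal host-side sketch of the arithmetic, assuming a cubic grid with gridDim cells along each axis. The helper names and the maxPerCell parameter are hypothetical and not taken from the SDK sample; the actual per-cell limit has to be looked up in the version of the example you are running.

```cpp
#include <cstdint>

// Average number of particles per cell for a cubic grid with
// gridDim cells along each axis (helper name is hypothetical).
double averageParticlesPerCell(std::uint64_t numParticles, std::uint64_t gridDim)
{
    const std::uint64_t totalCells = gridDim * gridDim * gridDim;
    return static_cast<double>(numParticles) / static_cast<double>(totalCells);
}

// maxPerCell is an assumed per-cell capacity; check the real limit
// in the version of the sample you are using.
bool averageFitsInCell(std::uint64_t numParticles, std::uint64_t gridDim, double maxPerCell)
{
    return averageParticlesPerCell(numParticles, gridDim) <= maxPerCell;
}
```

For example, 262144 particles in a 64x64x64 grid average exactly one particle per cell, while the same count in a 32x32x32 grid averages eight. This is only an average; particles that cluster can exceed the limit in individual cells even when the average fits.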

I suggest you alter the code to print out the sorted cell ids and check they really are sorted correctly (or incorrectly).
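A check along those lines might look like the sketch below. It assumes the sorted cell ids have already been copied back from the device into a host std::vector (for example with cudaMemcpy, not shown here); the function names are hypothetical and not part of the sample.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Returns the index of the first cell id that is smaller than its
// predecessor, or -1 if the whole array is in non-decreasing order.
long firstUnsortedIndex(const std::vector<unsigned int>& cellIds)
{
    for (std::size_t i = 1; i < cellIds.size(); ++i) {
        if (cellIds[i] < cellIds[i - 1]) {
            return static_cast<long>(i);
        }
    }
    return -1;
}

// Prints a one-line verdict, plus the offending pair if there is one.
void reportSortOrder(const std::vector<unsigned int>& cellIds)
{
    const long bad = firstUnsortedIndex(cellIds);
    if (bad < 0) {
        std::printf("cell ids are sorted (%zu entries)\n", cellIds.size());
    } else {
        std::printf("out of order at index %ld: %u follows %u\n",
                    bad, cellIds[bad], cellIds[bad - 1]);
    }
}
```

Reporting only the first out-of-order pair keeps the output readable with hundreds of thousands of particles; dumping the full array is still an option if the pattern of the failure matters.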

Hmm, it looks like a bug was introduced by one of my esteemed colleagues here (who will remain nameless).

Go to line 195 in “” (the function reorderDataAndFindCellStart) and change:

computeGridSize(numBodies, 256, numThreads, numBlocks);

to:

computeGridSize(numBodies, 256, numBlocks, numThreads);

Thanks for finding this!
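For anyone wondering why the swapped arguments only bite at large particle counts, here is a small host-side sketch. It assumes computeGridSize has the usual shape (threads = min(blockSize, n), blocks = n / threads rounded up); its body is not shown in this thread, so treat that as an assumption. The LaunchConfig struct and the two wrapper functions are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Integer division rounded up.
unsigned int iDivUp(unsigned int a, unsigned int b)
{
    return (a + b - 1) / b;
}

// Assumed shape of the helper: outputs are (numBlocks, numThreads), in that order.
void computeGridSize(unsigned int n, unsigned int blockSize,
                     unsigned int& numBlocks, unsigned int& numThreads)
{
    assert(n > 0 && blockSize > 0);
    numThreads = std::min(blockSize, n);
    numBlocks = iDivUp(n, numThreads);
}

// Hypothetical holder for what the kernel launch would end up using.
struct LaunchConfig {
    unsigned int blocks;
    unsigned int threads;
};

// Arguments in the intended order.
LaunchConfig correctCall(unsigned int numBodies)
{
    unsigned int numBlocks = 0, numThreads = 0;
    computeGridSize(numBodies, 256, numBlocks, numThreads);
    return LaunchConfig{numBlocks, numThreads};
}

// Arguments swapped, as in the buggy line: the block count lands in
// numThreads and the thread count lands in numBlocks.
LaunchConfig swappedCall(unsigned int numBodies)
{
    unsigned int numBlocks = 0, numThreads = 0;
    computeGridSize(numBodies, 256, numThreads, numBlocks);
    return LaunchConfig{numBlocks, numThreads};
}
```

Under that assumption, the swapped call asks for 256 blocks of numBodies/256 threads each. With 131072 particles that is 512 threads per block, which is still within the 512-thread limit of compute capability 1.x hardware; with 262144 particles it is 1024 threads per block, which exceeds the limit, so the kernel launch would fail. That would be consistent with the example working up to about 128K particles and breaking at 256K.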

Ahh, I was using the CUDA 1.1 version, not 2.0 beta. Apologies for any confusion.


Now it works perfectly: with 8M particles and a grid dimension of 256 it gives 0.8 FPS.

Besides the Tesla C870, I have a 9800GX2 in the same machine, and deviceQuery reports it as compute capability 1.1. When the same particle.exe runs on the 9800GX2 (it doesn't matter which GPU), it works fine with USE_SORT. But if sort is turned off, it doesn't work. It actually behaves the same as the Tesla, as if it were a compute capability 1.0 device. Why?

The docs explain everything :)

The default compilation parameter is set to sm_10.

After changing it to sm_11, it works fine without sort on the 9800GX2.