Weird mistake in the particles demo? Is there an algorithm mistake in the particles demo?

Hi!

I’ve not read very much in these forums yet and this is my first post so I beg your pardon if there’s anything wrong with my posting.
My name is Joey and I’m studying physics at the University of Heidelberg.
I’m about to learn programming cuda. I’m pretty much pleased by it’s capabilities, although I’ve not yet got into the very last detail. In parallel to my book GPU Gems 3 I’m reading through the SDK examples right now. But I’ve stumbled across something which makes me wonder whether the developer of the particles demo has mistakenly used an algorithm which is quite unnecessary or if it’s me who is mistaken here. Ok, here I go.

In chapter 32 there’s an algorithm described which builds so-called collision cells, which is an array consisting of a home cell followed by several guest cells. The home cell is the cell of a particle whose origin lays within it, a guest cell just touches the boundaries of a cell. So far, so good. For processing collisions, the collision cells are traversed one by one in order to find possible collision partners. Now, in the particles demo, the programmer goes quite the same way. BUT: He’s not using guest cells. He has only a list of home cells, but still he’s sorting them for later use, building start and end of these collision cells (which actually aren’t, because they just contain one home cell), just to process all the surrounding cells anyway in a later step. Isn’t that unnecessary? Either you use the collision cell way or you check the surrounding cells.

I hope you get what I mean and that you can help me.
Thanks in advance,
Joey.

btw. if you don’t understand any of my English, just ask away, I know I’m far from speaking perfectly.

The sort is used mostly for BINNING. This is the problem of taking particles and putting them each into a contiguous list. Sorting does this for you almost as a side effect. So the sort is not really to make each cell occur in order, it is mostly to make all the particles within one cell come together.

There are many strategies for binning, but sorting is a reasonable and simple way that will work no matter what input you have. Sort is more powerful than binning in general, though.

so they’re just sorted in order to have a list of all particles within a cell? for looking them up afterwards? that’s some half-heartedly approach. but maybe they’re having their reasons to put it that way. thanks anyway.

If you think about the standard bucket sort to do binning as you would do it on a CPU for a bit, you will quickly realize that there is really no good way to do this on the GPU in parallel (race conditions abound). Since sorting can be efficiently done in parallel, and it has the side effect of binning the particles, it is one of the best options available for binning particles on the GPU.

As a side note, the race conditions that show up in the “normal” particle per thread approach to bucket sorting can be avoided with atomic operations. The particles example used to have an alternate implementation of binning that did this. In my experimentation, that is rarely faster than just doing the binning on the CPU.