building structures for space partition what's faster: CPU or GPU?

Hello everyone,
I have a question about building a structure for space partition.
I’ve read it’s faster to build tree structures on CPU than on GPU.
But what’s about grids? In the CUDA sample code “particles” a grid
structure is completely build on GPU.

My real problem is:
I have a very large array of particles. This number of particles is too big to load all of them on GPU simultaneously.
So I separate this large array in smaller ones.
These small arrays can be then loaded to GPU sequentially for doing calculations.

I want to split the particles in smaller arrays respectively their location in space,
therfore I need the particles ordered in a space partition structure, e.g. a grid,
BEFORE the particles are loaded on GPU.
I thought of two different possibilities:

  1. I do the structure building on CPU for all particles and then split this new, ordered array in smaller ones for GPU. This could be very slow for very many particles, couldn’t it?
  2. I build the structure on GPU by running it sequentially for parts of the large particle array. Every iteration I would have to copy the ordered particles from GPU on CPU. (expensive GPU-CPU transfer??)
    At the end I would have many ordered particle arrays on CPU which I had to
    combine to one large array again. Then I could separate this array in smaller ones
    in respect of the space partition.

Possibility 2 sounds more sophisticated… Which method is faster?
What do you think?

Speaking from experience having implemented particle binning in my code, 1. will definitely be faster. No question.

In my application, I need to spatially bin a small number of particles (up to 200,000). Every single GPU algorithm I have tried, including using atomic operations like the SDK example, has resulted in a SLOWER implementation on the GPU. For me, it is faster to copy positions GPU->CPU, bin them and then copy the cell list back to the GPU. For you, you don’t even need the initial GPU->CPU copy so it should be even faster.

Additionally, if your goal is spatial locality among the particles I can suggest looking into hilbert curves. If you are interested, I can send you a pre-print of a paper where hilbert curves are used to resort particles in memory to improve texture cache hits for Molecular Dyamics on the GPU.

Dear MisterAnderson42,
Could you be so king to send this pre-print for me?

Sure. You can download it from here:

Direct link: