Constructing uniform grid on the GPU for ray tracing

I’m working on a simple ray tracer using CUDA and I’ve come across this paper detailing an algorithm to construct an uniform grid on the GPU. I’ve done half of it, but now I’m stuck, and there’s no reference code to inspect.

Basically, what the algorithm does is this :

  • For each primitive, count the number of cells it overlaps.
  • Accumulate the values in Step 1 to compute bueffer start indices. These are used in Step 3 to write several bueffers in parallel.
  • Write all cells each primitive overlaps using pairs (cell ID, primitive ID).
  • Sort the pairs in Step 3 according to their cell IDs.
  • For each cell ID, perform a binary search in the sorted grid data to nd bueer start indices and their sizes.

I’m stuck on step 3, especially the cell ID, which is explained like this :

The cell ID value, on the other hand, can be computed by using a local oeffset: the difference between the memory position in the resulting grid and the start position of the primitive list (the lower bound value). This local offeset indicates the nth overlapped cell ID must be written in the nth position in the primitive list. We establish a convention of ordering grid cells in ascending linear IDs in X, Y and then Z order. This allows for converting linear cell IDs to 3D cell IDs.

I’m a bit clueless on what this means. From what I understand, it’s impossible to just calculate the cell ID. Maybe I need to hittest again with my primitive?

I’ve contacted the authors, but this paper is relatively old (2009) and there’s a chance they do not respond.

Can anyone help me understand this step?

This is just a little awkward writing for a simple idea. You have a cell identified by its X Y and Z integer coordinates. But you are building cell/primitive pairs for sorting, so you don’t really want X Y and Z, you want a single integer instead. So they convert the XYZ triple into a single integer ID. They’re awkwardly describing this mapping in words instead of math.

So it all just means if you have a cell with integer coordinates X Y Z, and they form a grid of size XW, YW, ZW, then:

Linear cell ID = X+XW*(Y+YW*(Z));

This gives you a 1:1 mapping of 3D cell ID triples to a 1D list.

So imagine you have a 3D primative that spans grids X=14…15 Y=3 Z=0…1 . That means it spans 212=4 different cells. Those cells are triples (14, 3, 0) (14, 3, 1) (15, 3, 0), and (15, 3, 1).
Let’s say our grid ranges from 0 to 99, so XW=YW=ZW=100. Then you can change those four triples into four cell ids: 314, 10314, 315, and 10315.

These form the first element of your 4 pairs in step 3, with your primitive ID being the second element. You sort that list, and all the primitives in any particular cell are now sequential in that compact sorted list, and a binary search can find the starting index and number of any cell’s population.

Thanks for your reply. However, I think I got that part right. What I don’t understand is that the paper imply that you can generate the cell id for each cell-primitive pair by only looking at the data generated in step 2. Correct me if I’m wrong but I think this is impossible. What I’m doing right now is :

  1. Launch a kernel for each primitive-cell pair
  2. In the kernel
    2.1 Calculate the primitive id as explained in the paper
    2.2 Compute (again, because this is done in step 1 of the algorithm) which cells this primitive overlap
    2.3 Out of that list of cells, I calculate an offset (as explained in the paper) based on my thread id to select the correct cell id for this cell-primitive pair.

It works well, but I would like to know if there’s a way to not duplicate the work of testing which cells a primitive overlap.