Constructing uniform grid on the GPU for ray tracing

tecknoize · July 14, 2013, 6:04pm

I’m working on a simple ray tracer using CUDA and I’ve come across this paper detailing an algorithm to construct an uniform grid on the GPU. I’ve done half of it, but now I’m stuck, and there’s no reference code to inspect.

Basically, what the algorithm does is this :

For each primitive, count the number of cells it overlaps.
Accumulate the values in Step 1 to compute bueffer start indices. These are used in Step 3 to write several bueffers in parallel.
Write all cells each primitive overlaps using pairs (cell ID, primitive ID).
Sort the pairs in Step 3 according to their cell IDs.
For each cell ID, perform a binary search in the sorted grid data to nd bueer start indices and their sizes.

I’m stuck on step 3, especially the cell ID, which is explained like this :

The cell ID value, on the other hand, can be computed by using a local oeffset: the difference between the memory position in the resulting grid and the start position of the primitive list (the lower bound value). This local offeset indicates the nth overlapped cell ID must be written in the nth position in the primitive list. We establish a convention of ordering grid cells in ascending linear IDs in X, Y and then Z order. This allows for converting linear cell IDs to 3D cell IDs.

I’m a bit clueless on what this means. From what I understand, it’s impossible to just calculate the cell ID. Maybe I need to hittest again with my primitive?

I’ve contacted the authors, but this paper is relatively old (2009) and there’s a chance they do not respond.

Can anyone help me understand this step?

SPWorley · July 14, 2013, 10:09pm

This is just a little awkward writing for a simple idea. You have a cell identified by its X Y and Z integer coordinates. But you are building cell/primitive pairs for sorting, so you don’t really want X Y and Z, you want a single integer instead. So they convert the XYZ triple into a single integer ID. They’re awkwardly describing this mapping in words instead of math.

So it all just means if you have a cell with integer coordinates X Y Z, and they form a grid of size XW, YW, ZW, then:

Linear cell ID = X+XW*(Y+YW*(Z));

This gives you a 1:1 mapping of 3D cell ID triples to a 1D list.

So imagine you have a 3D primative that spans grids X=14…15 Y=3 Z=0…1 . That means it spans 212=4 different cells. Those cells are triples (14, 3, 0) (14, 3, 1) (15, 3, 0), and (15, 3, 1).
Let’s say our grid ranges from 0 to 99, so XW=YW=ZW=100. Then you can change those four triples into four cell ids: 314, 10314, 315, and 10315.

These form the first element of your 4 pairs in step 3, with your primitive ID being the second element. You sort that list, and all the primitives in any particular cell are now sequential in that compact sorted list, and a binary search can find the starting index and number of any cell’s population.

tecknoize · July 15, 2013, 12:38am

Thanks for your reply. However, I think I got that part right. What I don’t understand is that the paper imply that you can generate the cell id for each cell-primitive pair by only looking at the data generated in step 2. Correct me if I’m wrong but I think this is impossible. What I’m doing right now is :

Launch a kernel for each primitive-cell pair
In the kernel
2.1 Calculate the primitive id as explained in the paper
2.2 Compute (again, because this is done in step 1 of the algorithm) which cells this primitive overlap
2.3 Out of that list of cells, I calculate an offset (as explained in the paper) based on my thread id to select the correct cell id for this cell-primitive pair.

It works well, but I would like to know if there’s a way to not duplicate the work of testing which cells a primitive overlap.

Topic		Replies	Views
Accelerating a Gridder CUDA Programming and Performance	12	3973	August 22, 2008
Calculate GLOBAL thread Id CUDA Programming and Performance	8	34610	September 5, 2023
3D grids CUDA Programming and Performance	13	10029	June 29, 2009
Different approaches to rasterization for large data sets CUDA Programming and Performance	4	214	February 27, 2024
kd-tree construction with CUDA CUDA Programming and Performance	5	5613	September 21, 2008
Point Sorting into Cells CUDA Programming and Performance	9	3350	June 23, 2009
Synchronization across all threads CUDA Programming and Performance	9	6601	August 22, 2008
Guidance on Parallelizing an Algorithm CUDA Programming and Performance	0	14363	June 30, 2011
Uniform Spatial Subdivision CUDA Programming and Performance	1	855	December 1, 2013
Cuda generating combinations. CUDA Programming and Performance	6	7080	November 8, 2007

Constructing uniform grid on the GPU for ray tracing

Related topics