Different approaches to rasterization for large data sets

I have an unstructured array of vertices (3D vertices stored as float4) that I want to sort into a uniform grid. As I’m new to the topic, I searched the web for viable approaches but found many conflicting solutions with a big variance in code complexity. I’m hoping someone can give me some guidance on which approaches are ill-advised or have apparent bottlenecks, and where to find documentation on how to implement them (maybe within the CUDA samples, or links/papers/books). Feel free to add any additional approaches you may have.

Here are the approaches I’m currently considering:

  1. Using an index structure (basically a lookup table) to access a 1D array of vertices
    (current implementation)
  • map each vertex’s coordinates to the 3D position of the correct cell within the grid
  • flatten the 3D grid position to a 1D index
  • create an array of structs with reference data (the lookup table): each struct contains a start position and the number of vertices in that cell
  • copy the two arrays to the GPU
  • on the GPU: access vertices by determining the current cell, looking up its range, then reading that range from the 1D vertex array

This is the first and, in my opinion, easiest solution, though with some hiccups: mainly random/unpredictable memory accesses and poor readability. A rough sketch of what I mean is below.
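To make that concrete, here is a minimal sketch of the layout I mean (all names, the grid parameters and the kernel are placeholders, not my actual code):

```
// Approach 1 sketch: vertices sit in one 1D array, pre-sorted by cell,
// and a lookup table stores one {start, count} entry per flattened cell.
struct CellRange {
    unsigned int start;   // index of the cell's first vertex in the sorted array
    unsigned int count;   // number of vertices in this cell
};

// Map a vertex position to its 3D cell coordinate (gridOrigin/cellSize are
// assumed parameters of the uniform grid).
__host__ __device__ int3 cellCoord(float4 p, float3 gridOrigin, float cellSize)
{
    return make_int3((int)((p.x - gridOrigin.x) / cellSize),
                     (int)((p.y - gridOrigin.y) / cellSize),
                     (int)((p.z - gridOrigin.z) / cellSize));
}

// Row-major flattening of a 3D cell coordinate to a 1D index.
__host__ __device__ unsigned int flattenCell(int3 c, int3 gridSize)
{
    return (c.z * gridSize.y + c.y) * gridSize.x + c.x;
}

// On the GPU: one thread per cell, reading that cell's vertex range.
__global__ void processCells(const float4* vertices, const CellRange* lookup,
                             int3 gridSize)
{
    unsigned int cell = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int numCells = gridSize.x * gridSize.y * gridSize.z;
    if (cell >= numCells) return;

    CellRange r = lookup[cell];
    for (unsigned int i = r.start; i < r.start + r.count; ++i) {
        float4 v = vertices[i];
        // ... per-vertex work using v ...
    }
}
```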

  2. Using an array of structs
    I was considering replacing the lookup table with an array of structs for the grid.
    Each struct would hold an integer storing the number of vertices and a dynamic array with the actual vertices. Instead of searching the lookup table, one could (hopefully) use the grid cell index directly as the position within that array (rough sketch below). From what I gathered online, an array of structs with dynamic memory should be possible, but I have my doubts, as I haven’t found any similar solutions.
    Plus: easier to read, no need for a separate lookup table, nearby vertices in sequential memory
    Minus: behaviour with empty cells, more complicated memory allocation
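What I picture for this approach is roughly the following (again only a sketch with made-up names); the per-cell “dynamic array” would effectively be one device pointer per cell, which is exactly where the allocation and empty-cell questions come from:

```
// Approach 2 sketch: one struct per grid cell, each owning its vertices.
struct GridCell {
    unsigned int count;   // number of vertices in this cell (0 for empty cells)
    float4*      verts;   // device pointer to this cell's vertices (nullptr if empty)
};

// Device side: the flattened cell index is used directly as the position in
// the grid array, no separate lookup table needed.
__global__ void processCell(const GridCell* grid, unsigned int numCells)
{
    unsigned int cell = blockIdx.x * blockDim.x + threadIdx.x;
    if (cell >= numCells) return;

    GridCell c = grid[cell];
    for (unsigned int i = 0; i < c.count; ++i) {
        float4 v = c.verts[i];
        // ... per-vertex work using v ...
    }
}

// Host side: each non-empty cell needs its verts pointer set to valid device
// memory (one cudaMalloc per cell, or offsets into one big allocation) before
// the GridCell array itself is copied to the GPU.
```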

  3. Using CUDA arrays
    This is where I found a multitude of different answers.

  • Would a (flattened) 1D CUDA array be any different from my first approach?
  • Would a 3D CUDA array be more suitable, considering the 3D nature of my data?
  • Do I have to use 1D/3D textures in combination with CUDA arrays, or do they only help with performance?
  • Is there a difference in the maximum allowed memory between a CUDA array and a texture?
  • How can I access parts (a grid cell) of a 3D CUDA array? (see the sketch after this list)
  • (sorting for a 3D CUDA array is covered in approach 4)
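For the access question, the best I have pieced together so far is the pattern below: allocate a 3D CUDA array, copy linear host data into it, and read it in a kernel through a texture object. This is only my reading of the programming guide (function and variable names are mine), so please correct me if it is off:

```
#include <cuda_runtime.h>

// Sketch (not tested): upload an nx*ny*nz block of float4 data into a
// 3D CUDA array and create a texture object for reading it.
cudaTextureObject_t upload3D(const float4* hostData, int nx, int ny, int nz)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float4>();
    cudaExtent extent = make_cudaExtent(nx, ny, nz);

    cudaArray_t cuArray;                       // must stay allocated while the
    cudaMalloc3DArray(&cuArray, &desc, extent); // texture object is in use

    // Copy the linear host data into the 3D array.
    cudaMemcpy3DParms copyParams = {};
    copyParams.srcPtr   = make_cudaPitchedPtr((void*)hostData,
                                              nx * sizeof(float4), nx, ny);
    copyParams.dstArray = cuArray;
    copyParams.extent   = extent;
    copyParams.kind     = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&copyParams);

    // Describe the resource and create a texture object for it.
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = cuArray;

    cudaTextureDesc texDesc = {};
    texDesc.readMode = cudaReadModeElementType;   // raw float4, no filtering

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);
    return tex;
}

// In a kernel, the element at cell (x, y, z) would then be read with:
//   float4 v = tex3D<float4>(tex, x + 0.5f, y + 0.5f, z + 0.5f);
```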
  4. Using space-filling curves
    This was mentioned in another CUDA forum thread with barely any additional information. The idea would be to use an algorithm that maps the vertices’ positions into a 3D CUDA array by following such a curve, for example a Hilbert curve (rough sketch after these questions).
    Assuming that I don’t care about the order of vertices within a given grid cell:
  • Would this approach still offer a potential speedup?
  • Would I have to place constraints on the grid dimensions for the curve to work?
  • What level of complexity (approx.) would I have to anticipate?
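To check whether I understood the idea, here is what I would try first: a Morton (Z-order) curve instead of a Hilbert curve, since it only needs bit interleaving and is the simplest space-filling curve I know of. It does assume at most 1024 cells per axis (10 bits each). The vertices would then be sorted by this key instead of by the plain flattened index.

```
// Sketch: Morton (Z-order) key from a 3D cell coordinate by interleaving the
// bits of x, y and z. Sorting vertices by this key keeps spatially close
// cells close together in memory.
__host__ __device__ unsigned int expandBits(unsigned int v)
{
    // Spread the lower 10 bits of v so there are two zero bits between each.
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

__host__ __device__ unsigned int mortonKey(int3 cell)
{
    return (expandBits((unsigned int)cell.z) << 2) |
           (expandBits((unsigned int)cell.y) << 1) |
            expandBits((unsigned int)cell.x);
}
```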

I hope this topic isn’t completely out of place here; otherwise feel free to close it and point me to the right forum / platform.

So, you have a 3D grid and need to find out which vertices belong to which voxel in the grid? Did I understand your description correctly?

Not quite. I do know how to assign a voxel given the coordinates (I do that in the first approach I mentioned). My problem is how to structure the data to get better performance.
To break it down, I have the following steps:
a) load the data from file
b) determine the voxel of each data point (the point you mentioned)
c) pack the data into a structure (retaining the voxel information)
d) send the data to the GPU
Steps a & b are clear to me. My problem is with c & d, i.e. how to structure the data and send it to the GPU so that I get better performance.
The approaches I mentioned in the original post are (theoretically) different ways of structuring the data that should result in higher performance (read/copy times).

It is not clear to me what you mean by “to gain better performance”. Getting better performance where?

For starters, I would load the data onto the GPU right after step a).
Then sort the list of vertices by voxel ID on the GPU, roughly as sketched below.
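Something along these lines, for example with Thrust (just a sketch; the voxel key here is simply the flattened cell index from your first approach, and the grid parameters are placeholders):

```
#include <thrust/device_vector.h>
#include <thrust/sort.h>

// Compute one voxel key per vertex on the GPU.
__global__ void computeKeys(const float4* verts, unsigned int* keys, int n,
                            float3 gridOrigin, float cellSize, int3 gridSize)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float4 p = verts[i];
    int x = (int)((p.x - gridOrigin.x) / cellSize);
    int y = (int)((p.y - gridOrigin.y) / cellSize);
    int z = (int)((p.z - gridOrigin.z) / cellSize);
    keys[i] = (z * gridSize.y + y) * gridSize.x + x;   // flattened voxel id
}

void sortByVoxel(thrust::device_vector<float4>& d_verts,
                 thrust::device_vector<unsigned int>& d_keys,
                 float3 gridOrigin, float cellSize, int3 gridSize)
{
    int n = (int)d_verts.size();
    d_keys.resize(n);

    int block = 256;
    int grid  = (n + block - 1) / block;
    computeKeys<<<grid, block>>>(thrust::raw_pointer_cast(d_verts.data()),
                                 thrust::raw_pointer_cast(d_keys.data()),
                                 n, gridOrigin, cellSize, gridSize);

    // Sort vertices by their voxel key; afterwards each cell's vertices are
    // contiguous, and the {start, count} ranges can be derived from the keys.
    thrust::sort_by_key(d_keys.begin(), d_keys.end(), d_verts.begin());
}
```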

Yes, exactly; sorting on the GPU would be faster. (That’s what I meant by my ill-advised approaches.)
The point where I’m not sure is what to use to sort the data. Once sorted, this data is constant for me and is used in all following calculations. The idea is to store and access the data (GPU side) as efficiently as possible, so that I can use the largest possible problem size (number of vertices) for the calculation. From what I’ve read, textures are the go-to for that, so I thought about storing the data in a texture. From here on I’m confused:
I get how to read and write basic textures, but what I can’t find online is an explanation of which texture type should be used in this case (1D … 3D), whether that would even yield faster reads, and whether it has any impact on the data size I can use.