Different approaches to rasterization for large data sets

I have an unstructured array of vertices (3D vertices stored as float4) that I want to sort into a uniform grid. As I’m new to the topic, I searched the web for viable approaches but found many conflicting solutions with a big variance in code complexity. I’m hoping someone can give me some guidance on which approaches are ill-advised or have apparent bottlenecks, and where to find documentation on how to implement them (maybe within the CUDA samples, or links/papers/books). Feel free to add any additional approaches you may have.

Here are the approaches I’m currently considering:

  1. Using an index structure (basically a lookup table) to access a 1D array of vertices
    (current implementation)
  • map each vertex’s coordinates to the 3D position of the correct cell within the grid
  • flatten the 3D grid position to a 1D index
  • create an array of structs with reference data (the lookup table): each struct contains a start position and the number of vertices in that cell
  • copy the two arrays to the GPU
  • on the GPU: access vertices by determining the current cell, looking up its range, then reading that range from the 1D vertex array

This is the first and, in my opinion, easiest solution, though with some hiccups: mainly random/unpredictable memory accesses and poor readability. A rough sketch of what I mean is below.
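To make that concrete, here is a minimal sketch of the layout I mean (all names, the grid parameters and the kernel are placeholders, not my actual code):

```
// Approach 1 sketch: vertices sit in one 1D array, pre-sorted by cell,
// and a lookup table stores one {start, count} entry per flattened cell.
struct CellRange {
    unsigned int start;   // index of the cell's first vertex in the sorted array
    unsigned int count;   // number of vertices in this cell
};

// Map a vertex position to its 3D cell coordinate (gridOrigin/cellSize are
// assumed parameters of the uniform grid).
__host__ __device__ int3 cellCoord(float4 p, float3 gridOrigin, float cellSize)
{
    return make_int3((int)((p.x - gridOrigin.x) / cellSize),
                     (int)((p.y - gridOrigin.y) / cellSize),
                     (int)((p.z - gridOrigin.z) / cellSize));
}

// Row-major flattening of a 3D cell coordinate to a 1D index.
__host__ __device__ unsigned int flattenCell(int3 c, int3 gridSize)
{
    return (c.z * gridSize.y + c.y) * gridSize.x + c.x;
}

// On the GPU: one thread per cell, reading that cell's vertex range.
__global__ void processCells(const float4* vertices, const CellRange* lookup,
                             int3 gridSize)
{
    unsigned int cell = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int numCells = gridSize.x * gridSize.y * gridSize.z;
    if (cell >= numCells) return;

    CellRange r = lookup[cell];
    for (unsigned int i = r.start; i < r.start + r.count; ++i) {
        float4 v = vertices[i];
        // ... per-vertex work using v ...
    }
}
```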

  2. Using an array of structs
    I was considering replacing the lookup table with an array of structs for the grid.
    Each struct would hold an integer storing the number of vertices and a dynamic array with the actual vertices. Instead of searching the lookup table, one could (hopefully) use the grid cell index directly as the position within that array (rough sketch below). From what I gathered online, an array of structs with dynamic memory should be possible, but I have my doubts, as I haven’t found any similar solutions.
    Plus: easier to read, no need for a separate lookup table, nearby vertices in sequential memory
    Minus: behaviour with empty cells, more complicated memory allocation
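What I picture for this approach is roughly the following (again only a sketch with made-up names); the per-cell “dynamic array” would effectively be one device pointer per cell, which is exactly where the allocation and empty-cell questions come from:

```
// Approach 2 sketch: one struct per grid cell, each owning its vertices.
struct GridCell {
    unsigned int count;   // number of vertices in this cell (0 for empty cells)
    float4*      verts;   // device pointer to this cell's vertices (nullptr if empty)
};

// Device side: the flattened cell index is used directly as the position in
// the grid array, no separate lookup table needed.
__global__ void processCell(const GridCell* grid, unsigned int numCells)
{
    unsigned int cell = blockIdx.x * blockDim.x + threadIdx.x;
    if (cell >= numCells) return;

    GridCell c = grid[cell];
    for (unsigned int i = 0; i < c.count; ++i) {
        float4 v = c.verts[i];
        // ... per-vertex work using v ...
    }
}

// Host side: each non-empty cell needs its verts pointer set to valid device
// memory (one cudaMalloc per cell, or offsets into one big allocation) before
// the GridCell array itself is copied to the GPU.
```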

  3. Using CUDA arrays
    This is where I found a multitude of different answers.

  • Would a (flattened) 1D CUDA array be any different from my first approach?
  • Would a 3D CUDA array be more suitable, considering the 3D nature of my data?
  • Do I have to use 1D/3D textures in combination with CUDA arrays, or do they only help with performance?
  • Is there a difference in the maximum allowed memory between a CUDA array and a texture?
  • How can I access parts (a grid cell) of a 3D CUDA array? (see the sketch after this list)
  • (sorting for a 3D CUDA array is covered in approach 4)
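For the access question, the best I have pieced together so far is the pattern below: allocate a 3D CUDA array, copy linear host data into it, and read it in a kernel through a texture object. This is only my reading of the programming guide (function and variable names are mine), so please correct me if it is off:

```
#include <cuda_runtime.h>

// Sketch (not tested): upload an nx*ny*nz block of float4 data into a
// 3D CUDA array and create a texture object for reading it.
cudaTextureObject_t upload3D(const float4* hostData, int nx, int ny, int nz)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float4>();
    cudaExtent extent = make_cudaExtent(nx, ny, nz);

    cudaArray_t cuArray;                       // must stay allocated while the
    cudaMalloc3DArray(&cuArray, &desc, extent); // texture object is in use

    // Copy the linear host data into the 3D array.
    cudaMemcpy3DParms copyParams = {};
    copyParams.srcPtr   = make_cudaPitchedPtr((void*)hostData,
                                              nx * sizeof(float4), nx, ny);
    copyParams.dstArray = cuArray;
    copyParams.extent   = extent;
    copyParams.kind     = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&copyParams);

    // Describe the resource and create a texture object for it.
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = cuArray;

    cudaTextureDesc texDesc = {};
    texDesc.readMode = cudaReadModeElementType;   // raw float4, no filtering

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);
    return tex;
}

// In a kernel, the element at cell (x, y, z) would then be read with:
//   float4 v = tex3D<float4>(tex, x + 0.5f, y + 0.5f, z + 0.5f);
```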
  4. Using space-filling curves
    This was mentioned in another CUDA forum thread with barely any additional information. The idea would be to use an algorithm that maps the vertices’ positions into a 3D CUDA array by following such a curve, for example a Hilbert curve (rough sketch after these questions).
    Assuming that I don’t care about the order of vertices within a given grid cell:
  • Would this approach still offer a potential speedup?
  • Would I have to place constraints on the grid dimensions for the curve to work?
  • What level of complexity (approx.) would I have to anticipate?
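To check whether I understood the idea, here is what I would try first: a Morton (Z-order) curve instead of a Hilbert curve, since it only needs bit interleaving and is the simplest space-filling curve I know of. It does assume at most 1024 cells per axis (10 bits each). The vertices would then be sorted by this key instead of by the plain flattened index.

```
// Sketch: Morton (Z-order) key from a 3D cell coordinate by interleaving the
// bits of x, y and z. Sorting vertices by this key keeps spatially close
// cells close together in memory.
__host__ __device__ unsigned int expandBits(unsigned int v)
{
    // Spread the lower 10 bits of v so there are two zero bits between each.
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

__host__ __device__ unsigned int mortonKey(int3 cell)
{
    return (expandBits((unsigned int)cell.z) << 2) |
           (expandBits((unsigned int)cell.y) << 1) |
            expandBits((unsigned int)cell.x);
}
```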

I hope this topic isn’t completely out of place here; otherwise feel free to close it and point me to the right forum / platform.

So, you have a 3D grid and need to find out which vertices belong to which voxel in the grid? Did I understand your description correctly?

Not quite. I do know how to assign a voxel given the coordinates (I do that in the first approach I mentioned). My problem is how to structure the data to get better performance.
To break it down, I have the following steps:
a) load the data from file
b) determine the voxel of each data point (the point you mentioned)
c) pack the data into a structure (retaining the voxel information)
d) send the data to the GPU
Steps a & b are clear to me. My problem is with c & d, i.e. how to structure the data and send it to the GPU so that I get better performance.
The approaches I mentioned in the original post are (theoretically) different ways of structuring the data that should result in higher performance (read/copy times).

It is not clear to me what you mean by “to gain better performance”. Getting better performance where?

For starters, I would load the data onto the GPU right after step a).
Then sort the list of vertices by voxel ID on the GPU, roughly as sketched below.
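Something along these lines, for example with Thrust (just a sketch; the voxel key here is simply the flattened cell index from your first approach, and the grid parameters are placeholders):

```
#include <thrust/device_vector.h>
#include <thrust/sort.h>

// Compute one voxel key per vertex on the GPU.
__global__ void computeKeys(const float4* verts, unsigned int* keys, int n,
                            float3 gridOrigin, float cellSize, int3 gridSize)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float4 p = verts[i];
    int x = (int)((p.x - gridOrigin.x) / cellSize);
    int y = (int)((p.y - gridOrigin.y) / cellSize);
    int z = (int)((p.z - gridOrigin.z) / cellSize);
    keys[i] = (z * gridSize.y + y) * gridSize.x + x;   // flattened voxel id
}

void sortByVoxel(thrust::device_vector<float4>& d_verts,
                 thrust::device_vector<unsigned int>& d_keys,
                 float3 gridOrigin, float cellSize, int3 gridSize)
{
    int n = (int)d_verts.size();
    d_keys.resize(n);

    int block = 256;
    int grid  = (n + block - 1) / block;
    computeKeys<<<grid, block>>>(thrust::raw_pointer_cast(d_verts.data()),
                                 thrust::raw_pointer_cast(d_keys.data()),
                                 n, gridOrigin, cellSize, gridSize);

    // Sort vertices by their voxel key; afterwards each cell's vertices are
    // contiguous, and the {start, count} ranges can be derived from the keys.
    thrust::sort_by_key(d_keys.begin(), d_keys.end(), d_verts.begin());
}
```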

Yes, exactly; sorting on the GPU would be faster. (That’s what I meant by my ill-advised approaches.)
The point where I’m not sure is what to use to sort the data. Once sorted, this data is constant for me and is used in all following calculations. The idea is to store and access the data (GPU side) as efficiently as possible, so that I can use the largest possible problem size (number of vertices) for the calculation. From what I’ve read, textures are the go-to for that, so I thought about storing the data in a texture. From here on I’m confused:
I get how to read and write basic textures, but what I can’t find online is an explanation of which texture type should be used in this case (1D … 3D), whether that would even yield faster reads, and whether it has any impact on the data size I can use.