OptiX: acceleration structure requires too much space

Dear all, greetings.

In the application I am developing, the call
" optixAccelComputeMemoryUsage(context, &accel_options, &triangle_input, 1, &gas_buffer_sizes) "

reports that about 6.5 GB must be allocated on the device for the construction of the acceleration structure alone. In other words, gas_buffer_sizes.tempSizeInBytes + gas_buffer_sizes.outputSizeInBytes is about 6.5 GB.

This is too much, considering that the total GPU memory is 4 GB. I am wondering whether there is a way to reduce this amount of memory. I am aware of the build flag OPTIX_BUILD_FLAG_ALLOW_COMPACTION, but it seems that it only reduces the memory of the acceleration structure after it has been generated, that is, after the 6.5 GB has already been consumed.

Any ideas, comments?

Hi @foteinos, welcome!

The main strategies you can use to achieve your goal are:
1- Break your mesh into multiple pieces, and then build and compact each piece separately. (Use instancing to render the pieces together.)
2- Use less geometry, by culling parts you don’t need, or by using mesh reduction/simplification, or some other mechanism. (I realize this might not be a realistic option.)

How many triangles are in your mesh? Does the size seem like too much for the geometry, or is the main problem figuring out how to render a large geometry on a 4GB GPU?

There are some factors that can affect your memory consumption. You may need extra memory to support OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS. You will need extra memory to support motion blur. If you’re using motion blur with more than 2 keyframes, you can use fewer keyframes to reduce memory, at the cost of a potentially lower quality blur. There might also be some memory savings if you were to use OPTIX_GEOMETRY_FLAG_REQUIRE_SINGLE_ANYHIT_CALL.

Compaction sometimes reduces memory usage by as much as 2x, but not always. If you get the full 2x, you might be able to fit this mesh into memory at 6.5 / 2 ≈ 3.25 GB; but it is also possible that the compacted size is close to, or greater than, 4 GB, in which case you won't be able to render it easily. To state the obvious, the easiest option may be to get your hands on a GPU with 8 GB or more.
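For reference, the usual build-then-compact sequence looks roughly like the sketch below (OptiX 7-style host API; error checking omitted, and `ctx`, `stream`, and `triangle_input` are assumed to exist as in your code). Note the catch you identified: the full tempSizeInBytes + outputSizeInBytes must still be resident during the build, which is exactly why strategy (1) builds and compacts one small piece at a time.

```cpp
// Hedged sketch, not a drop-in implementation: standard compact-after-build flow.
OptixAccelBuildOptions accel_options = {};
accel_options.buildFlags = OPTIX_BUILD_FLAG_ALLOW_COMPACTION;
accel_options.operation  = OPTIX_BUILD_OPERATION_BUILD;

OptixAccelBufferSizes sizes;
optixAccelComputeMemoryUsage( ctx, &accel_options, &triangle_input, 1, &sizes );

CUdeviceptr d_temp, d_output, d_compacted_size;
cudaMalloc( (void**)&d_temp,           sizes.tempSizeInBytes );
cudaMalloc( (void**)&d_output,         sizes.outputSizeInBytes );
cudaMalloc( (void**)&d_compacted_size, sizeof( size_t ) );

// Ask the build to emit the size the structure would have after compaction.
OptixAccelEmitDesc emit = {};
emit.type   = OPTIX_PROPERTY_TYPE_COMPACTED_SIZE;
emit.result = d_compacted_size;

OptixTraversableHandle handle;
optixAccelBuild( ctx, stream, &accel_options, &triangle_input, 1,
                 d_temp, sizes.tempSizeInBytes,
                 d_output, sizes.outputSizeInBytes,
                 &handle, &emit, 1 );

size_t compacted_size;
cudaMemcpy( &compacted_size, (void*)d_compacted_size, sizeof( size_t ),
            cudaMemcpyDeviceToHost );

if( compacted_size < sizes.outputSizeInBytes )
{
    CUdeviceptr d_compacted;
    cudaMalloc( (void**)&d_compacted, compacted_size );
    optixAccelCompact( ctx, stream, handle, d_compacted, compacted_size, &handle );
    cudaFree( (void*)d_output );  // the full-size build output can now be released
}
cudaFree( (void*)d_temp );
```

With per-piece builds, the peak footprint at any moment is roughly one piece's temp + output buffers plus the already-compacted pieces, instead of the whole 6.5 GB at once.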

There are also more complicated strategies for rendering when the geometry doesn't fit completely in memory. (I am thinking, for example, of GPU-Motunui · Render blog.) However, any such strategy will still require breaking large meshes into individual pieces that fit on the GPU.


Thanks a lot for the prompt reply!

I am not using the GPU for rendering. I am using it to perform ray shooting from a set of points that my non-GPU code computes. In other words, I am using the GPU to find the closest mesh triangle that each ray intersects.

Strategy (2) is not allowed as you have already guessed.

When it comes to strategy (1), allow me to ask: will ray shooting give the same results if I split the mesh into multiple pieces? In other words, will the "instancing" technique combine the pieces in such a way that ray shooting is not performed on each piece separately, but the closest hit among all pieces is always computed?

Yes, splitting the mesh and using instancing will in most cases produce exactly the same results as using the combined mesh, with one very small caveat. As long as you're using an index buffer for your mesh (as opposed to passing vertex triplets to OptiX without an index buffer), your triangles will be "water-tight", meaning that the rare rays that strike an edge or vertex precisely are guaranteed to choose one of the associated triangles rather than accidentally miss and pass through the mesh. This does not happen very often and is normally not a large concern, but if you split meshes, it does open the possibility that rays occasionally sneak through the seams between the different pieces of the mesh. In practice this is not something you're likely to see, but if you are doing high-precision calculations and need an absolute guarantee that all rays will hit the surface, it is worth considering.

You can normally split the mesh any way you like; for correctness it doesn't matter which piece each triangle goes into, but there is a performance implication. If you were to put triangles randomly into different groups, the groups would have significant spatial overlap. If you instead choose groups of triangles that cluster near each other, your mesh groups will be spatially distinct, and as a result your ray casting will be faster.
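As a concrete illustration of the "spatially clustered groups" idea (a hedged sketch, not from the thread; all names are illustrative): bucket each triangle's centroid into a uniform grid, and let each non-empty cell become one mesh piece, i.e. one GAS build input.

```cpp
#include <cmath>
#include <array>
#include <map>
#include <vector>

using Vec3 = std::array<float, 3>;
using Tri  = std::array<int, 3>;   // indices into the shared vertex buffer
using Cell = std::array<int, 3>;   // integer grid coordinates

// Group triangles by the grid cell containing their centroid, so each
// group is spatially compact and the groups overlap as little as possible.
std::map<Cell, std::vector<int>> groupTrianglesByCell(
    const std::vector<Vec3>& vertices,
    const std::vector<Tri>&  triangles,
    float                    cellSize )
{
    std::map<Cell, std::vector<int>> groups;
    for( int t = 0; t < (int)triangles.size(); ++t )
    {
        const Tri& tri = triangles[t];
        Cell cell;
        for( int k = 0; k < 3; ++k )
        {
            // centroid coordinate k, then its grid coordinate
            float c = ( vertices[tri[0]][k] + vertices[tri[1]][k]
                        + vertices[tri[2]][k] ) / 3.0f;
            cell[k] = (int)std::floor( c / cellSize );
        }
        groups[cell].push_back( t );  // triangle t joins this cell's piece
    }
    return groups;
}
```

Each `groups[cell]` list then becomes one piece; nearly empty neighboring cells can be merged afterwards so the pieces don't become too numerous.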

You should be able to use a single shared vertex buffer with all the vertices, and only need a unique index buffer for each mesh piece. As long as your vertex buffer isn't the primary memory hog, this should help keep the memory usage to a minimum. If you were to split the vertices instead, you would have some amount of vertex duplication along the edges.

BTW, by "rendering" I just mean the closest-hit computation and output from the OptiX phase of your application. You could still use an iterative approach like the example I linked to, even if you are "shading" or otherwise processing the closest-hit outputs outside of OptiX. Everything I mentioned so far is agnostic to that and could be considered an implementation detail.


David, thank you, I really appreciate your taking time to clarify!
