OPTIX, acceleration structure requires too much space

Dear all, greetings.

In the application I am developing, the command
" optixAccelComputeMemoryUsage(context, &accel_options, &triangle_input, 1, &gas_buffer_sizes) "

shows that about 6.5 GB must be allocated on the device just to build the acceleration structure. In other words: gas_buffer_sizes.tempSizeInBytes + gas_buffer_sizes.outputSizeInBytes is about 6.5 GB.

This is too much, given that the total GPU memory is 4 GB. I am wondering whether there is a way to reduce this amount of memory. I am aware of the build option OPTIX_BUILD_FLAG_ALLOW_COMPACTION, but it seems to reduce the size of the acceleration structure only after it has been built, that is, after the build has already consumed 6.5 GB.

Any ideas, comments?

Hi @foteinos, welcome!

The main strategies you can use to achieve your goal are:
1- Break your mesh into multiple pieces, and then build and compact each piece separately. (Use instancing to render the pieces together.)
2- Use less geometry, by culling parts you don’t need, or by using mesh reduction/simplification, or some other mechanism. (I realize this might not be a realistic option.)

How many triangles are in your mesh? Does the size seem like too much for the geometry, or is the main problem figuring out how to render a large geometry on a 4GB GPU?

There are some factors that can affect your memory consumption. You may need extra memory to support OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS. You will need extra memory to support motion blur. If you’re using motion blur with more than 2 keyframes, you can use fewer keyframes to reduce memory, at the cost of a potentially lower quality blur. There might also be some memory savings if you were to use OPTIX_GEOMETRY_FLAG_REQUIRE_SINGLE_ANYHIT_CALL.

The memory usage after compaction is sometimes as much as 2x smaller, but not always. If that is true, then you might be able to fit this mesh into memory with 6.5 / 2 = ~3.25GB, but it’s also possible that the compacted size is close enough to, or greater than, 4GB, which means you won’t be able to render it easily. To state the obvious, of course, the easiest option may be to figure out how to get your hands on a GPU with 8GB or more.

There are also more complicated strategies for rendering when geometry doesn’t fit completely in memory. (I am thinking of this for example: GPU-Motunui · Render blog) However, any such strategy will still require breaking large meshes into individual pieces that can fit on the GPU.


Thanks a lot for the prompt reply!

I am not using GPU for rendering. I am using it in order to perform ray shooting from a set of points my non-gpu code computes. In other words, I am using GPU in order to find out the closest mesh triangle that a ray intersects.

Strategy (2) is not allowed as you have already guessed.

When it comes to strategy (1), allow me to ask you: will ray shooting give the same results if I split the mesh into multiple pieces? In other words, will the “instancing” technique combine the pieces together in a way such that the ray shooting is not performed on each separate piece, but always the closest hit among all pieces is computed?

Yes, splitting the mesh and using instancing will produce in most cases exactly the same results as using the combined mesh, with one very small caveat. As long as you’re using an index buffer for your mesh (as opposed to passing vertex triplets to OptiX without an index buffer) then your triangles will be “water-tight” meaning that the rare rays that strike an edge or vertex precisely will be guaranteed to choose one of the associated triangles, and not accidentally miss and pass through the mesh. This does not happen very often, and normally is not a large concern, but if you split meshes then it does open the possibility that rays can occasionally sneak through the seams in between the different pieces of the mesh. In practice this is not something you’re likely to see, but if you are doing high precision calculations and need an absolute guarantee that all rays will hit the surface, then it is worth considering.

You normally can split the mesh any way you like, it doesn’t matter which piece each triangle goes into - for correctness - but there is a performance implication. If you were to put triangles randomly into different groups, then the groups would have significant spatial overlap. If you choose groups of triangles that are near each other in clusters, then your mesh groups will be spatially distinct, and as a result your ray-casting will be faster.

You should be able to use a shared vertex buffer with all the vertices, and only need a unique index buffer for each mesh piece. As long as your vertex buffer isn’t the primary memory hog, this should help keep the memory usage to a minimum. If you were to split the vertices then you’ll have some amount of vertex duplication along the edges.
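The two points above (spatially coherent groups, one shared vertex buffer with a separate index buffer per piece) can be sketched in plain C++. This is only an illustration under assumptions: `Float3`, `Triangle`, and `splitByCentroid` are hypothetical names, and sorting centroids along the single widest axis is just one simple way to get clustered groups; a recursive median split or a space-filling curve would probably cluster better.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

struct Float3 { float x, y, z; };               // hypothetical vertex type
using Triangle = std::array<std::uint32_t, 3>;  // three indices into the shared vertex buffer

// Split the index buffer into up to `numChunks` spatially coherent pieces.
// The vertex buffer is neither copied nor modified; every piece keeps indexing into it.
std::vector<std::vector<Triangle>> splitByCentroid(
    const std::vector<Float3>& vertices,
    const std::vector<Triangle>& triangles,
    std::size_t numChunks)
{
    std::vector<std::vector<Triangle>> chunks;
    if (numChunks == 0 || triangles.empty()) return chunks;

    // Every index must refer to an existing vertex.
    for (const Triangle& t : triangles)
        for (std::uint32_t i : t) assert(i < vertices.size());

    // Centroid coordinate of a triangle along one axis (0 = x, 1 = y, 2 = z).
    auto centroid = [&](const Triangle& t, int axis) {
        auto comp = [axis](const Float3& v) { return axis == 0 ? v.x : axis == 1 ? v.y : v.z; };
        return (comp(vertices[t[0]]) + comp(vertices[t[1]]) + comp(vertices[t[2]])) / 3.0f;
    };

    // Pick the axis along which the centroids are spread the widest.
    int bestAxis = 0;
    float bestExtent = -1.0f;
    for (int axis = 0; axis < 3; ++axis) {
        float lo = centroid(triangles[0], axis), hi = lo;
        for (const Triangle& t : triangles) {
            const float c = centroid(t, axis);
            lo = std::min(lo, c);
            hi = std::max(hi, c);
        }
        if (hi - lo > bestExtent) { bestExtent = hi - lo; bestAxis = axis; }
    }

    // Order triangle ids by centroid position on that axis.
    std::vector<std::size_t> order(triangles.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return centroid(triangles[a], bestAxis) < centroid(triangles[b], bestAxis);
    });

    // Cut the sorted order into contiguous, nearly equal-sized ranges.
    numChunks = std::min(numChunks, triangles.size());
    chunks.resize(numChunks);
    for (std::size_t i = 0; i < order.size(); ++i)
        chunks[i * numChunks / order.size()].push_back(triangles[order[i]]);
    return chunks;
}
```

Each returned piece would then serve as the index buffer of its own build input, while all build inputs point at the same vertex buffer.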

BTW by “rendering” I just mean the closest-hit & output from the OptiX phase of your application. You could still use an iterative approach like the example I linked to, even if you are “shading” or otherwise processing the closest-hit outputs outside of OptiX. Everything I mentioned so far is agnostic to your needs, and could be considered an implementation detail.



David, thank you, I really appreciate your taking time to clarify!


Hello again.

I decided to manually split the input into chunks small enough that each chunk fits on the GPU separately. This way I have more control over how the input is split.

Suppose that I ended up splitting the input into 4 chunks. Every time I need to shoot X number of rays, I switch to the GPU and perform the shooting for each of the 4 chunks, combining the results of course. So far so good.
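For readers following along, the "combining the results" step could look roughly like the sketch below. It is not the poster's code: the `Hit` record and `mergeClosest` are hypothetical, and the sketch assumes each per-chunk launch writes one record per ray, with misses left at an infinite distance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical per-ray result record; a miss keeps t at infinity.
struct Hit {
    float t = std::numeric_limits<float>::infinity();  // distance along the ray
    std::int32_t chunk = -1;                            // which chunk produced the hit, -1 = miss
    std::uint32_t triangle = 0;                         // triangle index local to that chunk
};

// Fold the results of one chunk into the running closest hit per ray.
void mergeClosest(std::vector<Hit>& best, const std::vector<Hit>& chunkHits)
{
    assert(best.size() == chunkHits.size());
    for (std::size_t i = 0; i < best.size(); ++i)
        if (chunkHits[i].t < best[i].t)
            best[i] = chunkHits[i];
}
```

Because triangle indices are local to a chunk, the chunk id has to be stored with each hit. A possible refinement is to pass the best distance found so far as the ray's tmax when tracing the next chunk, so that later chunks only report closer hits.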

The problem is that at the 20th shooting of X number of rays, shooting on the first chunk gives me the exception: “OPTIX_ERROR_LAUNCH_FAILURE: Optix call 'optixLaunch…”.

The funny part is that when I set the exception flag to the value “OPTIX_EXCEPTION_FLAG_DEBUG | OPTIX_EXCEPTION_FLAG_TRACE_DEPTH | OPTIX_EXCEPTION_FLAG_STACK_OVERFLOW”, it all goes smoothly without a failure; it terminates with the correct result.

Any idea on how to proceed? How do I debug the OPTIX_ERROR_LAUNCH_FAILURE exception? I would expect that using the exception flag above would be more helpful, but in fact, it does not catch anything.

If a launch failure vanishes with different debug flags, that could be a defect inside OptiX’s acceleration structure traversal. Since that implementation lives inside the display drivers, fixing it would require a driver update.

If you’re not running on the newest released display drivers, please try updating the drivers first.
If the error is not solved by newer display drivers, please provide a minimal and complete reproducer in failing state to be able to investigate the issue.

When reporting OptiX issues, please always provide the following system configuration information:
OS version, installed GPU(s), VRAM amount, display driver version, OptiX (major.minor.micro) version, CUDA toolkit version (major.minor) used to generate the input PTX, host compiler version.

Thanks a lot droettger for the reply.

It seems that the error is related to the TDR timeout. On some chunks, the rays become too long. Trimming them a bit solved the error. So I am good, OptiX is awesome!

It might interest you that the exception message was confusing, and that when the debug flag is activated I get no errors at all. FYI.

Thanks again.

os: windows10
gpu: quadro m2200 (notebook), 4GB
nvidia driver: 511.79 (latest)

Yup, that makes sense on that four generations old Maxwell laptop GPU. Be careful with the workload per launch on that.
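One generic way to limit the workload per launch, besides trimming the rays as was done above, is to issue the rays in several smaller launches so that each one finishes well inside the Windows TDR window. The sketch below only shows the batching arithmetic. `launchInBatches` and its `launch(first, count)` callback are hypothetical; in a real application the callback would wrap optixLaunch plus a stream synchronization, and a suitable batch size would have to be found by measurement.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Issue `totalRays` rays as consecutive launches of at most `maxRaysPerLaunch` rays each.
// `launch(first, count)` is a hypothetical callback that traces rays [first, first + count).
// Returns the number of launches that were issued.
std::size_t launchInBatches(std::size_t totalRays, std::size_t maxRaysPerLaunch,
                            const std::function<void(std::size_t, std::size_t)>& launch)
{
    assert(maxRaysPerLaunch > 0);
    std::size_t launches = 0;
    for (std::size_t first = 0; first < totalRays; first += maxRaysPerLaunch) {
        // The last batch may be shorter than the others.
        const std::size_t count = std::min(maxRaysPerLaunch, totalRays - first);
        launch(first, count);
        ++launches;
    }
    return launches;
}
```

Smaller batches add some per-launch overhead, so this trades a little throughput for staying under the timeout.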

If you’re running R510 drivers, then you can also upgrade to the OptiX 7.4.0 SDK.
There have been some API changes in the versions after 7.1.0. Have a look into the OptiX Release Notes describing these. (Link always below the respective OptiX version’s download button.)

In case the SDK still generates PTX code for the Pascal SM 6.0 target by default, change it to SM 5.0 for Maxwell: