Memory consumption relation on GAS?

Is there a similarly clear answer regarding the memory consumption of ASes?
I’m currently thinking of adding optional support for “multiple materials per 3D object” to a path tracer (OptiX 7.6, device driver 531.79, GTX 1050 2 GB, Win10 Pro).

My current IAS → GAS handling is:
splitting objects that use multiple materials into “one material subsets”, so each subset contains only one material and has its own GAS associated with it.
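The split into one-material subsets could be sketched on the host side roughly as follows. This is a hypothetical illustration (the function name and data layout are made up, not OptiX API code): one material index per triangle goes in, and one bucket of triangle indices per material comes out, where each bucket would then feed its own GAS build input.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Bucket triangle indices by their material index so that each bucket can
// become its own single-material GAS build input. Purely illustrative.
std::map<uint32_t, std::vector<uint32_t>>
splitByMaterial(const std::vector<uint32_t>& materialPerTriangle)
{
    std::map<uint32_t, std::vector<uint32_t>> subsets;
    for (uint32_t tri = 0; tri < materialPerTriangle.size(); ++tri)
        subsets[materialPerTriangle[tri]].push_back(tri);
    return subsets;
}
```

For example, `splitByMaterial({0, 0, 1, 0, 1})` yields two subsets: triangles {0, 1, 3} for material 0 and {2, 4} for material 1.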

Would implementing “objects using multiple materials” generally reduce memory usage, i.e., if instead of n GASes a single GAS containing the combined geometry data of all n GASes were used?
(Theoretical answer for current BVH implementations expected.)

Thank you!

This recent thread and the links in there answer some of the AS size questions and options to influence that.

The main issue with bigger GASes is that building them needs a lot of temporary memory for the initial build, and you probably won’t get very far with the 2 GB on your Pascal board. Also, compaction is much more efficient on RTX boards. Your mileage may vary.

I haven’t measured it, but I would expect many smaller GASes to require more memory than one big one, both before and after compaction, so merging them is a reasonable idea when possible. The merged geometry can then also be instanced as a whole.

Mind the prefix sum necessary for the SBT offsets on the instances when using more than one SBT record per GAS. The SBT index calculation and the example tables in this OptiX Programming Guide chapter explain that, specifically the last table in section 7.3.5:

Having multiple materials inside one GAS means numSbtRecords > 1, and then you need the
sbtIndexOffsetBuffer data inside the AS build input to define which geometric primitive uses which SBT offset (== material), and that would need to be stored inside the AS. I never checked whether there is a size difference. I normally use numSbtRecords == 1, because I use the instance SBT offset and the user-defined instance ID to select shaders and material parameters.

This is one of the cases where you need to simply benchmark that on your specific system configuration.

Thank you for your answer.
I always use compaction, and the build flags discussed in the linked threads are always the same. As in your OptiX Apps, I also use numSbtRecords == 1, but with the difference that the (MDL) material-related data lives inside the SBT record, which does not allow a face-based material index. So I will move that out of there and obviously need to do some benchmarking, as you said.

But it seems you also think that

as I assumed.

But additionally I hoped for a guess about the AABB structure when splitting geometry.
You said fewer overlaps lead to higher speed, so I thought there might also be a memory downside to splitting.

What I found from the links so far is:

If I understand that correctly, the number of AABB structures for the primitives remains the same in both cases (one GAS or multiple GASes).
But the hierarchy above them is built separately in the case of multiple GASes, whereas it would be shared in one big GAS, so that seems to cause some overhead when splitting into several subsets.

Hi @m001,

The primary reason multiple small GASes require more memory than a single large one is that each GAS has a small amount of overhead in the form of a header. That overhead is negligible for a large mesh GAS with thousands or millions of triangles, but it can become noticeable if you build a GAS over a very small mesh. For example, you will see virtually no practical memory savings if you combine 3 meshes of 10k triangles each into a single mesh. On the other hand, you will see relatively large memory savings if you combine 10k GASes of 3 triangles each into a single GAS with 30k triangles.
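This scaling behavior can be made concrete with a back-of-the-envelope model. The constants below are made-up illustration values, not real OptiX numbers (the actual header and per-triangle sizes are implementation details); the point is only that the header term scales with the GAS count while the triangle term does not:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sizes for illustration only; NOT actual OptiX values.
constexpr uint64_t HEADER_BYTES       = 128; // assumed fixed per-GAS overhead
constexpr uint64_t BYTES_PER_TRIANGLE = 64;  // assumed amortized cost per primitive

// Total memory of 'gasCount' GASes holding 'trianglesPerGas' triangles each.
uint64_t totalGasBytes(uint64_t gasCount, uint64_t trianglesPerGas)
{
    return gasCount * (HEADER_BYTES + trianglesPerGas * BYTES_PER_TRIANGLE);
}
```

With these numbers, 3 GASes of 10k triangles cost 1,920,384 bytes versus 1,920,128 for one merged 30k-triangle GAS (a negligible saving), while 10k GASes of 3 triangles cost 3,200,000 bytes versus the same 1,920,128 merged (a large saving).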

I hoped for a guess on the AABB structure when splitting geometry.

The overhead of traversing through separate overlapping hierarchies is true for any hierarchy type; it is not related to AABBs or to any OptiX implementation details. It is purely a consequence of hierarchical tree search being a logarithmic operation.

Here’s an example of the worst case. Imagine you have a GAS with a decent amount of geometry in it, say a million triangles. Suppose you want to use it as an instance and place multiple copies of the instance in your scene. Now consider the case where you place two of these instances in almost the same position, so they overlap almost completely. In this case, a ray that passes through these two instances will need to traverse each one separately. If instead you can merge these two overlapping instances into a single GAS, the ray can traverse that single GAS once.

Let’s use a hypothetical balanced binary tree (think KD-tree). For 1 million triangles, in the ideal case you might expect a ray that hits a triangle to traverse about 20 nodes on average, because 2^20 ≈ 1M. If you have 2 overlapping instances and have to traverse them independently, you can expect to visit 20 + 20 = 40 nodes. If instead you took those two overlapping instances and built a flattened single hierarchy with 2 million triangles, you could traverse the combined hierarchy by examining about 21 nodes, because 2^21 ≈ 2M. So traversal of the overlapped instances takes approximately twice as long as traversal of the GAS built from the combined meshes.

You said fewer overlaps lead to higher speed and so I thought there is also a memory downside with that.

Note that the memory overhead of many small GASes and the compute overhead of traversing overlapping GASes are completely independent problems. But there is a single solution that happens to address both at the same time: when you combine meshes into larger meshes, you reduce both the GAS memory overhead and the traversal-overlap overhead. There is no memory downside to merging two different meshes into a single mesh, but there is a memory downside to combining two or more instances of the same mesh into a single one. In that case you will use almost twice the memory, because you need to keep two copies of the mesh instead of using two instance nodes. You also lose scene flexibility when combining meshes; for example, if you want to animate some of the meshes independently, merging them naively can lead to excessive BVH build times compared to maintaining separate GASes.


