However, the frame rate dropped by half. So I am wondering what could be causing the performance hit?
When speaking about performance, please provide absolute numbers and the system configuration.
If this drops from 60 fps to 30 fps, the cause could be as simple as having VSync enabled in the final display mechanism.
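A minimal C++ sketch of why VSync can turn a small slowdown into exactly half the frame rate. It assumes plain double-buffered VSync (no triple buffering, no adaptive sync); `vsyncFps` is a hypothetical helper, not part of OptiX or any display API.

```cpp
#include <cassert>
#include <cmath>

// Effective frame rate under double-buffered VSync (assumed model).
// A frame that misses a vertical blank waits for the next one, so the
// displayed frame time is rounded up to a whole number of refresh intervals.
double vsyncFps(double renderMs, double refreshHz)
{
    assert(renderMs > 0.0 && refreshHz > 0.0);
    const double intervalMs = 1000.0 / refreshHz;             // e.g. 16.67 ms at 60 Hz
    const double intervals = std::ceil(renderMs / intervalMs); // vblanks waited per frame
    return refreshHz / intervals;
}
```

Under this model, a frame that takes 17 ms instead of 10 ms is shown at 30 fps instead of 60 fps on a 60 Hz display, although the uncapped rate would only have dropped to about 59 fps.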
IAS (one instance per layer) → GAS (geometry of the unique objects)
So I guess, compared to your suggestion, I am missing the middle IAS?
But as there is only one GAS per instance, I don’t see the point of having an IAS for it.
Is that correct, or do I need the middle IAS?
If you have a single-level instancing hierarchy with all geometry of each layer in one GAS, and use the visibility mask method to encode a maximum of eight layers, there is no need for an additional IAS level.
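The visibility mask method can be modeled on the host in a few lines. This is a minimal C++ sketch, assuming one layer per bit of the 8-bit mask. In OptiX the per-instance mask is the `visibilityMask` field of `OptixInstance` and the per-ray mask is the `visibilityMask` argument of `optixTrace`; the helper functions below are hypothetical and only reproduce the culling rule `(instanceMask & rayMask) != 0`.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one layer per bit of the 8-bit visibility mask (layers 0..7).
constexpr int kMaxLayers = 8;

// Mask stored in an instance: the single bit identifying its layer.
std::uint8_t instanceMaskForLayer(int layer)
{
    assert(layer >= 0 && layer < kMaxLayers);
    return static_cast<std::uint8_t>(1u << layer);
}

// Mask passed with each ray: one bit per layer that is currently enabled.
std::uint8_t rayMaskForEnabledLayers(const bool enabled[kMaxLayers])
{
    std::uint8_t mask = 0;
    for (int i = 0; i < kMaxLayers; ++i)
        if (enabled[i])
            mask = static_cast<std::uint8_t>(mask | (1u << i));
    return mask;
}

// Modeled traversal rule: an instance is culled when the masks share no bit.
bool instanceVisible(std::uint8_t instanceMask, std::uint8_t rayMask)
{
    return (instanceMask & rayMask) != 0;
}
```

Toggling a layer then only changes the ray mask supplied at launch time; neither the GAS nor the IAS has to be rebuilt.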
Could this cause the performance hit?
I wouldn’t expect a performance reduction when using the same hierarchy with and without visibility masks when all layers are enabled.
Previously I had one GAS (with one geometry mesh containing all vertices and triangle indices of the scene) placed inside the top-level root IAS.
If everything was in one GAS, why do you need an IAS on top?
You did not use OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_GAS?
Ok, that could actually be a difference, but a factor of two sounds rather high.
The OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING hierarchy is fully hardware accelerated on RTX boards.
What is your system configuration?
(OS version, installed GPUs, display driver version, OptiX version (major.minor.micro), CUDA version, host compiler version.)
The frame rate for this was almost twice as high. In the new approach, I have split the main vertex buffer and triangle indices into multiple buffers for the different layers. That is, each GAS now has its own vertex and triangle index buffer, so the total vertex count could be higher than in the first scenario.
Ok, you’re saying that you reused vertex information in the single GAS hierarchy, while you duplicated some shared vertices in the single level instancing case for the layering.
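To see how much the per-layer vertex arrays grow compared to a shared buffer, one can count the vertices each layer references. A minimal C++ sketch, assuming a triangle index buffer with three indices per triangle and a hypothetical per-triangle layer ID array:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Counts the vertices needed when each layer gets its own vertex buffer that
// holds only the vertices its triangles reference. A vertex shared by
// triangles of different layers is counted once per layer, i.e. duplicated.
// Assumed layout: 3 indices per triangle, one layer ID per triangle.
std::size_t verticesAfterLayerSplit(const std::vector<std::uint32_t>& indices,
                                    const std::vector<int>& triangleLayer)
{
    assert(indices.size() == 3 * triangleLayer.size());
    std::set<std::pair<int, std::uint32_t>> perLayer; // (layer, vertex index)
    for (std::size_t t = 0; t < triangleLayer.size(); ++t)
        for (int k = 0; k < 3; ++k)
            perLayer.insert({triangleLayer[t], indices[3 * t + k]});
    return perLayer.size();
}
```

For two triangles sharing an edge, the count is 4 when both are in the same layer and 6 when they are assigned to different layers.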
For example, in the image below, the scene triangles are grouped into three layers colored red, green and blue. Each layer is a GAS inside an OptixInstance, so each GAS has its own vertex array.
Can't I have one vertex buffer for the whole model and share it among multiple GASes?
Memory management is your responsibility.
You could put all vertices into one buffer if you want and build each individual GAS from the respective triangle indices.
I see no problem with that, other than that if the scene becomes really large, this relies on CUDA finding a contiguous memory block big enough to hold the data.
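The shared-buffer approach only requires splitting the index data per layer while keeping the global vertex indices. A minimal C++ sketch, assuming three indices per triangle and a hypothetical per-triangle layer ID. The upload and the OptiX wiring are not shown: each per-layer array would become the `indexBuffer` of that layer's `OptixBuildInputTriangleArray`, while all build inputs point their `vertexBuffers` at the same device vertex buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits one global triangle index buffer into per-layer index buffers.
// Indices are copied unchanged, so every layer keeps addressing the same
// shared vertex buffer and no vertex is duplicated.
// Assumed layout: 3 indices per triangle, layer ID in [0, numLayers).
std::vector<std::vector<std::uint32_t>>
splitIndicesByLayer(const std::vector<std::uint32_t>& indices,
                    const std::vector<int>& triangleLayer,
                    int numLayers)
{
    assert(indices.size() == 3 * triangleLayer.size());
    assert(numLayers > 0);
    std::vector<std::vector<std::uint32_t>> perLayer(static_cast<std::size_t>(numLayers));
    for (std::size_t t = 0; t < triangleLayer.size(); ++t)
    {
        const int layer = triangleLayer[t];
        assert(layer >= 0 && layer < numLayers);
        for (int k = 0; k < 3; ++k)
            perLayer[static_cast<std::size_t>(layer)].push_back(indices[3 * t + k]);
    }
    return perLayer;
}
```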
Or is this not a bottleneck anyway?
It's unlikely to be one, unless the individual objects consist of rather few primitives and the previous vertex sharing was highly effective.
Do you use acceleration structure compaction?
Another thing I am suspicious of is, again, overlapping instances in the IAS. Since each layer's triangles are scattered all around the scene, the instance bounding boxes will overlap. As in the image below, the bounding boxes of these three instances obviously overlap heavily, so a ray in almost any direction has to perform more ray-instance bounding box tests. Is this correct? And could this be the cause of the performance hit?
Yes, this is going to perform worse than a single GAS, which can be optimized spatially much better, while the instance AABBs in your IAS overlap and would basically all be tested in your current setup.
The same happens when engines sort their geometries by material, which isn’t the best idea for a spatial acceleration structure.
For that case, you could, for example, split the geometry of each layer into multiple instances, which reduces the AABB size per instance and with it the overlap.
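The effect of overlapping instance bounds can be made concrete by counting how many instance AABBs a single ray enters. A minimal C++ sketch using the standard slab test on the host; `Aabb`, `rayHitsAabb` and `countInstancesEntered` are hypothetical helpers, and this only approximates the real cost, since OptiX builds and traverses its own BVH over the instances in hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct Aabb { float min[3]; float max[3]; };

// Slab test: does origin + t * dir, t in [0, tMax], pass through the box?
// Assumes no direction component is exactly zero, to keep the sketch short.
bool rayHitsAabb(const float origin[3], const float dir[3], float tMax, const Aabb& box)
{
    float t0 = 0.0f, t1 = tMax;
    for (int a = 0; a < 3; ++a)
    {
        assert(dir[a] != 0.0f);
        const float inv = 1.0f / dir[a];
        float tNear = (box.min[a] - origin[a]) * inv;
        float tFar  = (box.max[a] - origin[a]) * inv;
        if (tNear > tFar) std::swap(tNear, tFar);
        t0 = std::max(t0, tNear);
        t1 = std::min(t1, tFar);
        if (t0 > t1) return false; // slabs do not overlap along the ray
    }
    return true;
}

// Number of instance AABBs the ray enters: a rough proxy for how many
// bottom-level traversals the ray may have to start.
int countInstancesEntered(const float origin[3], const float dir[3], float tMax,
                          const std::vector<Aabb>& instances)
{
    int count = 0;
    for (const Aabb& box : instances)
        if (rayHitsAabb(origin, dir, tMax, box))
            ++count;
    return count;
}
```

With three layers whose triangles are scattered over the whole scene, the instance AABBs are nearly identical and a ray enters all three; with the same space divided into three disjoint slabs, the same ray enters only one.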
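One way to split each layer into multiple instances is to bin its triangles spatially. A minimal C++ sketch, assuming precomputed triangle centroids, a per-triangle layer ID and a uniform grid over the scene bounds; the uniform grid is a deliberately simple stand-in for any spatial clustering, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct Centroid { float x, y, z; };

using GroupKey = std::pair<int, int>; // (layer, linear grid cell index)

// Groups the triangles of each layer by the grid cell containing their
// centroid. Each (layer, cell) group could become its own GAS and instance,
// keeping instance AABBs local instead of spanning the whole scene.
// Assumes centroids lie within the grid starting at sceneMin.
std::map<GroupKey, std::vector<std::uint32_t>>
groupTrianglesByLayerAndCell(const std::vector<Centroid>& centroids,
                             const std::vector<int>& triangleLayer,
                             const float sceneMin[3], float cellSize, int cellsPerAxis)
{
    assert(centroids.size() == triangleLayer.size());
    assert(cellSize > 0.0f && cellsPerAxis > 0);
    auto cellCoord = [&](float v, int axis) {
        const int c = static_cast<int>((v - sceneMin[axis]) / cellSize);
        return std::min(std::max(c, 0), cellsPerAxis - 1); // clamp border triangles
    };
    std::map<GroupKey, std::vector<std::uint32_t>> groups;
    for (std::size_t t = 0; t < centroids.size(); ++t)
    {
        const int cx = cellCoord(centroids[t].x, 0);
        const int cy = cellCoord(centroids[t].y, 1);
        const int cz = cellCoord(centroids[t].z, 2);
        const int cell = (cz * cellsPerAxis + cy) * cellsPerAxis + cx;
        groups[{triangleLayer[t], cell}].push_back(static_cast<std::uint32_t>(t));
    }
    return groups;
}
```

Each resulting group would be built into its own GAS and referenced by its own `OptixInstance` carrying the group's layer bit in its visibility mask, so layer toggling keeps working while each instance AABB shrinks to roughly one grid cell (plus triangles straddling cell borders).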
If you can handle the whole thing with a single-level instancing hierarchy, that’s fine, even if there are many more instances inside the top-level IAS than there are layers.
Still, using instances for that layering mechanism is the only reasonable choice if you do not want to rebuild your previous single GAS on each layer toggle, which would be the other solution.
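For comparison, the rebuild alternative boils down to regenerating the index buffer of the single GAS from the currently enabled layers. A minimal C++ sketch, assuming a per-triangle layer ID and a bit mask of enabled layers (both hypothetical); the resulting indices would then feed a full GAS rebuild, which is exactly the cost the instancing approach avoids.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Gathers the triangles of all enabled layers into one index buffer for a
// rebuild of the single GAS. Assumes at most 32 layers encoded one bit per
// layer in enabledMask, 3 indices per triangle, one layer ID per triangle.
std::vector<std::uint32_t>
indicesForEnabledLayers(const std::vector<std::uint32_t>& indices,
                        const std::vector<int>& triangleLayer,
                        std::uint32_t enabledMask)
{
    assert(indices.size() == 3 * triangleLayer.size());
    std::vector<std::uint32_t> out;
    out.reserve(indices.size());
    for (std::size_t t = 0; t < triangleLayer.size(); ++t)
    {
        const int layer = triangleLayer[t];
        assert(layer >= 0 && layer < 32);
        if ((enabledMask >> layer) & 1u)       // keep triangle if its layer bit is set
            for (int k = 0; k < 3; ++k)
                out.push_back(indices[3 * t + k]);
    }
    return out;
}
```

Keeping a single GAS this way preserves its good spatial partitioning, at the price of a complete acceleration structure build on every toggle.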