- Does hardware acceleration (intersection and traversal) work without an attribute program (that is, the attribute program is not a “must”, as the documentation says)?
Hardware BVH traversal and triangle intersection are independent of the presence of an attribute program.
The attribute program domain is there to separate state calculations from the intersection program, so that the same closesthit program code can be used with different intersection routines. That became necessary because the hardware (HW) triangle intersection only reports barycentric coordinates; you need access to the primitive ID as well and have to handle the attribute calculation per primitive type.
I would generally recommend using attribute programs in OptiX 6 because that will automatically defer the vertex attribute calculations to the latest possible time (inside the anyhit or closesthit programs). There are optimizations in OptiX which tried to do that automatically in some cases before there were attribute programs.
It’s more expensive to calculate these inside custom intersection programs because they are called more often.
In OptiX 7 you need to do that manually inside the anyhit or closesthit programs anyway, and you need to use optixGetHitKind() to distinguish which primitive type was actually intersected if there are multiple intersection programs.
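To illustrate the attribute calculation the HW triangle intersection leaves to you, here is a minimal host-side C++ sketch, not actual OptiX device code. It assumes the usual convention that the two reported barycentrics (u, v) weight the second and third vertex and (1 - u - v) weights the first. The names Float3, UInt3, interpolate and shadingNormal are hypothetical and only serve the example; in a real closesthit program the primitive ID and barycentrics would come from the OptiX device functions instead of function arguments.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-ins for the CUDA float3 / uint3 vector types.
struct Float3 { float x, y, z; };
struct UInt3  { unsigned int x, y, z; };

// Interpolate three per-vertex values with barycentrics (u, v).
// Assumed convention: u weights vertex 1, v weights vertex 2,
// and (1 - u - v) weights vertex 0.
inline Float3 interpolate(const Float3& a0, const Float3& a1, const Float3& a2,
                          float u, float v)
{
    const float w = 1.0f - u - v;
    return { w * a0.x + u * a1.x + v * a2.x,
             w * a0.y + u * a1.y + v * a2.y,
             w * a0.z + u * a1.z + v * a2.z };
}

// Scale a vector to unit length; a zero vector is returned unchanged.
inline Float3 normalize(const Float3& n)
{
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len == 0.0f)
        return n;
    return { n.x / len, n.y / len, n.z / len };
}

// Hypothetical helper: look up the triangle by primitive ID, fetch its three
// vertex normals and interpolate them at the hit point.
inline Float3 shadingNormal(const std::vector<UInt3>&  indices,
                            const std::vector<Float3>& normals,
                            unsigned int primitiveId, float u, float v)
{
    assert(primitiveId < indices.size());
    const UInt3 tri = indices[primitiveId];
    assert(tri.x < normals.size() && tri.y < normals.size() && tri.z < normals.size());
    return normalize(interpolate(normals[tri.x], normals[tri.y], normals[tri.z], u, v));
}
```

Doing this only inside the closesthit (or anyhit) program means the work happens once per accepted hit rather than once per intersection test, which is the deferral described above.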
- Which acceleration structure types can I use to get hardware-accelerated intersection and traversal?
Under OptiX 6, use Trbvh. Then HW BVH traversal will always be used on RTX boards for any primitive type.
(I’m actually not sure whether Bvh and Sbvh are mapped to the HW BVH traversal as well.)
To get HW triangle intersection you need to use the built-in GeometryTriangles primitives. That’s all.
In OptiX 7 that is similar. For custom primitives you provide an array of AABBs to the AS builder. There is no bounding box program; you need to calculate the AABBs yourself. For built-in triangles you only need to provide the triangles array data and optional indices for the topology. There are no BVH builder types to select at all. OptiX picks the one required for the underlying HW and scene structure.
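Since OptiX 7 has no bounding box program, the AABB array for custom primitives has to be filled by the application. A minimal sketch of that step for spheres could look like the following. The Aabb struct here is a local stand-in that mirrors the six-float layout of OptixAabb (minX, minY, minZ, maxX, maxY, maxZ), and Sphere and computeSphereAabbs are hypothetical names. Copying the result to device memory and referencing it from the custom-primitive build input is left out.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Local stand-in mirroring the member layout of OptixAabb.
struct Aabb { float minX, minY, minZ, maxX, maxY, maxZ; };
static_assert(sizeof(Aabb) == 6 * sizeof(float), "Aabb must be six tightly packed floats");

// Hypothetical custom primitive: a sphere given by center and radius.
struct Sphere { float cx, cy, cz, radius; };

// Compute one axis-aligned bounding box per sphere, in primitive order.
inline std::vector<Aabb> computeSphereAabbs(const std::vector<Sphere>& spheres)
{
    std::vector<Aabb> aabbs;
    aabbs.reserve(spheres.size());
    for (const Sphere& s : spheres)
    {
        assert(s.radius >= 0.0f);
        aabbs.push_back({ s.cx - s.radius, s.cy - s.radius, s.cz - s.radius,
                          s.cx + s.radius, s.cy + s.radius, s.cz + s.radius });
    }
    return aabbs;
}
```

The order of the boxes matters: the primitive index seen later in the intersection program corresponds to the position of the box in this array.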
- Is there anything that can cause intersection and traversal acceleration not to work if I use GeometryTriangles on an RTX card?
You cannot actually switch off HW BVH traversal on RTX cards. The two-level AS hierarchy is fully supported in HW.
GeometryTriangles will always result in HW triangle intersections.
Scene hierarchies deeper than two AS from root to geometry as well as motion blur will slightly change the BVH traversal to handle the additional traversal calculations.
- Will acceleration (intersection and traversal) work when a scene uses GeometryTriangles and other geometry types at the same time? Or will it work only for the GeometryTriangles geometry in that case?
Yes, BVH traversal will always be HW accelerated on RTX boards.
You cannot mix GeometryTriangles and custom primitives in a single AS, so the GeometryTriangles will run the built-in HW intersection. For the custom primitives, the HW BVH traversal will call back into your intersection routine, which runs on the streaming multiprocessors (SMs). Non-RTX boards will simply run everything on the SMs, as always happened in OptiX versions before there was RTX hardware.
- Is there anything I should know to be sure both intersection and traversal acceleration are working?
Just use GeometryTriangles and you should be good.
- Is there a way to check RTX cores workload?
Not really, you can’t change it anyway.
You can use Nsight Compute to profile the streaming multiprocessor parts of your application, effectively all OptiX programs you provided. Make sure you compiled your PTX code with --generate-line-info to be able to map instructions to the CUDA input code.
In OptiX 6 there is also a usage report functionality which gives you some coarse information about what happens under the hood. That doesn’t exist in OptiX 7 because there everything is explicit.
For overall application performance analysis, use Nsight Systems first, for example, to see if there are unnecessary overheads when copying data around etc.
To get visual feedback of what takes time in your scene, it’s recommended to implement a “time view” feature in your renderer.
Something like this (search the whole repository for USE_TIME_VIEW):
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/shaders/raygeneration.cu#L169
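The idea behind a time view can be sketched on the CPU as below. This is not the code from the linked repository, just an illustration under my own assumptions: measure how long each pixel takes, normalize by the slowest pixel, and show the result as a blue (fast) to green to red (slow) ramp. On the GPU you would read the device clock at the start and end of the ray generation program instead of using std::chrono. The names Color, timeToColor and renderTimeView are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct Color { float r, g, b; };

// Map an elapsed time to a simple heat ramp:
// 0 -> blue, half of maxElapsed -> green, maxElapsed or more -> red.
inline Color timeToColor(float elapsed, float maxElapsed)
{
    assert(maxElapsed > 0.0f);
    const float t = std::min(std::max(elapsed / maxElapsed, 0.0f), 1.0f);
    if (t < 0.5f)
    {
        const float s = t * 2.0f;         // blue to green
        return { 0.0f, s, 1.0f - s };
    }
    const float s = (t - 0.5f) * 2.0f;    // green to red
    return { s, 1.0f - s, 0.0f };
}

// Time shadePixel(x, y) for every pixel and return the timings as a heat map.
template <typename ShadeFn>
std::vector<Color> renderTimeView(int width, int height, ShadeFn shadePixel)
{
    assert(width > 0 && height > 0);
    const std::size_t count = static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    std::vector<float> elapsed(count, 0.0f);
    float maxElapsed = 0.0f;

    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            const auto begin = std::chrono::steady_clock::now();
            shadePixel(x, y);
            const auto end = std::chrono::steady_clock::now();
            const float ns = std::chrono::duration<float, std::nano>(end - begin).count();
            elapsed[static_cast<std::size_t>(y) * width + x] = ns;
            maxElapsed = std::max(maxElapsed, ns);
        }
    }

    // Guard against a clock too coarse to register any time at all.
    const float scale = (maxElapsed > 0.0f) ? maxElapsed : 1.0f;
    std::vector<Color> image(count);
    for (std::size_t i = 0; i < count; ++i)
        image[i] = timeToColor(elapsed[i], scale);
    return image;
}
```

Normalizing by the slowest pixel keeps the full color range in use for every frame; a fixed scale is the better choice when you want to compare different scenes or settings against each other.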
For people reading this using DXR as well, check this out:
https://devblogs.nvidia.com/profiling-dxr-shaders-with-timer-instrumentation/
Also see these related topics:
https://forums.developer.nvidia.com/t/optix-6-0-rtx-acceleration-is-supported-on-maxwell-and-newer-gpus/70206
https://forums.developer.nvidia.com/t/leveraging-rtx-hardware-capabilities-with-optix-7-0/107733
There is a SIGGRAPH 2019 OptiX 7 Performance Tools and Tricks presentation linked in this post:
https://forums.developer.nvidia.com/t/optix-talks-from-siggraph-2019/80866