OptiX 8: optixTrace() vs optixTraverse()+optixInvoke() performance

robert_sulej · April 12, 2024, 10:35am

Hello,

I observe a systematic performance degradation when optixTrace() is simply replaced with optixTraverse(); optixInvoke() calls (no optixReorder() used). The difference gets bigger when more complex shaders, and more of them, are involved. Is that expected, or should I pay attention to something else when implementing the new approach?

The only change I am making in the code is:

	optixTrace(
		handle,
		ray_origin,
		ray_direction,
		tmin,
		tmax,
		0.0f,                      // rayTime
		OptixVisibilityMask(1),
		OPTIX_RAY_FLAG_NONE,
		RAY_TYPE_RADIANCE,         // SBT offset
		RAY_TYPE_COUNT,            // SBT stride
		RAY_TYPE_RADIANCE,         // miss SBT index
		u0, u1, g0, g1);           // payload registers

replaced with:

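	// Traversal only: runs intersection/any-hit programs and stores the result in the hit object
	optixTraverse(
		handle,
		ray_origin,
		ray_direction,
		tmin,
		tmax,
		0.0f,                      // rayTime
		OptixVisibilityMask(1),
		OPTIX_RAY_FLAG_NONE,
		RAY_TYPE_RADIANCE,         // SBT offset
		RAY_TYPE_COUNT,            // SBT stride
		RAY_TYPE_RADIANCE,         // miss SBT index
		u0, u1, g0, g1);           // payload registers
	// Calls the closest-hit or miss program recorded in the hit object
	optixInvoke(u0, u1, g0, g1);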

The code runs on an RTX 4090 with driver 552.12.

droettger · April 12, 2024, 11:13am

Could you quantify that performance degradation with absolute numbers, please?
How large is the difference, and with how many shaders of what complexity?

I briefly tested that with my MDL_renderer example and the scene description scene_mdl_vMaterials.txt, and I see a small difference between optixTrace and optixTraverse/Invoke inside the non-SER code path of the integrator: roughly 113.25 vs. 112.6 samples per second on an RTX 6000 Ada running Windows 10 with 545.84 drivers.
(Same setup as used here: https://forums.developer.nvidia.com/t/optix-advanced-samples-on-github/48410/14 )
I'm not using a lot of different hit records. That renderer configures materials with a few similar hit programs and handles the rest with direct callable programs.
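For scale, assuming the first figure is the optixTrace result and the second the optixTraverse/Invoke result (as the word order suggests), the relative slowdown in that test works out to well under one percent:

	\frac{113.25 - 112.6}{113.25} = \frac{0.65}{113.25} \approx 0.0057 \approx 0.57\,\%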

So I would expect only a minor difference, maybe due to some hit object data handling.

I would need to test with R550 drivers next week to see whether the behavior is worse there.

Have you checked with Nsight Compute if there is any obvious difference in behavior between the two modes?

If you're not using optixReorder, there is not much incentive to replace optixTrace with optixTraverse/Invoke just because you can.
There are some cases where optixTraverse can be used instead of optixTrace, like the quick shadow ray test shown inside the optixPathTracer example, which also uses SER.

robert_sulej · April 12, 2024, 1:08pm

OK… I think I have found the reason. In the more complex scenes I am also casting occlusion rays, and these were still done with optixTrace() while optixTraverse(); optixInvoke() was used for the radiance rays. Mixing these two approaches gives me a slowdown of ~20%. When I switch to optixTraverse(); optixInvoke() everywhere, the slowdown is ~1%.
This is important to know anyway, since I'd like to cast 2-3 radiance rays from the primary hit position, re-using a manually created hit object. Currently I do this with optixTrace(), which also repeats the traversal of the primary ray each time. So now I know that the change also requires converting the occlusion rays.

I started testing SER yesterday, but got significantly worse performance and started investigating… which is why I ran optixTraverse(); optixInvoke() without SER.

droettger · April 12, 2024, 1:28pm

Mixing these two approaches gives me a slowdown of ~20%.

Now that is actually weird. That is not what I saw in my test above.

I only changed the radiance ray shot inside the ray generation program from optixTrace to optixTraverse/Invoke:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/MDL_renderer/shaders/raygeneration.cu#L176
I did not change the shadow ray shot inside the hit programs:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/MDL_renderer/shaders/hit.cu#L454

This is important to know anyway, since I'd like to cast 2-3 radiance rays from the primary hit position

From which program domain?
I’m not doing that. Radiance rays are only shot from the ray generation program. There is no recursion in my path tracers at all.
The ray generation program is also where optixReorder belongs.

re-using a manually created hit object.

Why, though? There is an implicit hit object after optixTraverse. Are you saving it into a manually created hit object so that you can reuse it across the following optixTraverse calls? (Code would be helpful.)

(Side note about an issue with hit objects and transformations:
https://forums.developer.nvidia.com/t/understanding-optixtransformnormalfromobjecttoworldspace/285169 )

I started testing SER yesterday, but got significantly worse performance and started investigating

SER can still be a net loss when you're using too much local memory.
Please read the thread behind the very first link I posted above, which describes performance experiments I did.
