Branch divergence

Vectorizer · April 26, 2021, 5:44am

Since the number of bounces a ray takes differs from point to point in the scene, there must be significant GPU branch divergence, and hence poor performance. Are there any studies on this, and what can be done about it?

dhart · April 26, 2021, 5:07pm

There are different sources of divergence, so it depends on the scene & renderer whether divergence will be a problem. One source of divergence is ray traversal, intersection, and any-hit shaders. When rays in a warp are traversing different depths of an acceleration structure, divergence happens. If the rays are going through parts of the scene with very different geometric density, divergence appears.
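To make the cost of uneven traversal or bounce counts concrete, here is a minimal toy model in Python. It is an illustration rather than a description of how the hardware schedules work: it assumes a 32-thread warp that runs in lockstep until its slowest lane finishes, and it ignores latency hiding and hardware traversal. The function name `warp_utilization` and the step counts are made up for the example.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_utilization(steps_per_ray):
    """Fraction of lane-steps doing useful work in a lockstep warp (toy model)."""
    assert len(steps_per_ray) == WARP_SIZE
    warp_steps = max(steps_per_ray)   # the warp is busy until its slowest lane is done
    useful = sum(steps_per_ray)       # lane-steps that performed real work
    return useful / (WARP_SIZE * warp_steps)


# Hypothetical step counts: every ray needs 10 steps vs. one ray needing 40.
uniform = [10] * WARP_SIZE
mixed = [10] * 31 + [40]

print(warp_utilization(uniform))  # 1.0
print(warp_utilization(mixed))    # 0.2734375
```

In this model a single long-running ray drops the warp to roughly 27% utilization, which is the kind of effect the question is worried about.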

Another source of divergence is shading materials in the scene. If each ray in a warp hits a surface that uses a unique material shader, the cost will be significantly higher than if all the rays hit the same material.
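The material case can be sketched the same way. This is a simplified model that assumes a warp executes each distinct material shader once, with the lanes that did not hit that material masked off, so the per-warp cost is the sum over the distinct materials present. The function name, material IDs, and cost numbers are hypothetical.

```python
WARP_SIZE = 32


def warp_shading_cost(material_ids, shader_cost):
    """Toy cost of shading one warp: one serialized pass per distinct material."""
    assert len(material_ids) == WARP_SIZE
    return sum(shader_cost[m] for m in set(material_ids))


# Hypothetical per-material shader costs in arbitrary units.
shader_cost = {0: 100, 1: 100, 2: 100, 3: 100}

coherent = [0] * WARP_SIZE           # every lane hit material 0
divergent = [0, 1, 2, 3] * 8         # four materials spread across the warp

print(warp_shading_cost(coherent, shader_cost))   # 100
print(warp_shading_cost(divergent, shader_cost))  # 400
```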

Yes, there are studies on this topic, and yes, there are things you can do to improve situations that have significant divergence. For traversal, using the RTX hardware traversal and intersection for all of the scene geometry is a major way to both improve performance and cut down divergence. Avoiding any-hit programs and custom intersection programs when possible is another way to improve performance and potentially reduce divergence. (It’s not always possible to avoid any-hit programs or custom intersectors, and I don’t want to stigmatize the use of valid, necessary features, but any-hit and intersection programs do interrupt hardware traversal to execute your code on the SMs, and so there’s significant overhead.)

For materials, some people try to use an “ubershader” - a single shader program that can handle many kinds of material properties. Others sometimes use a “wavefront” architecture that separates tracing work and shading work into separate passes, where the shading work can be sorted and scheduled in batches by material. A wavefront architecture also allows you to consolidate ray tracing work at every step of path depth, so the kernels get smaller as you go, and warps stay compacted with active work on all threads.
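A rough host-side sketch of the two wavefront ideas, sorting shading work by material and compacting the surviving rays between bounces, might look like the following. It is plain Python for illustration only; a real renderer would do the sort and compaction on the GPU, and the helper names, the 1024-ray batch, and the one-in-four survival pattern are assumptions made for the example.

```python
WARP_SIZE = 32


def shading_passes(material_ids):
    """Count (warp, material) passes: a warp runs once per distinct material it holds."""
    passes = 0
    for start in range(0, len(material_ids), WARP_SIZE):
        passes += len(set(material_ids[start:start + WARP_SIZE]))
    return passes


def warps_needed(num_rays):
    """Number of warps required to cover num_rays threads."""
    return (num_rays + WARP_SIZE - 1) // WARP_SIZE


# Hypothetical batch: 1024 hit points with 4 materials interleaved across neighbouring rays.
hits = [i % 4 for i in range(1024)]

unsorted_passes = shading_passes(hits)          # every warp contains all 4 materials
sorted_passes = shading_passes(sorted(hits))    # each warp now holds a single material
print(unsorted_passes, sorted_passes)           # 128 32

# Compaction: suppose only every fourth path survives to the next bounce.
alive = [i % 4 == 0 for i in range(1024)]
survivors = [i for i, is_alive in enumerate(alive) if is_alive]
print(warps_needed(len(alive)), warps_needed(len(survivors)))  # 32 8
```

Without compaction the next bounce would still launch 32 warps with three quarters of the lanes idle; after compaction it launches 8 fully occupied warps.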

–
David.
