How CUDA Warp(s) relate to OptiX 7 Ray(s)

Good Afternoon,

This is a beginner question, but it is important for me to understand the OptiX 7 ray tracing system more fully.

Can anyone give me a brief description of how a given ray in OptiX relates to the underlying CUDA warp? Can one access lower-level intrinsic operations (e.g. warp shuffle, ballot, atomicAdd) when using OptiX 7, for example from a closest-hit program?

Thank you for any information.

Hi, good question. The answer is slightly complicated. This is covered in several places in the Programming Guide; I’ve included some links here. One good way to find the relevant passages quickly is to search the Programming Guide for the word “warp”.

The relationship of rays to threads is defined explicitly by you. Rays only exist when you call optixTrace(). A single thread can call optixTrace() zero, one, or many times. Each time you call optixTrace(), the spawned ray belongs to the calling thread, as do any programs invoked by that ray (any-hit, closest-hit, miss, etc.).
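To make the ray-to-thread relationship concrete, here is a minimal device-side sketch, not taken from the Programming Guide. The Params struct, its fields, the helper traceOne(), and the program names are hypothetical. It also assumes the host has built a pipeline with at least one payload value, a launch-parameter variable named "params", and a single ray type in the SBT.

```cuda
#include <optix.h>

// Hypothetical launch parameters, filled in by the host before optixLaunch().
struct Params
{
    OptixTraversableHandle handle;    // scene acceleration structure
    unsigned int*          hitCount;  // one entry per launch index
    float3                 origin;    // shared ray origin for this sketch
};

extern "C" __constant__ Params params;

// Trace one ray owned by the calling thread.
// Payload 0 comes back as 1 if the closest-hit program ran, 0 on a miss.
static __forceinline__ __device__ unsigned int traceOne(float3 dir)
{
    unsigned int hit = 0;
    optixTrace(params.handle, params.origin, dir,
               0.0f, 1e16f, 0.0f,          // tmin, tmax, ray time
               OptixVisibilityMask(255),
               OPTIX_RAY_FLAG_NONE,
               0, 1, 0,                    // SBT offset, SBT stride, miss index
               hit);
    return hit;
}

extern "C" __global__ void __raygen__rg()
{
    const uint3 idx = optixGetLaunchIndex();
    const uint3 dim = optixGetLaunchDimensions();
    const unsigned int linear = idx.y * dim.x + idx.x;

    unsigned int hits = 0;
    if (idx.x != 0)                        // threads in column 0 trace zero rays
    {
        hits += traceOne(make_float3(0.0f, 0.0f, 1.0f));      // first ray
        if (hits != 0)                                        // optional second ray
            hits += traceOne(make_float3(0.0f, 1.0f, 0.0f));
    }
    params.hitCount[linear] = hits;
}

// Both programs run on the thread that called optixTrace().
extern "C" __global__ void __closesthit__ch() { optixSetPayload_0(1u); }
extern "C" __global__ void __miss__ms()       { optixSetPayload_0(0u); }
```

Every thread here is one launch index, and the number of rays it owns depends only on how many times its own code path reaches optixTrace().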

The remainder of the question then is how OptiX threads relate to CUDA warps. There is a limited set of CUDA warp intrinsics that are allowed & supported in OptiX. Just remember that if you use them in a hit shader, it’s very common for some threads in a warp to be inactive.

"The NVIDIA OptiX 7 programming model supports the multiple instruction, multiple data (MIMD) subset of CUDA. Execution must be independent of other threads. For this reason, shared memory usage and warp-wide or block-wide synchronization, such as barriers, are not allowed in the input PTX code. All other GPU instructions are allowed, including math, texture, atomic operations, control flow, and loading data to memory. Special warp-wide instructions like vote and ballot are allowed, but can yield unexpected results as the locality of threads is not guaranteed and neighboring threads can change during execution, unlike in the full CUDA programming model. Still, warp-wide instructions can be used safely when the algorithm in question is independent of locality by, for example, implementing warp-aggregated atomic adds."
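As an illustration of the warp-aggregated atomic add mentioned above, here is a small sketch written as a plain CUDA program so it can be compiled with nvcc alone. The names laneId(), aggregatedIncrement(), countMultiplesOfThree(), and runCount() are made up for this example. The helper only asks how many lanes are active together, never which threads they are, so it does not depend on thread locality. Calling the same helper from an OptiX closest-hit program is an assumption I have not verified here. Note that it reads the lane index from the PTX %laneid register rather than from threadIdx.

```cuda
#include <cuda_runtime.h>

// Lane index within the warp, read from the PTX special register.
__device__ __forceinline__ unsigned int laneId()
{
    unsigned int lane;
    asm volatile("mov.u32 %0, %%laneid;" : "=r"(lane));
    return lane;
}

// Warp-aggregated increment: one atomicAdd per group of converged lanes
// instead of one per thread. No shared memory, no barriers, no shuffles.
__device__ void aggregatedIncrement(unsigned int* counter)
{
    const unsigned int active = __activemask();   // lanes executing this call together
    const unsigned int leader =
        static_cast<unsigned int>(__ffs(static_cast<int>(active))) - 1u;  // lowest active lane
    if (laneId() == leader)
        atomicAdd(counter, static_cast<unsigned int>(__popc(active)));
}

// Test kernel: only some lanes of each warp reach the increment,
// which mimics the partially active warps seen in hit programs.
__global__ void countMultiplesOfThree(int n, unsigned int* counter)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && i % 3 == 0)
        aggregatedIncrement(counter);
}

// Host helper: launches the kernel and returns the final counter value.
// Error checking is omitted to keep the sketch short.
unsigned int runCount(int n)
{
    unsigned int* dCounter = nullptr;
    cudaMalloc(&dCounter, sizeof(unsigned int));
    cudaMemset(dCounter, 0, sizeof(unsigned int));

    const int block = 128;
    countMultiplesOfThree<<<(n + block - 1) / block, block>>>(n, dCounter);

    unsigned int result = 0;
    cudaMemcpy(&result, dCounter, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(dCounter);
    return result;
}
```

Calling runCount(1000) from a host main() should return 334, the number of multiples of 3 in [0, 1000). If the lanes that pass the test do not all arrive at the helper together, each converged group elects its own leader and the total is still correct. A variant that needs a unique slot per thread would also have to broadcast the leader's old value, which typically uses a shuffle and is not shown here.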

Here are a few more relevant sections in the Programming Guide that elaborate:


Program & Data Model

Ray Generation Launches

Launch Index


Thank you for the information @dhart

Very useful.