How CUDA Warp(s) relate to OptiX 7 Ray(s)

picard1969 · March 19, 2021, 6:07pm

Good Afternoon,

This is a beginner question, but the answer is important for me to understand the OptiX 7 ray tracing system more fully.

Can anyone give me a brief description as to how a given Ray in OptiX relates to the underlying CUDA warp? Can one access lower-level intrinsic operations (e.g. warp shuffle, ballot, atomicAdd, etc.) when using OptiX 7 - like maybe from a closest-hit program?

Thank you for any information.

dhart · March 19, 2021, 9:40pm

Hi, good question. The answer is slightly complicated. It is covered in several places in the Programming Guide, and I’ve included some links below. A quick way to find the relevant passages is to search the Programming Guide for the word “warp”.

The relationship of rays to threads is defined explicitly by you. Rays only exist when you call optixTrace(). A single thread can call optixTrace() zero, one, or many times. Each time you call optixTrace(), the spawned ray belongs to the calling thread, as do any programs invoked by that ray (any-hit, closest-hit, miss, etc.).
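To make the ray-to-thread relationship concrete, here is a minimal device-side sketch, not taken from the SDK samples. The `Params` struct, its fields, the variable name `params`, and the program names are assumptions made up for illustration; the host side would have to set up a matching pipeline, SBT, and launch-parameter name. Each launch index (thread) traces two rays, and the closest-hit or miss program invoked for each ray runs on behalf of that same thread and reports back through payload register 0.

```cuda
#include <optix.h>

// Hypothetical launch parameters; names and layout are assumptions for this sketch.
struct Params
{
    OptixTraversableHandle handle;    // scene to trace against
    unsigned int*          hitCount;  // one slot per launch index
    unsigned int           width;     // launch width, used to flatten the index
};

// Must match the launch-parameter variable name configured on the host.
extern "C" __constant__ Params params;

extern "C" __global__ void __raygen__twoRays()
{
    // One ray generation invocation per launch index, i.e. per thread.
    const uint3        idx  = optixGetLaunchIndex();
    const unsigned int slot = idx.y * params.width + idx.x;

    const float3 origin  = make_float3(float(idx.x), float(idx.y), -1.0f);
    const float3 dirs[2] = { make_float3(0.0f, 0.0f, 1.0f),
                             make_float3(0.0f, 1.0f, 0.0f) };

    unsigned int hits = 0;
    for (int r = 0; r < 2; ++r)
    {
        unsigned int hit = 0;  // payload register 0, written by the hit or miss program
        optixTrace(params.handle, origin, dirs[r],
                   0.0f,                     // tmin
                   1e16f,                    // tmax
                   0.0f,                     // ray time
                   OptixVisibilityMask(255),
                   OPTIX_RAY_FLAG_NONE,
                   0, 1, 0,                  // SBT offset, SBT stride, miss SBT index
                   hit);
        hits += hit;  // both rays belong to this thread, so no synchronization is needed
    }
    params.hitCount[slot] = hits;
}

// Runs for the thread that called optixTrace() when its ray hits something.
extern "C" __global__ void __closesthit__markHit()
{
    optixSetPayload_0(1u);
}

// Runs for the thread that called optixTrace() when its ray hits nothing.
extern "C" __global__ void __miss__markMiss()
{
    optixSetPayload_0(0u);
}
```

Because the payload travels with the ray and the ray belongs to the calling thread, the per-thread result needs no communication with other threads.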

The remainder of the question, then, is how OptiX threads relate to CUDA warps. A limited set of CUDA warp intrinsics is allowed and supported in OptiX. Just remember that if you use them in a hit shader, it’s very common for some threads in the warp to be inactive.

From the Programming Guide: "The NVIDIA OptiX 7 programming model supports the multiple instruction, multiple data (MIMD) subset of CUDA. Execution must be independent of other threads. For this reason, shared memory usage and warp-wide or block-wide synchronization, such as barriers, are not allowed in the input PTX code. All other GPU instructions are allowed, including math, texture, atomic operations, control flow, and loading data to memory. Special warp-wide instructions like vote and ballot are allowed, but can yield unexpected results as the locality of threads is not guaranteed and neighboring threads can change during execution, unlike in the full CUDA programming model. Still, warp-wide instructions can be used safely when the algorithm in question is independent of locality by, for example, implementing warp-aggregated atomic adds." https://raytracing-docs.nvidia.com/optix7/guide/index.html#program_pipeline_creation#7014
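As an illustration of the "warp-aggregated atomic add" idea mentioned in that passage, here is a minimal sketch in plain CUDA (not OptiX device code). The function and kernel names are hypothetical. The pattern only asks which lanes happen to be active right now, elects the lowest one as leader, and lets that leader add the whole group's count with a single atomic. It does not care which threads are neighbors, which is why it is independent of locality. The lane index is read from the `%laneid` PTX register rather than derived from `threadIdx`, on the assumption that this carries over more naturally to OptiX programs.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Lane index within the warp, read from the PTX special register.
__device__ __forceinline__ unsigned int laneId()
{
    unsigned int id;
    asm volatile("mov.u32 %0, %%laneid;" : "=r"(id));
    return id;
}

// Adds 1 to *counter for every currently active lane,
// issuing one atomic per converged group instead of one per thread.
__device__ void aggregatedIncrement(unsigned int* counter)
{
    const unsigned int active = __activemask();    // lanes executing this call together
    const unsigned int leader = __ffs(active) - 1; // lowest active lane
    if (laneId() == leader)
        atomicAdd(counter, static_cast<unsigned int>(__popc(active)));
}

// Counts even values; odd lanes skip the call, so the active mask is partial.
__global__ void countEven(const int* data, int n, unsigned int* counter)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && (data[i] % 2) == 0)
        aggregatedIncrement(counter);
}

// Host helper: fills data[i] = i for i in [0, n) and returns the GPU count. Requires n > 0.
unsigned int countEvenOnGpu(int n)
{
    std::vector<int> host(n);
    for (int i = 0; i < n; ++i) host[i] = i;

    int*          dData    = nullptr;
    unsigned int* dCounter = nullptr;
    cudaMalloc(&dData, n * sizeof(int));
    cudaMalloc(&dCounter, sizeof(unsigned int));
    cudaMemcpy(dData, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dCounter, 0, sizeof(unsigned int));

    const int block = 128;
    countEven<<<(n + block - 1) / block, block>>>(dData, n, dCounter);

    unsigned int result = 0;
    cudaMemcpy(&result, dCounter, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dCounter);
    return result;
}

int main()
{
    std::printf("even count: %u\n", countEvenOnGpu(1000));
    return 0;
}
```

The total is correct no matter how the hardware groups the active lanes, because each group adds exactly its own population count. In an OptiX program the same helper could, for example, count hits from a closest-hit program, with the counter pointer passed through the launch parameters.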

Here are a few more relevant sections in the Programming Guide that elaborate:

Introduction: https://raytracing-docs.nvidia.com/optix7/guide/index.html#introduction#2006

Program & Data Model: https://raytracing-docs.nvidia.com/optix7/guide/index.html#basic_concepts_and_definitions#program

Ray Generation Launches: https://raytracing-docs.nvidia.com/optix7/guide/index.html#ray_generation_launches#ray-generation-launches

Launch Index: https://raytracing-docs.nvidia.com/optix7/guide/index.html#device_side_functions#12113

David.

picard1969 · March 22, 2021, 11:45am

Thank you for the information @dhart

Very useful.
