Hey, good questions.
So, regarding optixLaunch(): every invocation of your raygen program will be a separate thread. When you specify the width and height of your 2D OptiX launch, you can expect the number of threads to be width * height. Your call to optixGetLaunchIndex() gives you an index that identifies the thread.
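To make the width * height mapping concrete, here is a host-side C++ sketch (not OptiX device code) of how a 2D launch index is commonly turned into a unique linear pixel index. In a real raygen program the index and dimensions would come from optixGetLaunchIndex() and optixGetLaunchDimensions(); the Index3 struct and the function names are illustrative only. The host loop visits indices in scan-line order purely to show coverage; it says nothing about the order in which the GPU runs them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side stand-in for OptiX's uint3 launch index / dimensions (illustrative only).
struct Index3 {
    unsigned int x, y, z;
};

// Map a 2D launch index to a unique linear pixel index in [0, width * height).
// In a raygen program, idx would come from optixGetLaunchIndex() and dims from
// optixGetLaunchDimensions().
inline unsigned int linearIndex(Index3 idx, Index3 dims) {
    assert(idx.x < dims.x && idx.y < dims.y);
    return idx.y * dims.x + idx.x;
}

// Enumerate every launch index of a width x height launch (one per raygen
// invocation). The loop order is arbitrary; it only shows that each
// invocation gets exactly one distinct linear index.
inline std::vector<unsigned int> allLinearIndices(unsigned int width, unsigned int height) {
    std::vector<unsigned int> out;
    out.reserve(static_cast<std::size_t>(width) * height);
    const Index3 dims{width, height, 1};
    for (unsigned int y = 0; y < height; ++y) {
        for (unsigned int x = 0; x < width; ++x) {
            out.push_back(linearIndex(Index3{x, y, 0}, dims));
        }
    }
    return out;
}
```

For a 4 x 3 launch this yields 12 invocations, and launch index (3, 2) maps to linear index 11.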
https://raytracing-docs.nvidia.com/optix7/guide/index.html#device_side_functions#launch-index
Note the last sentence in that section: "program execution of neighboring launch indices is not necessarily done within the same warp or block, so the application must not rely on the locality of launch indices."
So the answer to the first part of your first question is "yes", and to the second part of the first question, "no". You should not assume that the first 32 rays are grouped together into the first warp. It's hard to define what "first" means, and threads and warps in general do not execute in sequential order. OptiX automatically structures a 2D launch into tiles for efficiency, so threads that are adjacent on the hardware will usually not correspond to launch indices in scan-line order.
On top of that, OptiX reserves the right to move threads during execution: "For efficiency and coherence, the NVIDIA OptiX 7 runtime, unlike CUDA kernels, allows the execution of one task, such as a single ray, to be moved at any point in time to a different lane, warp or streaming multiprocessor (SM). (See section "Kernel Focus" in the CUDA Toolkit Documentation.) Consequently, applications cannot use shared memory, synchronization, barriers, or other SM-thread-specific programming constructs in their programs supplied to OptiX." https://raytracing-docs.nvidia.com/optix7/guide/index.html#introduction#overview
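The practical consequence is that each raygen invocation should read and write only data keyed on its own launch index. A rough host-side emulation in C++, assuming a hypothetical shade() function as the per-pixel work: run one "invocation" per index in a shuffled order (standing in for OptiX's unspecified scheduling) and confirm the output is identical whatever the order. This only illustrates the principle; it does not model warps, tiling, or how OptiX actually schedules work.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical per-pixel work standing in for the body of a raygen program.
inline float shade(unsigned int x, unsigned int y) {
    return static_cast<float>(x) * 0.5f + static_cast<float>(y);
}

// Run one "invocation" per launch index, visiting indices in a shuffled order
// to mimic the fact that OptiX guarantees nothing about which indices run
// together or when. Each invocation writes only to its own slot, so the result
// cannot depend on the visiting order.
inline std::vector<float> renderShuffled(unsigned int width, unsigned int height,
                                         unsigned int seed) {
    assert(width > 0 && height > 0);
    std::vector<unsigned int> order(width * height);
    std::iota(order.begin(), order.end(), 0u);
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);

    std::vector<float> image(width * height, 0.0f);
    for (unsigned int linear : order) {
        const unsigned int x = linear % width;
        const unsigned int y = linear / width;
        image[linear] = shade(x, y);  // keyed on the launch index, not on execution order
    }
    return image;
}
```

Two calls with different seeds visit the pixels in different orders but produce the same image, which is exactly the property you want from device code that must not rely on launch-index locality.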
Some additional reading on raygen and threads here: https://raytracing-docs.nvidia.com/optix7/guide/index.html#ray_generation_launches#ray-generation-launches
Launch parameters are in device memory. Currently launch params are put into constant (read-only) memory for efficiency, and the launch params buffer is limited to a maximum size of 64KB.
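One cheap way to keep an eye on the 64KB limit is a compile-time check on the host-side params struct. The Params layout below is hypothetical, and plain 64-bit integers stand in for OptixTraversableHandle and CUdeviceptr so the sketch compiles without the OptiX SDK or CUDA headers. The idea is that anything large lives in its own device buffer, with only its address stored in the params.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical launch-parameter struct. Real code would use OptixTraversableHandle
// and CUdeviceptr; plain 64-bit integers stand in for them here.
struct Params {
    std::uint64_t handle;        // stand-in for OptixTraversableHandle
    std::uint64_t colorBuffer;   // device address of a width * height output buffer
    std::uint32_t width;
    std::uint32_t height;
    float eye[3];                // camera position
    float u[3], v[3], w[3];      // camera basis vectors
};

// The 64KB launch-params limit; large data (images, meshes, tables) belongs in
// separate device buffers whose addresses are stored in Params.
constexpr std::size_t kMaxLaunchParamsBytes = 64 * 1024;
static_assert(sizeof(Params) <= kMaxLaunchParamsBytes,
              "launch params exceed the 64KB limit");
```

If the struct ever grows past the limit, the build fails instead of the launch.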
Payload values are generally compiled into registers. If you need more space than the limited number of payload slots provides, you can put a pointer to memory in the payload (a 64-bit pointer takes two 32-bit payload slots). That usually comes with extra indirection and memory-access costs.
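Since payload values are 32-bit, a 64-bit pointer has to be split across two slots. Below is a host-side C++ sketch of just the bit manipulation; the helper names and the PerRayData struct are illustrative. In device code the two halves would be passed to optixTrace() as payload values, and a closest-hit program would rebuild the pointer from optixGetPayload_0() and optixGetPayload_1(), then read or write the per-ray struct (typically a local variable in the raygen program) through it.

```cpp
#include <cassert>
#include <cstdint>

// Split a pointer into two 32-bit payload values (high and low halves).
inline void packPointer(const void* ptr, std::uint32_t& hi, std::uint32_t& lo) {
    const std::uint64_t bits =
        static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(ptr));
    hi = static_cast<std::uint32_t>(bits >> 32);
    lo = static_cast<std::uint32_t>(bits & 0xffffffffu);
}

// Rebuild the pointer from the two payload values.
inline void* unpackPointer(std::uint32_t hi, std::uint32_t lo) {
    const std::uint64_t bits = (static_cast<std::uint64_t>(hi) << 32) | lo;
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(bits));
}

// Hypothetical per-ray record that is too large to fit in the payload slots directly.
struct PerRayData {
    float radiance[3];
    float throughput[3];
    unsigned int depth;
    unsigned int seed;
};
```

The trade-off is the one mentioned above: two registers instead of many, at the cost of going through memory every time a hit program touches the record.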
The Programming Guide does mention both of these in different sections, but it's easier for me to find them quickly when I know what I'm looking for. It's true that OptiX is abstracting this away a bit, but for good reason: in general, OptiX puts these things in the most efficient-to-access place possible.
Launch Params in constant mem: https://raytracing-docs.nvidia.com/optix7/guide/index.html#program_pipeline_creation#7054
Payloads in registers: https://raytracing-docs.nvidia.com/optix7/guide/index.html#device_side_functions#trace
David.