Hey good questions.
When you call optixLaunch(), every invocation of your raygen program will be a separate thread. When you specify the
width and height of your 2D OptiX launch, you can expect the number of threads to be
width * height. Your call to
optixGetLaunchIndex() gives you an index that identifies the thread.
Note the last sentence in that section: “program execution of neighboring launch indices is not necessarily done within the same warp or block, so the application must not rely on the locality of launch indices.”
So the answer to the first part of your first question is “yes”, and to the second part of the first question, “no”. You should not assume that the first 32 rays are grouped together into the first warp. It’s hard to define what “first” means, and threads and warps in general do not execute in sequential order. OptiX automatically structures a 2D launch into tiles for efficiency, so your sequential thread ids will usually not be in scan-line order. On top of that, OptiX reserves the right to move threads during execution: “For efficiency and coherence, the NVIDIA OptiX 7 runtime—unlike CUDA kernels—allows the execution of one task, such as a single ray, to be moved at any point in time to a different lane, warp or streaming multiprocessor (SM). (See section “Kernel Focus” in the CUDA Toolkit Documentation.) Consequently, applications cannot use shared memory, synchronization, barriers, or other SM-thread-specific programming constructs in their programs supplied to OptiX.” https://raytracing-docs.nvidia.com/optix7/guide/index.html#introduction#overview
Some additional reading on raygen and threads here: https://raytracing-docs.nvidia.com/optix7/guide/index.html#ray_generation_launches#ray-generation-launches
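To make the launch index point concrete, here is a rough host-side C++ sketch of how a 2D launch index is typically turned into a buffer slot. `Index3`, `linearIndex` and `threadCount` are names I made up for illustration; in real device code the index and dimensions would come from optixGetLaunchIndex() and optixGetLaunchDimensions(). The result tells you which pixel a thread owns, not which warp it runs in.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the uint3 values that optixGetLaunchIndex()
// and optixGetLaunchDimensions() return in device code.
struct Index3 { uint32_t x, y, z; };

// Map a 2D launch index to a unique slot in a width * height buffer.
// This identifies the thread's pixel; it implies nothing about warp layout.
inline uint32_t linearIndex(Index3 idx, Index3 dim) {
    assert(idx.x < dim.x && idx.y < dim.y);
    return idx.y * dim.x + idx.x;
}

// Total number of raygen invocations for a launch of the given dimensions.
inline uint64_t threadCount(Index3 dim) {
    return static_cast<uint64_t>(dim.x) * dim.y * dim.z;
}
```

Use the linear index only for addressing your own buffers. Two threads with adjacent indices may still end up in different warps.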
Launch parameters are in device memory. Currently launch params are put into constant (read-only) memory for efficiency, and the launch params buffer is limited to a maximum size of 64KB.
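If you want to catch the size limit at compile time, something like the following might help. This is only a sketch: the `Params` layout, `kMaxLaunchParamsBytes` and `fitsInLaunchParams` are hypothetical names, and the 64 KB figure is the limit mentioned above. The idea is to keep bulk data behind device pointers so the struct itself stays small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Limit quoted above for the launch params buffer.
constexpr std::size_t kMaxLaunchParamsBytes = 64 * 1024;

// Hypothetical launch params: large data is referenced by pointer,
// so the struct itself stays tiny.
struct Params {
    float*   image;        // device pointer to the output buffer (assumed)
    uint32_t width;
    uint32_t height;
    uint64_t traversable;  // placeholder for an OptixTraversableHandle
};

constexpr bool fitsInLaunchParams(std::size_t bytes) {
    return bytes <= kMaxLaunchParamsBytes;
}

// Fails the build if someone grows Params past the limit.
static_assert(fitsInLaunchParams(sizeof(Params)), "Params exceeds 64 KB");
```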
Payload values are generally compiled into registers. If you need more space for a payload than the limited number of payload slots, you can put a pointer to memory in the payload. That usually comes with the associated indirection and memory access costs.
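A common way to do the pointer-in-payload trick is to split a 64-bit pointer across two 32-bit payload slots. Here is a minimal host-side C++ sketch of the packing and unpacking; `RayData` is a hypothetical per-ray struct, and the helper names are just my choice. In device code the two values would be passed to optixTrace() and read back with optixGetPayload_0() and optixGetPayload_1().

```cpp
#include <cassert>
#include <cstdint>

// Split a 64-bit pointer into two 32-bit values, one per payload slot.
inline void packPointer(void* ptr, uint32_t& hi, uint32_t& lo) {
    const uint64_t bits = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(ptr));
    hi = static_cast<uint32_t>(bits >> 32);
    lo = static_cast<uint32_t>(bits & 0xffffffffu);
}

// Rebuild the pointer from the two payload values.
inline void* unpackPointer(uint32_t hi, uint32_t lo) {
    const uint64_t bits = (static_cast<uint64_t>(hi) << 32) | lo;
    return reinterpret_cast<void*>(static_cast<uintptr_t>(bits));
}

// Hypothetical per-ray data that would not fit comfortably in payload slots.
struct RayData {
    float    radiance[3];
    float    throughput[3];
    uint32_t depth;
};
```

Every access through the unpacked pointer is a memory access rather than a register read, which is the indirection cost mentioned above.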
The Programming Guide does mention both of these in different sections, but it’s easier for me to find them quickly when I know what I’m looking for. It’s true that OptiX is abstracting this away a bit, but for good reason - in general OptiX is putting these things in the most efficient-to-access place possible.
Launch Params in constant mem: https://raytracing-docs.nvidia.com/optix7/guide/index.html#program_pipeline_creation#7054
Payloads in registers: https://raytracing-docs.nvidia.com/optix7/guide/index.html#device_side_functions#trace