The launch parameters are a developer-defined struct in constant memory which you can access from all OptiX device code that sees the matching extern declaration.
That means OptiX does not “know” what is in these launch parameters. You can add whatever data you want to them, limited only by the available constant memory size, which is rather small at 64 kB.
In OptiX all output goes into “buffers”, which in OptiX 7 are effectively just a CUdeviceptr, i.e. a 64-bit pointer to linear device memory the GPU can access directly.
You define what data these pointers point to. You just need to make sure your data placement adheres to the CUDA memory alignment rules. (E.g. 64-bit pointers need to lie on an 8-byte aligned address.)
You allocate this device memory with the respective CUDA malloc calls, which are either
cudaMalloc() or
cuMemAlloc(), depending on whether you use the CUDA runtime or driver API on the host side of your application.
That means to be able to output color and depth values from your OptiX device code, you need two pointers to device memory, one holding the color values and one holding the depth values for your image.
So to add more output buffers you can simply add additional CUdeviceptr fields into your launch params.
If you do not need to support different output formats (which can be handled with casts inside the device code, as I do in some of my examples to switch between float4 and half4 color output buffers), then you can also declare them with the correct data pointer types, for example a float4* for the color buffer and a float* for the depth buffer.
It’s just a matter of how you feed the CUdeviceptr into these launch parameter fields on the host.
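To make the struct layout concrete, here is a minimal host-side sketch. Everything in it is hypothetical naming of mine (LaunchParams, colorBuffer, depthBuffer, makeLaunchParams, isAligned), and CUdeviceptr and float4 are declared as stand-ins only so the snippet compiles without the CUDA headers. In a real application they come from the CUDA headers, and the struct would be copied to device memory and passed to optixLaunch().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-ins so this compiles without CUDA headers (assumption for this sketch only).
using CUdeviceptr = unsigned long long;
struct alignas(16) float4 { float x, y, z, w; };

// Hypothetical launch parameter struct with two output buffers.
struct LaunchParams
{
    CUdeviceptr  colorBuffer; // Untyped: device code casts it to float4* or half4*.
    float*       depthBuffer; // Typed: one float per launch index.
    unsigned int width;
    unsigned int height;
};

// The struct must fit into the 64 kB of constant memory,
// and the pointer fields must sit on suitably aligned offsets.
static_assert(sizeof(LaunchParams) <= 64 * 1024, "launch params exceed constant memory");
static_assert(offsetof(LaunchParams, colorBuffer) % alignof(CUdeviceptr) == 0, "misaligned field");
static_assert(offsetof(LaunchParams, depthBuffer) % alignof(float*) == 0, "misaligned field");

// Returns true if ptr lies on an address that is a multiple of alignment (a power of two).
inline bool isAligned(CUdeviceptr ptr, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (ptr & (alignment - 1)) == 0;
}

// Fills the struct on the host from two device allocations
// (which would come from cudaMalloc() or cuMemAlloc()).
inline LaunchParams makeLaunchParams(CUdeviceptr color, CUdeviceptr depth,
                                     unsigned int width, unsigned int height)
{
    assert(isAligned(color, alignof(float4))); // float4 data needs 16-byte alignment.
    assert(isAligned(depth, alignof(float)));

    LaunchParams params = {};
    params.colorBuffer = color;
    params.depthBuffer = reinterpret_cast<float*>(static_cast<std::uintptr_t>(depth));
    params.width       = width;
    params.height      = height;
    return params;
}
```

The untyped CUdeviceptr field keeps the output format switchable at the cost of a cast in device code, while the typed float* field is more readable when the format is fixed.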
For an example launch parameter struct which outputs multiple buffers (and takes some other buffers as input for the camera and light definitions), have a look at this example code.
Now to fill the depth data into that buffer you would normally add a float field to the per-ray payload. Inside the ray generation program you initialize that field to the farthest possible or desired value before shooting the primary ray, which covers the miss case. Inside the closest hit program you write the intersection distance into that depth field, and after returning to the ray generation program you write that distance as the depth value into your buffer at the launch index assigned to that ray. Only do that for the primary ray when you want the depth from the camera.
This is similar to how this denoiser example writes the normal buffer only for the primary ray and only when there was a hit.
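The control flow described above can be sketched on the CPU. This is only a host-side mock and not OptiX device code: traceMock stands in for optixTrace() plus a closest hit program, renderDepth stands in for the ray generation program, and the sphere scene, the camera, and all names (PerRayData, kFarDepth) are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical per-ray payload carrying the depth.
struct PerRayData { float depth; };

// Assumed "miss" value; pick whatever farthest value suits your application.
constexpr float kFarDepth = 1.0e16f;

// Mock of traversal plus closest hit: a unit sphere centered at (0, 0, 5),
// ray origin at (0, 0, 0), d is the normalized ray direction.
// On a hit the intersection distance is written into the payload.
inline void traceMock(const float d[3], PerRayData& prd)
{
    const float c[3] = {0.0f, 0.0f, 5.0f};
    const float r    = 1.0f;
    const float b    = d[0] * c[0] + d[1] * c[1] + d[2] * c[2];
    const float cc   = c[0] * c[0] + c[1] * c[1] + c[2] * c[2];
    const float disc = b * b - (cc - r * r);
    if (disc < 0.0f)
        return; // Miss: the payload keeps its initial value.
    const float t = b - std::sqrt(disc);
    if (t > 0.0f)
        prd.depth = t; // "Closest hit": store the radial hit distance.
}

// Mock of the ray generation program for a width x height launch.
// Pinhole camera at the origin looking along +z, image plane at z = 1.
inline std::vector<float> renderDepth(unsigned int width, unsigned int height)
{
    std::vector<float> depthBuffer(static_cast<std::size_t>(width) * height);
    for (unsigned int y = 0; y < height; ++y)
    {
        for (unsigned int x = 0; x < width; ++x)
        {
            const float px  = 2.0f * (x + 0.5f) / width  - 1.0f;
            const float py  = 2.0f * (y + 0.5f) / height - 1.0f;
            const float len = std::sqrt(px * px + py * py + 1.0f);
            const float d[3] = {px / len, py / len, 1.0f / len};

            PerRayData prd;
            prd.depth = kFarDepth; // Initialize for the miss case.
            traceMock(d, prd);     // Primary ray only.

            depthBuffer[y * width + x] = prd.depth; // Write at the launch index.
        }
    }
    return depthBuffer;
}
```

In real device code the payload would travel through the optixTrace() payload registers and the write would target the depth pointer stored in the launch parameters.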
Mind that for a positional camera (pinhole, etc.) this is a radial distance from the camera position.
This is not the same as the depth buffer in a rasterizer which is a parallel distance from the front camera plane.
That means compositing depth values between a rasterizer like OpenGL and a raytracer camera projection requires a transformation of the radial distance into the rasterizer depth coordinate space.
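As a rough sketch of that transformation: the radial distance is first projected onto the camera's forward axis to get the parallel (eye-space) distance, which is then mapped through the projection's depth function. The code below assumes normalized ray and forward vectors, a standard OpenGL perspective projection as built by glFrustum() or gluPerspective(), and the default glDepthRange(0, 1). The names Vec3, radialToPlanarDepth and planarToWindowDepth are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Converts the radial hit distance t along rayDir into the distance
// measured parallel to the camera's forward axis.
// Both rayDir and cameraForward are assumed to be normalized.
inline float radialToPlanarDepth(float t, const Vec3& rayDir, const Vec3& cameraForward)
{
    return t * dot(rayDir, cameraForward);
}

// Maps a positive eye-space distance to the [0, 1] window depth of a
// standard OpenGL perspective projection with near and far planes zNear and zFar.
inline float planarToWindowDepth(float zEye, float zNear, float zFar)
{
    assert(zNear > 0.0f && zFar > zNear && zEye > 0.0f);
    const float zNdc = (zFar + zNear) / (zFar - zNear)
                     - (2.0f * zFar * zNear) / ((zFar - zNear) * zEye);
    return 0.5f * zNdc + 0.5f; // NDC [-1, 1] to window depth [0, 1].
}
```

A ray through the image center has rayDir equal to cameraForward, so both distances match there. The difference grows toward the image borders. If your rasterizer uses a different projection (reversed z, infinite far plane, a [0, 1] clip space), the second function needs to be adapted accordingly.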
Reading the buffer data from the device to the host can be done with a
cudaMemcpy(..., cudaMemcpyDeviceToHost) or
cuMemcpyDtoH(), again depending on which CUDA API you use (runtime or driver).
You can find examples doing that inside the OptiX SDK example source code. It’s most often used to read back the size of a compacted acceleration structure, but there are also cases where other buffers are read back. Look for the function “download”.