Also have a look at these recent related threads, which contain more links to example code:
https://forums.developer.nvidia.com/t/make-ray-using-optix-7-0-0/158577
https://forums.developer.nvidia.com/t/global-payload/159415
https://forums.developer.nvidia.com/t/mesh-artifacts-when-using-anyhit-for-transparency-optix-7/156600
I would reorder the fields in your struct RayDataRadiance by CUDA data type alignment restrictions. The way you have it now requires three 32-bit words of padding, inserted by the compiler behind the int depth, to make the uint4 random 16-byte aligned.
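To make the padding visible, here is a minimal host-side sketch. The original RayDataRadiance layout is not shown in this thread, so both structs below are hypothetical, and Float4 and UInt4 are stand-ins that only mimic the 16-byte alignment of CUDA's float4 and uint4.

```cpp
#include <cassert>
#include <cstddef>

// Host-side stand-ins for CUDA's float4 and uint4 (assumption: both are 16-byte aligned).
struct alignas(16) Float4 { float x, y, z, w; };
struct alignas(16) UInt4  { unsigned int x, y, z, w; };

// Hypothetical "before" layout: a 4-byte field sits in front of a 16-byte aligned one.
struct RayDataUnordered
{
    Float4 result;      // offset  0
    int    depth;       // offset 16, followed by 12 bytes (three 32-bit words) of padding
    UInt4  random;      // offset 32
    float  attenuation; // offset 48, followed by 12 bytes of tail padding
};

// The same fields sorted by descending alignment requirement.
struct RayDataOrdered
{
    Float4 result;      // offset  0
    UInt4  random;      // offset 16
    int    depth;       // offset 32
    float  attenuation; // offset 36, followed by 8 bytes of tail padding
};

// The compiler has to push 'random' to the next 16-byte boundary in the unordered layout.
static_assert(offsetof(RayDataUnordered, random) == 32, "12 bytes of padding behind depth");
static_assert(sizeof(RayDataUnordered) == 64, "unordered layout");
static_assert(sizeof(RayDataOrdered) == 48, "ordered layout");
```

With these four fields the reordered struct is 16 bytes smaller. With only result, depth and random, the total size would stay the same because the padding just moves to the end, but sorting by alignment is still the safer habit once more fields get added.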
To expand on David's explanation of how to combine both per-ray data structures into one and use only two 32-bit payload registers on optixTrace() (they carry the 64-bit pointer to the structure), that could look like this:
struct PerRayData
{
  // 16 byte aligned
  float4 result;
  uint4  random;
  // 8 byte aligned data like float2 would go here.
  // 4 byte aligned
  int    depth;
  float  attenuation; // Optionally put this here and use the same per ray data for both ray types.
};
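The two payload registers carry nothing but the 64-bit address of that structure, split into an upper and a lower 32-bit half. A minimal sketch of the split and merge, written as plain host C++ so it can be compiled anywhere; the names splitPointer, mergePointer and PerRayDataSketch are made up for illustration, and 64-bit pointers are assumed:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, shortened per-ray data; only here to have something to point at.
struct PerRayDataSketch
{
    int   depth;
    float attenuation;
};

static_assert(sizeof(void*) == 8, "this sketch assumes 64-bit pointers");

// Split a pointer into two 32-bit values, as they would be passed in payload registers 0 and 1.
inline void splitPointer(const void* ptr, std::uint32_t& p0, std::uint32_t& p1)
{
    const std::uint64_t bits = reinterpret_cast<std::uintptr_t>(ptr);
    p0 = static_cast<std::uint32_t>(bits >> 32);           // upper half
    p1 = static_cast<std::uint32_t>(bits & 0xFFFFFFFFull); // lower half
}

// Rebuild the pointer from the two 32-bit values.
inline void* mergePointer(std::uint32_t p0, std::uint32_t p1)
{
    const std::uint64_t bits = (static_cast<std::uint64_t>(p0) << 32) | p1;
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(bits));
}

// Round trip: the merged pointer must address the same per-ray data again.
inline bool roundTripsThroughTwoRegisters()
{
    PerRayDataSketch prd{3, 0.5f};
    std::uint32_t p0 = 0, p1 = 0;
    splitPointer(&prd, p0, p1);
    auto* same = static_cast<PerRayDataSketch*>(mergePointer(p0, p1));
    return same == &prd && same->depth == 3;
}
```

In device code the same two functions would be marked __forceinline__ __device__. The ray generation program would split the address of its local PerRayData before optixTrace() and pass the two values as the payload arguments, and the closest hit, any hit and miss programs would merge optixGetPayload_0() and optixGetPayload_1() back into the pointer.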
Again, it’s faster to keep the single float attenuation for the shadow ray separate: hold it in a local variable around the optixTrace() call for the shadow ray and encode that float into a single payload register for that ray type only. That works if you only need the result temporarily, to attenuate the lighting result directly after the trace for the shadow ray returns.
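Encoding the float into one payload register is just a bit-cast between float and a 32-bit unsigned integer. A host-side sketch of that round trip follows; floatToPayload, payloadToFloat and attenuateLighting are hypothetical helper names, and std::memcpy stands in for what CUDA's __float_as_uint() and __uint_as_float() intrinsics would do in device code:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must fit one 32-bit payload register");

// Reinterpret the bits of a float as a 32-bit payload value (no numeric conversion).
inline std::uint32_t floatToPayload(float value)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}

// Reinterpret a 32-bit payload value as the float it was encoded from.
inline float payloadToFloat(std::uint32_t bits)
{
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// What the caller would do right after the shadow ray trace returns:
// decode the single payload register and attenuate the lighting result with it.
inline float attenuateLighting(float lighting, std::uint32_t shadowPayload)
{
    return lighting * payloadToFloat(shadowPayload);
}
```

On the device, the caller would initialize a local unsigned int from the starting attenuation, pass it as the only payload argument of the shadow ray's optixTrace(), and decode it afterwards. The shadow ray's any hit or miss program would write the new value with optixSetPayload_0(), so no per-ray data pointer is needed for that ray type at all.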