I am building some PTX shaders to be employed by Optix 7 and am trying to build an __anyhit__ function that sets an attenuation value to zero and then terminates the Ray. The code I have follows: system_param.h:
The code works but I donât trust the setting of attenuation to zero in the â__anyhit_shadow()â function. Can anyone tell me the more standard way of executing this type of operation using Optix 7?
The standard way of initializing your ray payload is to do it before calling optixTrace(). For this case, where youâre asking about shadow rays, you can put the initialization wherever the shadow ray is cast, it might be in a closest-hit shader, or in raygen. For primary rays, payload initialization would always go in the raygen program immediately prior to calling optixTrace().
Any-hit will be called before closest-hit, so your code might work for you in that sense. But, donât forget that any-hit shaders are not expected to be called in depth order, so if you are initializing something in your payload for use in any-hit shaders, it wonât work. Also calling optixTerminateRay() can work for opaque shadows, but if you need to accumulate your attenuation with transparent objects, you will need something more sophisticated in order to sort your any-hit intersections in depth order and composite them properly.
Bigger picture here is that if you are casting shadow rays against RTX enabled triangle meshes, you should consider disabling your any-hit shader completely (OPTIX_GEOMETRY_FLAG_DISABLE_ANYHIT or OPTIX_INSTANCE_FLAG_DISABLE_ANYHIT or OPTIX_RAY_FLAG_DISABLE_ANYHIT), and instead use closest-hit shaders and the the flag OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT.
It used to be the case in OptiX 5 and earlier, before there was hardware acceleration, that using any-hit programs for shadows and terminating the ray was the most performant way to do shadowing. With the hardware acceleration, now the recommended approach is to avoid using any-hit programs completely, if you can.
To be clear, the single declaration of extern "C" __device__ SystemParam param; with the PerRayDataShade struct inside is not going to function at all because that completely ignores the fact that there are thousands of CUDA threads running simultaneously which would all hammer onto that same memory which breaks your intended per-ray attenuation value.
If you need anything to happen per ray, it needs to be controlled via the payload registers inside the optixTrace calls.
That also means there isnât actually a need to have a payload structure with a single float value in your case, but you should just use one of the maximum 8 payload registers which would be faster than encoding a 64-bit pointer to some local payload structure memory.
You would write to that register with the optixSetPayload_0 device function (or whichever register index you use) before calling optixTerminateRay() inside your anyhit program.
But as David explained, if you do not have any geometry which needs to call optixIgnoreIntersection() in an anyhit program for cutout opacity etc., then the shadow/visibility ray doesnât need an anyhit program at all, because you can handle that quicker in OptiX 7 by using the aforementioned instance and ray flags.
Check the traceOcclusion() functions used in various OptiX SDK 7.2.0 examples.
Thatâs literally code from the OptiX SDK examples.
I would recommend to do a âFind in Filesâ over all *.h; *.cpp; *.cu source files with your favorite text editor inside the OptiX SDK, and then read the documentation about the OptiX device functions used inside the CUDA code.
If you need more data you would need to declare a local payload structure inside your ray generation program and encode the 64-bit pointer to it into two of the payload registers.
Here is my ray generation program using split and merge pointers
(The SDK example use a different implementation, search for packPointer and unpackPointer.)
Even then it still makes sense to keep the most frequently accessed fields in that inside payload registers for performance reasons. Avoid as many memory accesses as you can.
See this thread for related comments.
Thanks again for all the assistance @droettger. I really like your split and merge operations.
I assume the following snippet would be okay with regards to properly using these kernels from inside an OptiX program (e.g. __raygen_rg()):
float4 totalResult = make_float4(0.0f, 0.0f, 0.0f, 0.0f);
// do some stuff
PerRayData prd; // defined per-ray data struct
uint2 payload = splitPointer(&prd);
optixTrace(..., payload.x, payload.y);
totalResult += prd.result; // where result is field in prd
Similar in my raygeneration.cu files if you look for radiance and prd.radiance.
I use my vector_math.h header which contains some convenient overloads for the CUDA vector types which allow to write this shorter: float4 totalResult = make_float4(0.0f);
The original vec_math.h inside the OptiX SDK does that as well, but I think the lerp(), bilerp(), and roundup() functions do not belong into the initial #ifdef(__CUDACC__) section, but only the min() and max() functions. Thatâs why I have my own header which added some more overloads as well.