Large CH function

I have a large Closest Hit (CH) function (about 400-500 lines of code). It has a lot of condition checks and calculations. I want to split it up to make the code easier to edit. I know that I can call other device functions from CH, but most of the calculations require the payload data (which is packed into a couple of integers). What is the best way to keep the CH function small (and simple)?

This is in OptiX 7, right?

There’s no single best method, and it depends on exactly what your goals are. Keeping things small & simple is sometimes a bit at odds with keeping things fast, so it’s a balancing act.

You can use any of the normal C++ abstraction tools at your disposal to factor your CH function: functions, classes, templates, macros, etc. It’s common to break the function into several smaller functions and rely on inlining to make sure you’re not losing any performance. The CUDA compiler normally inlines functions defined in the same file; you can add the __forceinline__ qualifier if you want to make sure. This means you can write a helper function next to your CH function that takes a RadiancePRD& parameter, for example, and the compiler will expand the call so that accessing your payload costs the same in the helper as it does in the CH function itself.
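To make that concrete, here is a minimal sketch of what such helpers might look like. Everything in it is made up for illustration: the RadiancePRD layout, the CH_INLINE macro, and the helper names are assumptions, not code from the SDK. The macro only exists so the sketch compiles as plain C++ as well as under nvcc.

```cpp
// Hypothetical macro: force inlining under nvcc, fall back to plain
// `static inline` so the sketch also builds with a host C++ compiler.
#ifdef __CUDACC__
#define CH_INLINE static __forceinline__ __host__ __device__
#else
#define CH_INLINE static inline
#endif

// Hypothetical payload struct; the real layout depends on your renderer.
struct RadiancePRD
{
    float result[3];
    float importance;
    int   depth;
};

// One self-contained step lifted out of a long closest-hit body.
CH_INLINE void attenuate( RadiancePRD& prd, float factor )
{
    prd.result[0] *= factor;
    prd.result[1] *= factor;
    prd.result[2] *= factor;
    prd.importance *= factor;
}

// A condition check lifted out into a named predicate.
CH_INLINE bool shouldContinue( const RadiancePRD& prd, int maxDepth, float minImportance )
{
    return prd.depth < maxDepth && prd.importance >= minImportance;
}

// Stand-in for the closest-hit body: it now reads as a list of steps.
CH_INLINE void shade( RadiancePRD& prd )
{
    attenuate( prd, 0.5f );
    prd.depth += 1;
}
```

Because the helpers take the struct by reference and are inlined, the compiler should be able to treat prd the same way it would if all of the code lived in one function.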

Take a look at the optixWhitted_exp sample, for example. The getRadiancePRD() function in shading.cu returns a struct. Because it’s inlined, this is less expensive than it looks.

static __device__ __inline__ RadiancePRD getRadiancePRD()
{
    RadiancePRD prd;
    prd.result.x = int_as_float( optixGetPayload_0() );
    prd.result.y = int_as_float( optixGetPayload_1() );
    prd.result.z = int_as_float( optixGetPayload_2() );
    prd.importance = int_as_float( optixGetPayload_3() );
    prd.depth = optixGetPayload_4();
    return prd;
}

The optixWhitted_exp sample has other examples of helper functions as well; see the accompanying “helpers.h” file and take a look at how the functions are used.

Please be aware that this Whitted example isn’t really following the absolute best practices for performance, but it is a good example of how to build helper functions to keep your CH program manageable. If your payload is less than 64 bytes (which it is in optixWhitted_exp), the advice for best performance is to use the payload values directly (via the optix{Get,Set}Payload_*() functions). That means you don’t copy the payload values into a struct or local variables; you read and write them through the payload functions where you need them. Following this advice might change how you choose to factor your CH function. You could imagine writing some helper functions or a class to abstract which payload value is in which payload slot, so that you don’t have to use the slot number directly in your code every time you access the payload.
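A rough sketch of that slot-abstraction idea follows. It is written against host-side stand-ins (the mock* functions and g_mockPayload are invented here) so that it compiles without the OptiX SDK; in device code you would call optixGetPayload_3(), optixSetPayload_3() and so on instead, and use __uint_as_float() / __float_as_uint() for the bit casts. The Payload struct and its accessor names are hypothetical, and the slot assignment simply mirrors the getRadiancePRD() snippet above.

```cpp
#include <cstdint>
#include <cstring>

// Host-side stand-ins for the payload registers (assumption: not OptiX API).
static uint32_t g_mockPayload[8] = {};
static inline uint32_t mockGetPayload_3()             { return g_mockPayload[3]; }
static inline void     mockSetPayload_3( uint32_t v ) { g_mockPayload[3] = v; }
static inline uint32_t mockGetPayload_4()             { return g_mockPayload[4]; }
static inline void     mockSetPayload_4( uint32_t v ) { g_mockPayload[4] = v; }

// Bit-exact reinterpretation between float and a 32-bit register value.
static inline float bitsToFloat( uint32_t u )
{
    float f;
    std::memcpy( &f, &u, sizeof( f ) );
    return f;
}

static inline uint32_t floatToBits( float f )
{
    uint32_t u;
    std::memcpy( &u, &f, sizeof( u ) );
    return u;
}

// Hypothetical accessors: slot numbers appear here and nowhere else.
struct Payload
{
    static float    importance()               { return bitsToFloat( mockGetPayload_3() ); }
    static void     setImportance( float v )   { mockSetPayload_3( floatToBits( v ) ); }
    static uint32_t depth()                    { return mockGetPayload_4(); }
    static void     setDepth( uint32_t d )     { mockSetPayload_4( d ); }
};
```

The CH code then reads Payload::depth() rather than a raw slot number, and nothing is copied into a struct up front. If you later move a value to a different slot, only the accessor changes.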

The optixPathTracer_exp sample uses packed pointers for the payload, which is best for large payloads that can’t fit into the 8 payload registers.
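For reference, the packed-pointer approach boils down to splitting a 64-bit pointer across two 32-bit payload values and reassembling it in the hit program. The sketch below is my own minimal version in plain C++, not a copy of the sample; BigPayload and the function names are assumptions for illustration.

```cpp
#include <cstdint>

// Hypothetical payload that is too large for the payload registers.
struct BigPayload
{
    float    radiance[3];
    float    throughput[3];
    uint32_t seed;
    int      depth;
};

// Split a pointer into two 32-bit halves, one per payload register.
static inline void packPointer( void* ptr, uint32_t& hi, uint32_t& lo )
{
    const uint64_t bits = static_cast<uint64_t>( reinterpret_cast<uintptr_t>( ptr ) );
    hi = static_cast<uint32_t>( bits >> 32 );
    lo = static_cast<uint32_t>( bits & 0xFFFFFFFFull );
}

// Reassemble the pointer from the two halves.
static inline void* unpackPointer( uint32_t hi, uint32_t lo )
{
    const uint64_t bits = ( static_cast<uint64_t>( hi ) << 32 ) | static_cast<uint64_t>( lo );
    return reinterpret_cast<void*>( static_cast<uintptr_t>( bits ) );
}
```

In a real program the two halves would be passed to optixTrace() as the first two payload values by the caller, and the CH program would read them back with optixGetPayload_0() and optixGetPayload_1() before unpacking. The struct itself lives in the caller’s local memory, which is why this is slower than keeping everything in registers.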

Try to make sure that any conditional checks in your CH function that are constant across your launch do not require memory reads, if you can. Using #defines or constant template parameters is better than a run-time check, since the unused code can be elided at compile time. If you do need a run-time check, prefer reading from your constant params and your payload over reading from shared or global memory.
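Here is a small sketch of the template-parameter version of that advice. The shading math, the ENABLE_SHADOWS flag and the LaunchParams struct are all hypothetical; the point is only the shape of the code.

```cpp
// The flag is a compile-time constant in each instantiation, so the compiler
// can drop the branch that is never taken.
template <bool ENABLE_SHADOWS>
static inline float shade( float direct, float occlusion )
{
    if( ENABLE_SHADOWS )
        return direct * ( 1.0f - occlusion );
    return direct;
}

// Hypothetical launch parameters, for the case where the choice really has
// to be made at run time.
struct LaunchParams
{
    bool enableShadows;
};

// Run-time fallback: one branch on a launch-constant value, then
// straight-line specialized code.
static inline float shadeRuntime( const LaunchParams& params, float direct, float occlusion )
{
    return params.enableShadows ? shade<true>( direct, occlusion )
                                : shade<false>( direct, occlusion );
}
```

With the template version you would typically build one CH program per variant (for example shade<true> and shade<false>) and pick between them on the host when setting up the pipeline, so the device code never checks the flag at all.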

I’m mixing code management advice with performance advice, so I hope it helps but I’m not sure I’m answering your question. If this doesn’t give you some ideas, then feel free to get more specific on your situation and what your constraints look like.


David.