Code organization and CMake

Since I wrote these for a reason: I think both of these are better approaches for developing a standalone OptiX application than the outdated FindCUDA.cmake method used inside the OptiX SDK example framework.

The method I am using inside my examples, an easily configurable NVCC command line inside a single macro which generates custom build rules, is fine when you're not also using native CUDA Runtime kernel launches inside the application. (I'm only using one native CUDA kernel in my examples so far, and it is launched with the CUDA Driver API, which is more cumbersome to call than launching CUDA Runtime API object code with the chevron <<<>>> syntax.)
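For illustration, here is a minimal sketch of that difference. The kernel addOne and its arguments are made up and not from my examples, and error checking is omitted:

```cpp
#include <cuda.h>           // CUDA Driver API
#include <cuda_runtime.h>   // CUDA Runtime API

// Made-up example kernel. extern "C" avoids C++ name mangling so the Driver API
// can look it up by its plain name inside a PTX/cubin module.
extern "C" __global__ void addOne(float* data, int count)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        data[i] += 1.0f;
}

// CUDA Runtime API: the kernel object code is linked into the executable and
// launched directly with the chevron syntax.
void launchWithRuntimeAPI(float* d_data, int count)
{
    const dim3 block(256);
    const dim3 grid((count + block.x - 1) / block.x);
    addOne<<<grid, block>>>(d_data, count);
}

// CUDA Driver API: the kernel lives in a separately loaded module (e.g. loaded
// with cuModuleLoad or cuModuleLoadData), the CUfunction must be queried by name,
// and the arguments are passed through an array of pointers.
void launchWithDriverAPI(CUmodule module, CUdeviceptr d_data, int count)
{
    CUfunction function = nullptr;
    cuModuleGetFunction(&function, module, "addOne"); // error checking omitted

    void* args[] = { &d_data, &count };

    const unsigned int block = 256;
    const unsigned int grid  = (count + block - 1) / block;

    cuLaunchKernel(function,
                   grid, 1, 1,   // grid dimensions
                   block, 1, 1,  // block dimensions
                   0,            // shared memory bytes
                   0,            // stream (0 = default stream)
                   args,         // kernel arguments
                   nullptr);     // extra options
}
```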

The CMake native LANGUAGE CUDA support strictly requires the use of CMake’s Object Library feature to separate the OptiX device code translation from the native CUDA kernel object code compilation.

The OptiX SDK framework is somewhere in between and a lot harder to change in my opinion.

Please also read all posts which link to that thread (the grey links at the bottom there). There is at least one CMake bug which prevents mixing projects using the CMake native LANGUAGE CUDA feature with other OptiX projects that only use the custom build rules inside the same solution. I ran into that myself in the past.

my functions.h/cu is compiled with CUDA and then linked to the Optix kernels

That is absolutely not what is happening there!

The native CUDA kernels are compiled to object code and launched with the CUDA Runtime API chevron operator.

The OptiX device code, which is translated to PTX or OptiX-IR intermediate code, must be separate from that. All the OptiX device code needs to do for a wavefront renderer implementation like the one shown inside the optixRaycasting example is read rays and write hit/miss results. The *.cu files implementing that should be completely separate from your native CUDA kernels doing the ray generation and shading, except maybe for header files defining structures used in both.
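To make that concrete, here is a rough sketch of such a separation. All names (Ray, Hit, LaunchParams, the file names, and the program names) are made up for illustration and not taken from the optixRaycasting example:

```cpp
// rays_shared.h (made-up name): the only code shared between the native CUDA
// kernels and the OptiX device code are plain structures like these.
#include <vector_types.h>

struct Ray
{
    float3 origin;
    float  tmin;
    float3 direction;
    float  tmax;
};

struct Hit
{
    float        t;              // hit distance, -1.0f means miss
    unsigned int primitiveIndex;
};
```

```cpp
// raycast.cu (made-up name): OptiX device code only. It reads rays written by a
// native CUDA ray generation kernel and writes hit/miss results which a native
// CUDA shading kernel consumes afterwards.
#include <optix.h>
#include "rays_shared.h"

struct LaunchParams
{
    OptixTraversableHandle handle;
    const Ray* rays; // filled by a native CUDA kernel before optixLaunch
    Hit*       hits; // read by a native CUDA kernel after optixLaunch
};

extern "C" { __constant__ LaunchParams params; }

extern "C" __global__ void __raygen__rg()
{
    const unsigned int index = optixGetLaunchIndex().x;
    const Ray ray = params.rays[index];

    unsigned int p0 = __float_as_uint(-1.0f); // hit distance, stays -1.0f on miss
    unsigned int p1 = 0;                      // primitive index

    optixTrace(params.handle,
               ray.origin, ray.direction,
               ray.tmin, ray.tmax,
               0.0f,                     // ray time
               OptixVisibilityMask(255),
               OPTIX_RAY_FLAG_NONE,
               0, 1, 0,                  // SBT offset, SBT stride, miss SBT index
               p0, p1);

    Hit hit;
    hit.t              = __uint_as_float(p0);
    hit.primitiveIndex = p1;

    params.hits[index] = hit;
}

extern "C" __global__ void __closesthit__ch()
{
    optixSetPayload_0(__float_as_uint(optixGetRayTmax())); // hit distance
    optixSetPayload_1(optixGetPrimitiveIndex());
}

extern "C" __global__ void __miss__ms()
{
    // Nothing to do, the miss marker (-1.0f) is already in payload 0.
}
```

In this sketch the native CUDA kernels never include optix.h, the OptiX device code never does any shading, and the host application simply alternates between the chevron kernel launches and optixLaunch.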

Inside the native CUDA kernels you can use everything available inside the CUDA APIs.
Inside OptiX device code you’re limited to what constructs OptiX allows.
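For example, a native CUDA kernel can freely use shared memory and __syncthreads(), which, as far as I know, are among the constructs you cannot use inside OptiX device code. A made-up example kernel:

```cpp
// Hypothetical native CUDA kernel: sums per-ray hit distances per block, e.g. for
// some statistics. Launch it with 256 threads per block.
__global__ void sumHitDistances(const float* hitDistance, int count, float* blockSums)
{
    __shared__ float cache[256]; // shared memory, not available in OptiX programs

    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    cache[threadIdx.x] = (i < count) ? hitDistance[i] : 0.0f;
    __syncthreads();

    // Tree reduction inside the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2)
    {
        if (threadIdx.x < stride)
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        blockSums[blockIdx.x] = cache[0];
}
```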

OptiX prefers to inline everything which is not an entry point function!

When implementing helper functions used inside the OptiX device code, I always use __forceinline__ __device__ and place them into headers when they are used in more than one OptiX device *.cu file. Otherwise they can also be defined inside the single *.cu file where they are used.
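For example, a trivial helper of that kind could look like this (the function and the header name shader_common.h are made up):

```cpp
// shader_common.h (hypothetical name): helpers for the OptiX device code,
// always marked __forceinline__ __device__ so they get inlined at the call site.
#include <vector_types.h>

__forceinline__ __device__ float luminance(const float3& rgb)
{
    // Rec. 709 luminance weights.
    return 0.2126f * rgb.x + 0.7152f * rgb.y + 0.0722f * rgb.z;
}
```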

The point is that OptiX does not link and call functions by default, but inlines them, which results in faster code because the compilers and assemblers can then do more optimizations.

There is a way to let OptiX actually use non-inlined calls, which is using the __optix_enabled__ prefix for function names and defining them as __noinline__, but that has limited use and is slower than inlined code.
The function can then be declared inside a header and must be defined in exactly one *.cu module used inside the OptiX pipeline for it to be linked.
Described here: https://raytracing-docs.nvidia.com/optix8/guide/index.html#callables#non-inlined-functions

It's sometimes helpful for huge OptiX device code. You should definitely not need that inside the OptiX device code of a wavefront renderer; that is much too simple.