I am looking to use an existing C++ class (for quaternions), which is used throughout the host code, on the device side as well. Unfortunately, the class is not header-only. Is it possible to link the OptiX code against this class somehow?
I am hoping that there’s a way to decorate all of the needed functions with __host__ __device__ and then compile and link this into the OptiX pipeline with my existing raygen entry.
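A minimal sketch of that kind of decoration, assuming a hypothetical Quat class and a hypothetical QUAT_HD macro (neither comes from the original code). The macro expands to __host__ __device__ only when nvcc compiles the file, so the same header still builds with the regular host compiler. Members defined inside the class body are implicitly inline; if the definitions stay in a separate .cpp file, they need the same qualifiers there, and that file has to be compiled as CUDA by nvcc (e.g. renamed to .cu or compiled with -x cu).

```cpp
// QUAT_HD is a hypothetical macro: CUDA qualifiers under nvcc, nothing otherwise,
// so the header remains plain C++ for the host compiler.
#ifdef __CUDACC__
#define QUAT_HD __host__ __device__
#else
#define QUAT_HD
#endif

// Hypothetical quaternion class shared by host code and OptiX device code.
class Quat {
public:
    float w, x, y, z;

    QUAT_HD Quat(float w_, float x_, float y_, float z_) : w(w_), x(x_), y(y_), z(z_) {}

    // Hamilton product: (*this) * q.
    QUAT_HD Quat operator*(const Quat& q) const {
        return Quat(w * q.w - x * q.x - y * q.y - z * q.z,
                    w * q.x + x * q.w + y * q.z - z * q.y,
                    w * q.y - x * q.z + y * q.w + z * q.x,
                    w * q.z + x * q.y - y * q.x + z * q.w);
    }

    // Conjugate: negates the vector part.
    QUAT_HD Quat conjugate() const { return Quat(w, -x, -y, -z); }
};
```

For example, Quat(0, 1, 0, 0) * Quat(0, 0, 1, 0) (i times j) yields Quat(0, 0, 0, 1), i.e. k, on both host and device.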
I have come across the thread "Calling a device function from a RT_PROGRAM"; however, it suggests force-inlining everything or using direct callables, and I am not sure whether direct callables would help in this situation.
This thread explains some other options, e.g., using noinline or optix-enabled functions. Pay attention to the caveats, and let me know if that doesn’t answer your questions.
Oh, and do be aware that preventing inlining, and thereby using bona fide stack-based function calls for math libraries such as quaternions, could have a much larger performance impact than you might expect. Function calls on the GPU can be pretty expensive because of the registers they consume, which can reduce occupancy and cause spilling to memory; that is why inlining is used so heavily by default.
Thanks David. As I have learned from reading that thread, header-only helpers are the recommended way for performance reasons, as we will want to inline as much as possible. I will attempt to go this route.
However, I am curious about using __noinline__ or __optix_enabled__. Are there examples of either? I am guessing that I would decorate the desired functions, and all of their callees, in the class header with __host__ __device__ __noinline__ (or do I need __optix_enabled__ too?). Then I would compile the class's .cpp file to PTX with nvcc, create a module from it with optixModuleCreateFromPTX, create a program group of kind OPTIX_PROGRAM_GROUP_KIND_CALLABLES with optixProgramGroupCreate, and finally pass it to optixPipelineCreate as an additional program group. Does that sound right?
Good question. Unfortunately I just looked and we don’t have any examples in the SDK of uses of noinline and optix-enabled, and I don’t see any examples in the OptiX Toolkit either.
Yes you would decorate with __noinline__ as well as __host__ __device__. You shouldn’t use __noinline__ and the __optix_enabled__ prefix at the same time, as they do different things. (And note that __optix_enabled__ is a function name prefix rather than a qualifier.) The main difference is that __noinline__ won’t generally work for functions that contain OptiX device calls, while __optix_enabled__ does allow use of OptiX intrinsics. So for an external math library, for example, __noinline__ would be the more appropriate choice.
It doesn’t matter what program group kind you use, but you do need at least one program group in a module in order to be able to compile it. It can be a dummy program group.
And to be fair, there are some cases where preventing inlining is good for performance and for compile times. In OptiX, this tends to be when using heavy shaders that trace additional rays. Preventing inlining is most likely to be much slower than inlining for small math-library functions: short functions that consist mainly of arithmetic, especially ones called multiple times per thread. Don't let me scare you away from trying it; the profiler's output is more accurate and more tailored to your code than my speculation. ;)
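One way to let the profiler decide is to make inlining a build-time switch. This sketch assumes a hypothetical QUAT_NOINLINE macro that selects __noinline__ or __forceinline__ for every helper (with GCC/Clang attribute fallbacks for host-only builds); you would build both variants and compare them.

```cpp
// Hypothetical switch: compile with -DQUAT_NOINLINE=1 for out-of-line calls,
// or leave it at 0 to force inlining, then compare the two builds in a profiler.
#ifndef QUAT_NOINLINE
#define QUAT_NOINLINE 0
#endif

#if defined(__CUDACC__)
#  if QUAT_NOINLINE
#    define QUAT_FN __host__ __device__ __noinline__
#  else
#    define QUAT_FN __host__ __device__ __forceinline__
#  endif
#elif defined(__GNUC__)
#  if QUAT_NOINLINE
#    define QUAT_FN inline __attribute__((noinline))
#  else
#    define QUAT_FN inline __attribute__((always_inline))
#  endif
#else
#  define QUAT_FN inline
#endif

struct Quat { float w, x, y, z; };

// Small, math-only helper likely to be called many times per thread:
// the case where keeping it out of line is most likely to cost performance.
QUAT_FN float quatDot(const Quat& a, const Quat& b) {
    return a.w * b.w + a.x * b.x + a.y * b.y + a.z * b.z;
}
```

The function bodies are identical in both builds, so any difference the profiler reports comes from the inlining decision alone.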