Use C++ Class that is not header-only on both Host and Device

carlo.rosati · September 30, 2025, 8:18pm

I am looking to use an existing C++ Class (for quaternions) on the device side that is used throughout the host code. Unfortunately the class is not header-only. I am wondering if it is possible to link the OptiX code to this class somehow.

I am hoping that there’s a way to decorate all of the needed functions with __host__ __device__ and then compile and link this into the OptiX pipeline with my existing raygen entry.

I have come across the thread "Calling a device function from a RT_PROGRAM"; however, it suggests either force-inlining everything or using direct callables. I am not sure whether direct callables would help me in this situation.

Thanks for any help.

dhart · September 30, 2025, 8:47pm

Hi @carlo.rosati,

This thread explains some other options, e.g., using noinline or optix-enabled functions. Pay attention to the caveats, and let me know if that doesn’t answer your questions.

Oh, and do be aware that preventing inlining and using bona fide stack-based function calls for math libraries such as quaternions could have a much higher impact on performance than you might expect. Function calls on the GPU can be pretty expensive because of the number of registers they consume, which can reduce occupancy and cause spilling to memory. That is why inlining is used so heavily and so often by default.

–
David.

carlo.rosati · October 1, 2025, 4:29pm

Thanks David. As I have learned from reading that thread, header-only helpers are the recommended way for performance reasons, as we will want to inline as much as possible. I will attempt to go this route.
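For reference, a minimal sketch of what the header-only route could look like. The `Quat` type and the `QUAT_HD` macro are made up for illustration and are not the actual class in question; the macro guard is an assumption that lets the same header build with nvcc and with a plain host compiler.

```cpp
// quat.h: hypothetical header-only quaternion shared by host and device code.
#ifndef QUAT_H
#define QUAT_H

// Under nvcc the CUDA qualifiers are applied; a plain host compiler
// only sees `inline`. QUAT_HD is an illustrative name, not an existing macro.
#if defined(__CUDACC__)
#define QUAT_HD __host__ __device__ __forceinline__
#else
#define QUAT_HD inline
#endif

struct Quat {
    float w, x, y, z;

    // Conjugate: negate the vector part.
    QUAT_HD Quat conjugate() const { return Quat{w, -x, -y, -z}; }

    // Squared norm, kept free of <cmath> so the header has no dependencies.
    QUAT_HD float normSquared() const { return w * w + x * x + y * y + z * z; }
};

// Hamilton product a * b.
QUAT_HD Quat operator*(const Quat& a, const Quat& b) {
    return Quat{
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w};
}

#endif // QUAT_H
```

Because every function body is visible in the header, both the host compiler and the device compiler can inline the calls, and nothing needs to be linked separately.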

However, I am curious about using __noinline__ or __optix_enabled__. Are there examples that use these? I am guessing that I would decorate the desired functions and all of their callees in the Class header with __host__ __device__ __noinline__ (or do I need to add __optix_enabled__ too?). Then I would compile the Class .cpp file into .ptx with nvcc, call optixModuleCreateFromPTX, call optixProgramGroupCreate with kind OPTIX_PROGRAM_GROUP_KIND_CALLABLES, and finally supply the result to optixPipelineCreate as an additional program group. Does that sound right?

dhart · October 1, 2025, 5:16pm

Good question. Unfortunately I just looked and we don’t have any examples in the SDK of uses of noinline and optix-enabled, and I don’t see any examples in the OptiX Toolkit either.

Yes, you would decorate with __noinline__ as well as __host__ __device__. You shouldn’t use __noinline__ and the __optix_enabled__ prefix at the same time, as they do different things. (And note that __optix_enabled__ is a function name prefix rather than a qualifier.) The main difference is that __noinline__ won’t generally work for functions that contain OptiX device calls, while __optix_enabled__ does allow use of OptiX intrinsics. So for an external math library, for example, __noinline__ would be the more appropriate choice.

https://raytracing-docs.nvidia.com/optix9/guide/index.html#callables#non-inlined-functions
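As a rough sketch of the decoration described above (not taken from the SDK): the `Quat` type, the `QUAT_CALL` macro, and the file names in the comments are hypothetical. How the definitions need to be packaged into OptiX modules is not shown here; the guide section linked above covers the actual rules and caveats.

```cpp
// --- quat.h (hypothetical): declarations only ---
// Under nvcc the functions are host+device and marked non-inlined;
// a plain host compiler sees no qualifiers at all.
#if defined(__CUDACC__)
#define QUAT_CALL __host__ __device__ __noinline__
#else
#define QUAT_CALL
#endif

struct Quat {
    float w, x, y, z;

    QUAT_CALL Quat multiply(const Quat& rhs) const;
    QUAT_CALL Quat conjugate() const;
};

// --- quat.cu (hypothetical): out-of-line definitions ---
// The qualifiers are repeated so declaration and definition agree.
QUAT_CALL Quat Quat::multiply(const Quat& rhs) const {
    // Hamilton product (*this) * rhs.
    return Quat{
        w * rhs.w - x * rhs.x - y * rhs.y - z * rhs.z,
        w * rhs.x + x * rhs.w + y * rhs.z - z * rhs.y,
        w * rhs.y - x * rhs.z + y * rhs.w + z * rhs.x,
        w * rhs.z + x * rhs.y - y * rhs.x + z * rhs.w};
}

QUAT_CALL Quat Quat::conjugate() const {
    return Quat{w, -x, -y, -z};
}
```

Note that no function here uses the __optix_enabled__ name prefix, since none of them call OptiX device functions.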

It doesn’t matter what program group kind you use, but you do need at least one program group in a module in order to be able to compile it. It can be a dummy program group.

And to be fair, there are some cases where preventing inlining is good for performance, and good for compile times. In OptiX, this tends to be when using heavy shaders that trace additional rays. For small functions in math libraries, when the function isn’t very long and consists mainly of math, and for functions that are called multiple times per thread, that’s when preventing inlining can be much slower than inlining. Don’t let me scare you away from trying it; the profiler’s output is more accurate and more tailored to your code than my speculation. ;)

–
David.

