You're not setting --use_fast_math?
That will result in much slower code for trigonometric functions and reciprocals.
It’s recommended to always set --use_fast_math in OptiX device code.
Look for approx in the PTX code generated before and after setting the flag.
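For example, when generating the PTX with NVCC at build time, the flag goes onto the command line roughly like this (the file names, include path and the sm_30 architecture are just placeholders):

nvcc -ptx -arch=sm_30 --use_fast_math -I<optix_include> my_program.cu -o my_program.ptx

With NVRTC the same option string ("--use_fast_math") goes into the options array passed to nvrtcCompileProgram.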
Correct. If you're working on Kepler, don't use OptiX 5.0.0 anymore; update to at least an OptiX 5.1.x version to benefit from all the fixes that went into the later releases.
Depends on what you’re doing. If all your programs are static, just compile them in your project with NVCC.
If you're compiling shaders at runtime, that's easier and faster to do with NVRTC.
I’m doing both. The core programs are built by the project, the bindless callable programs for my material hierarchy are built with NVRTC at runtime, depending on the scene contents.
Though that also requires the OptiX, CUDA and my renderer’s headers on the target machine.
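In case it helps, a minimal sketch of such a runtime compilation with NVRTC could look like this (the include paths are placeholders and error checking is omitted):

#include <nvrtc.h>
#include <string>

// Compile a CUDA source string to PTX at runtime.
// The include paths must point at the OptiX, CUDA and renderer headers on the target machine.
std::string compile_to_ptx(const char* source, const char* name)
{
  nvrtcProgram prog = nullptr;
  nvrtcCreateProgram(&prog, source, name, 0, nullptr, nullptr); // No embedded headers in this sketch.

  const char* options[] =
  {
    "--use_fast_math",
    "-I/path/to/optix/include",    // Placeholder include paths.
    "-I/path/to/cuda/include",
    "-I/path/to/renderer/shaders"
  };
  nvrtcCompileProgram(prog, (int)(sizeof(options) / sizeof(options[0])), options); // Error checking omitted.

  size_t size = 0;
  nvrtcGetPTXSize(prog, &size);
  std::string ptx(size, '\0');
  nvrtcGetPTX(prog, &ptx[0]);
  nvrtcDestroyProgram(&prog);

  return ptx; // Feed this to createProgramFromPTXString() on the OptiX host side.
}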
My renderer architecture looks basically like the block diagram at the end of this page:
https://github.com/nvpro-samples/optix_advanced_samples/tree/master/src/optixIntroduction
With runtime generated bindless callable programs for each unique material shader, the closest hit program does more calls than in those examples, but the fixed-function code for BSDF sampling and evaluation and for light sampling has exactly the same structure; my MDL-capable renderer just contains many more of those programs.
This is the smallest and most flexible OptiX kernel I could come up with for that amount of features.
This approach allows all kinds of light transport algorithms, because I can sample and evaluate BSDFs and lights not only inside the closest hit program domain.
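As a rough sketch of how that dispatch looks in the closest hit program (the structs, the buffer and the variable names here are made up for illustration, not my real interfaces):

#include <optix.h>
#include <optix_world.h>

// Hypothetical per-ray payload and hit state; the real structs carry much more.
struct PerRayData { float3 wi; float3 throughput; };
struct State      { float3 normal; float3 wo; };

// Buffer of bindless callable program IDs, filled at runtime with the
// NVRTC-compiled sampling program of each unique material shader.
rtBuffer< rtCallableProgramId<float3(State const&, PerRayData&)> > sysSampleBSDF;

rtDeclareVariable(PerRayData, thePrd, rtPayload, );
rtDeclareVariable(int, parMaterialIndex, , ); // Hypothetical per-material index.

RT_PROGRAM void closesthit()
{
  State state; // ... filled from the geometry attributes in the real program ...
  // The closest hit program just dispatches into the runtime generated callable program.
  thePrd.wi = sysSampleBSDF[parMaterialIndex](state, thePrd);
}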
EDIT: Related discussion:
https://devtalk.nvidia.com/default/topic/1030935/what-is-the-best-practice-of-implementing-shader-graph-in-optix-/
The previous mindset of renderer architectures was to avoid tracing rays at all costs because that was the most expensive operation. Now that ray traversal and triangle intersection is hardware accelerated, that is no longer true. The recommendation is always to make the shading as efficient as possible, which is more important now than in the past.
The OptiX SDK CMakeLists.txt contains this option to switch compilers:
# Select whether to use NVRTC or NVCC to generate PTX
set(CUDA_NVRTC_ENABLED ON CACHE BOOL "Use NVRTC to compile PTX at run-time instead of NVCC at build-time")
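If you want the NVCC path instead, that cache variable can simply be switched off when configuring, for example:

cmake -DCUDA_NVRTC_ENABLED=OFF <path_to_SDK_source>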
I have not used that option (-std=c++14). I would have written your function as:
RT_FUNCTION float spd_value_at(int spd, float lambda)
{
  // ...
  const float f0 = rtBufferId<float, 1>(spd)[i0]; // When possible spread these two memory reads out.
  const float f1 = rtBufferId<float, 1>(spd)[i1];
  return optix::lerp(f0, f1, t); // This uses one multiplication less than your code.
}
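(For reference, optix::lerp(a, b, t) computes a + t * (b - a), hence the single multiplication.)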
This is all guesswork. I would need to try myself what happens when using buffer IDs as parameters.
Instead of buffers, have you considered using 1D textures (via bindless texture IDs) with linear filtering and wrap mode clamp_to_edge for that data? The linear filtering comes for free then.
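A rough sketch of what that could look like, assuming the SPD values are already in a 1D RT_FORMAT_FLOAT buffer; the names and the wavelength range are placeholders:

// Host side: wrap the SPD buffer in a texture sampler and pass its bindless ID to the device.
optix::TextureSampler sampler = context->createTextureSampler();
sampler->setWrapMode(0, RT_WRAP_CLAMP_TO_EDGE);
sampler->setFilteringModes(RT_FILTER_LINEAR, RT_FILTER_LINEAR, RT_FILTER_NONE);
sampler->setIndexingMode(RT_TEXTURE_INDEX_NORMALIZED_COORDINATES);
sampler->setReadMode(RT_TEXTURE_READ_ELEMENT_TYPE);
sampler->setBuffer(spdBuffer);
const int spdTexId = sampler->getId();

// Device side: the texture unit does the linear interpolation and the clamping.
RT_FUNCTION float spd_value_at(int spdTexId, float lambda)
{
  const float u = (lambda - lambdaMin) / (lambdaMax - lambdaMin); // Map lambda into [0, 1].
  return rtTex1D<float>(spdTexId, u);
}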