Do you know if there has been any success using CUDA for ray tracing?
For example, can I employ CUDA to render a series of frames - not for real-time rendering, just into one or more buffers?
Could one render into these buffers to get a measurement of FPS?
Let me try to interpret that.
First, there have been many ray tracing implementations using CUDA natively in the past.
OptiX itself uses CUDA internally, and with OptiX 7 all host interaction is native CUDA code as well, which simplifies interoperability between CUDA and OptiX 7 a lot.
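To give an idea of what that looks like in practice, here is a minimal host-side sketch, assuming the pipeline and shader binding table have already been built; the Params struct and all names are placeholders for illustration, not taken from a specific SDK example:

```cpp
#include <cuda.h>
#include <cuda_runtime.h>
#include <optix.h>
#include <optix_stubs.h>

// Hypothetical launch parameter struct shared between host and device code.
struct Params
{
    uchar4*                output; // plain CUDA buffer written by the raygen program
    unsigned int           width;
    unsigned int           height;
    OptixTraversableHandle handle;
};

void launchFrame(OptixPipeline pipeline, const OptixShaderBindingTable& sbt,
                 OptixTraversableHandle handle, CUstream stream,
                 unsigned int width, unsigned int height)
{
    // Native CUDA allocations; there are no OptiX-specific buffer objects anymore.
    uchar4* d_output = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&d_output), width * height * sizeof(uchar4));

    const Params params{ d_output, width, height, handle };

    CUdeviceptr d_params = 0;
    cudaMalloc(reinterpret_cast<void**>(&d_params), sizeof(Params));
    cudaMemcpy(reinterpret_cast<void*>(d_params), &params, sizeof(Params), cudaMemcpyHostToDevice);

    // The launch takes the CUDA stream and the CUDA device pointer directly.
    optixLaunch(pipeline, stream, d_params, sizeof(Params), &sbt, width, height, 1);
    cudaStreamSynchronize(stream);

    // d_output now holds the frame and can be consumed by any other CUDA kernel,
    // copied to the host, or mapped into OpenGL for display.

    cudaFree(reinterpret_cast<void*>(d_params));
    cudaFree(d_output);
}
```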
Now, if you mean using CUDA to generate the rays which are then used in OptiX, yes, of course.
You can implement your ray generation program as you like. If you have a number of rays in a buffer, it’s very simple to use them as input.
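Just as a minimal sketch of that idea (the Ray struct and the launch parameter layout below are made up for illustration), the ray generation program could simply fetch its ray from the buffer and trace it:

```cpp
#include <optix.h>

// Hypothetical ray and launch parameter layouts; fill in whatever your renderer needs.
struct Ray
{
    float3 origin;
    float3 direction;
    float  tmin;
    float  tmax;
};

struct Params
{
    const Ray*             rays;    // pre-generated rays, e.g. filled by a native CUDA kernel
    float*                 results; // hit distances written back for later shading
    OptixTraversableHandle handle;
};

extern "C" __constant__ Params params;

extern "C" __global__ void __raygen__from_buffer()
{
    const unsigned int index = optixGetLaunchIndex().x;
    const Ray ray = params.rays[index];

    // Payload register 0 carries the hit distance; -1.0f means miss.
    // A closest-hit program could overwrite it with optixSetPayload_0(__float_as_uint(optixGetRayTmax())).
    unsigned int p0 = __float_as_uint(-1.0f);

    optixTrace(params.handle,
               ray.origin, ray.direction,
               ray.tmin, ray.tmax,
               0.0f,                      // rayTime
               OptixVisibilityMask(255),
               OPTIX_RAY_FLAG_NONE,
               0, 1, 0,                   // SBT offset, SBT stride, miss SBT index
               p0);

    params.results[index] = __uint_as_float(p0);
}
```

The corresponding optixLaunch would then simply use the number of rays in the buffer as the launch width.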
There exist renderer architectures where only the ray intersection part has been replaced with OptiX and everything else (shading, ray generation) runs in CUDA.
There is even an OptiX SDK example demonstrating that mechanism, named optixRaycasting, in which native CUDA kernels generate the rays, OptiX shoots them and returns hit/miss results, and those results are then shaded in a native CUDA kernel again.
You could also do that with multiple buffers you prepared up front.
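For illustration, here is a minimal sketch of the CUDA side of that pattern; the Ray struct and the pinhole camera parameters are made up and do not mirror the optixRaycasting code exactly:

```cpp
#include <cuda_runtime.h>

struct Ray
{
    float3 origin;
    float3 direction;
    float  tmin;
    float  tmax;
};

// Native CUDA kernel writing one primary ray per pixel into a buffer
// which a subsequent OptiX launch then consumes.
__global__ void generateRaysKernel(Ray* rays, unsigned int width, unsigned int height,
                                   float3 eye, float3 U, float3 V, float3 W)
{
    const unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    const unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;

    // Normalized device coordinates in [-1, 1].
    const float u = 2.0f * (x + 0.5f) / width  - 1.0f;
    const float v = 2.0f * (y + 0.5f) / height - 1.0f;

    // dir = normalize(u * U + v * V + W)
    float3 dir = make_float3(u * U.x + v * V.x + W.x,
                             u * U.y + v * V.y + W.y,
                             u * U.z + v * V.z + W.z);
    const float invLen = rsqrtf(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);

    Ray ray;
    ray.origin    = eye;
    ray.direction = make_float3(dir.x * invLen, dir.y * invLen, dir.z * invLen);
    ray.tmin      = 0.0f;
    ray.tmax      = 1.0e16f;

    rays[y * width + x] = ray;
}
```

The host would launch this kernel, then run optixLaunch over width × height rays reading that buffer, and finally shade the returned hit records in another native CUDA kernel.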
But the issue with that is that the ray throughput the RTX hardware can handle (>10 GRays/sec) is much higher than what you can read from and write to VRAM at the same time. For example, at 10 GRays/sec, just reading a 32-byte ray (origin, direction, t-interval) and writing back a small hit record per ray already adds up to several hundred GB/sec of memory traffic. This means you’re normally limited by memory bandwidth and need to keep memory accesses as low as possible to get the most out of the hardware’s capabilities.
In the past there even existed a whole ray intersection API called OptiX Prime, which was discontinued with OptiX 7.0.0 for that very reason; and since the OptiX 7 API is much more flexible and its host code uses native CUDA buffers, it was not necessary anymore.
Still, it’s faster to generate rays on the fly with arithmetic than to write them into buffers which OptiX then has to read again. The goal for optimal performance with OptiX, and with CUDA in general, is to make the best possible use of registers.
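In contrast to the buffer approach, a minimal sketch of generating the primary ray on the fly from the launch index inside the ray generation program could look like this (again with a made-up pinhole camera layout in the launch parameters):

```cpp
#include <optix.h>

// Hypothetical launch parameters: camera position plus the three vectors spanning the view frustum.
struct Params
{
    float4*                output;
    unsigned int           width;
    unsigned int           height;
    float3                 eye;
    float3                 U, V, W;
    OptixTraversableHandle handle;
};

extern "C" __constant__ Params params;

extern "C" __global__ void __raygen__pinhole()
{
    const uint3 idx = optixGetLaunchIndex();

    // Everything below stays in registers; no ray buffer is read or written.
    const float u = 2.0f * (idx.x + 0.5f) / params.width  - 1.0f;
    const float v = 2.0f * (idx.y + 0.5f) / params.height - 1.0f;

    float3 dir = make_float3(u * params.U.x + v * params.V.x + params.W.x,
                             u * params.U.y + v * params.V.y + params.W.y,
                             u * params.U.z + v * params.V.z + params.W.z);
    const float invLen = rsqrtf(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
    dir = make_float3(dir.x * invLen, dir.y * invLen, dir.z * invLen);

    unsigned int p0 = __float_as_uint(-1.0f); // hit distance payload, -1.0f means miss

    optixTrace(params.handle, params.eye, dir,
               0.0f, 1.0e16f, 0.0f,
               OptixVisibilityMask(255), OPTIX_RAY_FLAG_NONE,
               0, 1, 0,
               p0);

    // Only the final result per pixel touches memory.
    const float t = __uint_as_float(p0);
    params.output[idx.y * params.width + idx.x] = make_float4(t, t, t, 1.0f);
}
```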
Please have a look through the OptiX SDK examples first.
There are examples like the optixMeshViewer which loads glTF models and handles their materials as well. That is a Whitted-style ray tracer which is really fast.
Then all my OptiX 7 examples implement a simple unidirectional path tracer with global illumination and a very flexible architecture. All of these examples contain benchmark functionality as well.
The most advanced one (rtigo3) can render at arbitrary resolutions, independent of the window client area, with a fixed number of samples. It can load triangle mesh data from different model file formats and assign materials to the meshes.
This one is explicitly meant to compare the performance of different multi-GPU workload distribution and compositing strategies, as well as different OpenGL interoperability methods for the final display of the result.
It also works with a single GPU, of course, and contains a pure ray-tracing-only benchmark mode without the display part, in which the final image is only written to disk.
I would look at these first to get an impression of the RTX ray tracing performance.
Before doing that, please read this post about how to measure FPS without being limited by the monitor refresh rate:
https://forums.developer.nvidia.com/t/optix-6-5-demo-performance-concern/128404/2
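To give an idea of what such a measurement can look like, here is a minimal sketch that times a number of optixLaunch calls directly with CUDA events, completely independent of any window or monitor refresh; pipeline, stream, launch parameters and SBT are assumed to be set up elsewhere:

```cpp
#include <cuda_runtime.h>
#include <optix.h>
#include <optix_stubs.h>
#include <iostream>

// Renders a number of frames back to back into the output buffer referenced by
// the launch parameters and reports the average frames per second.
float measureFps(OptixPipeline pipeline, CUstream stream,
                 CUdeviceptr d_params, size_t paramsSize,
                 const OptixShaderBindingTable& sbt,
                 unsigned int width, unsigned int height,
                 unsigned int frames = 100)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, stream);
    for (unsigned int i = 0; i < frames; ++i)
    {
        optixLaunch(pipeline, stream, d_params, paramsSize, &sbt, width, height, 1);
    }
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop); // wait until all launches have actually finished

    float milliseconds = 0.0f;
    cudaEventElapsedTime(&milliseconds, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);

    const float fps = static_cast<float>(frames) / (milliseconds * 0.001f);
    std::cout << "Average: " << fps << " FPS" << std::endl;
    return fps;
}
```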