Hello!
I’m new to profiling OptiX programs and wanted to know: What is the best way to compute the number of rays processed per second? Would this require adding some code or using a profiler? I’d appreciate any insight!
Hi @n16,
This is a good question; it’s not always straightforward to gather rays-per-second data without affecting the perf you want to measure.
Nsight tools don’t report rays per second directly. (Maybe they could! I’ll look at putting in a feature request.) You might still be able to use Nsight Systems for timing, since it isn’t very invasive and it does report kernel timings. Nsight Compute might be able to help, but there’s no direct metric, and it’s more invasive and may itself affect perf. I do recommend getting familiar with the Nsight profiling tools and trying to use them, but for computing rays per second it’s also useful to know how to gather the data yourself.
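For the timing side, one simple DIY option is bracketing your optixLaunch() with CUDA events. This is just a minimal sketch; pipeline, stream, d_params, sbt, Params, width, and height are placeholders for whatever your app already sets up:

```cpp
// Minimal timing sketch using CUDA events around an optixLaunch().
// All of the objects here (pipeline, stream, d_params, sbt, width, height)
// are assumed to come from your existing OptiX setup code.
cudaEvent_t start, stop;
cudaEventCreate( &start );
cudaEventCreate( &stop );

cudaEventRecord( start, stream );
optixLaunch( pipeline, stream, d_params, sizeof( Params ), &sbt,
             width, height, /*depth=*/1 );
cudaEventRecord( stop, stream );

cudaEventSynchronize( stop );              // wait for the launch to finish
float ms = 0.0f;
cudaEventElapsedTime( &ms, start, stop );  // elapsed GPU time in milliseconds
```

If you want stable numbers, warm up the GPU first and average the time over several launches.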
In the past my strategy has been to run 2 different OptiX kernels back to back, each with the same render job: one specialized for high performance, and the other specialized for counting the total number of rays. I typically use the OptiX feature called ‘bound values’ for this, since it is very convenient. Bound values are launch params that look like variables but are compiled out and optimized at module creation time. So my time-specialized kernel will not count rays, will compile out any and all debug features, and generally be optimized to go as fast as possible. My ray-count-specialized kernel will use an atomic to count the total number of calls to either optixTrace() or optixTraverse().

Note you only need an atomic counter if you’re doing something complicated like stochastic path tracing. If you’re doing something simple, like casting only primary rays at a constant number per pixel, then your ray count can be determined in advance at launch time and you may not need to measure it at all. At the end, I just divide the ray count from the ray-count kernel by the time spent in the timing kernel, and that is my computed rays per second. Make sense?
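To make that concrete, here’s a rough sketch of what the counting side can look like. This is not code from the SDK; the Params layout, the field names, and the tracePrimary() wrapper are all just illustrative assumptions:

```cpp
#include <optix.h>

// Illustrative launch params -- your real struct will differ.
// 'countRays' is declared as a bound value on the host, so the compiler
// treats it as a compile-time constant (0 or 1) and dead-code eliminates
// the atomic in the time-specialized module.
struct Params
{
    OptixTraversableHandle handle;
    unsigned long long*    rayCounter;  // single 64-bit counter in device memory
    int                    countRays;   // bound value: 0 = timing build, 1 = counting build
};

extern "C" __constant__ Params params;

// Hypothetical wrapper used everywhere instead of calling optixTrace() directly.
static __forceinline__ __device__ void tracePrimary( float3 origin, float3 direction,
                                                     unsigned int& p0 )
{
    if( params.countRays )  // branch is compiled away when bound to 0
        atomicAdd( params.rayCounter, 1ull );

    optixTrace( params.handle, origin, direction,
                0.0f, 1e16f, 0.0f,                // tmin, tmax, rayTime
                OptixVisibilityMask( 255 ), OPTIX_RAY_FLAG_NONE,
                0, 1, 0,                          // SBT offset, stride, miss index
                p0 );
}
```

On the host, the bound value is declared at module creation time, so each specialization gets its own module built from the same input. Something along these lines:

```cpp
// Host side: build two modules from the same PTX / OptiX-IR input,
// binding params.countRays to 1 (counting) or 0 (time-specialized).
int countRays = 1;  // set to 0 for the time-specialized module

OptixModuleCompileBoundValueEntry boundValue = {};
boundValue.pipelineParamOffsetInBytes = offsetof( Params, countRays );
boundValue.sizeInBytes                = sizeof( int );
boundValue.boundValuePtr              = &countRays;
boundValue.annotation                 = "countRays";

OptixModuleCompileOptions moduleCompileOptions = {};
moduleCompileOptions.boundValues    = &boundValue;
moduleCompileOptions.numBoundValues = 1;
// ... pass moduleCompileOptions to module creation as usual.
```

After the counting launch, copy the counter back with cudaMemcpy and divide by the elapsed seconds from the fast launch, e.g. double raysPerSecond = rayCount / ( ms * 1e-3 );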
Because my ray counts are stochastic in the above example, and because I measure time and ray counts in 2 different runs, my computed rays-per-second metric is not 100% accurate. But with enough rays it is statistically accurate, usually much more accurate than the measurement noise. Typically I’m testing billions of rays, and the ray counts are consistent from run to run to within a very small fraction of a percent (hundredths or thousandths of a percent), so the computed rays per second is quite reliable.
I hope that helps, let me know if anything in there doesn’t make sense, or if that brings up more questions.
–
David.