The best way I know of to estimate performance expectations and to measure performance is to gather rays-per-second metrics. Do you know how many rays per second you get on the CPU?
You can estimate rays/sec roughly if you have frames per second or, better yet, kernel timings, and a good idea of how many rays you cast (e.g., screen resolution multiplied by samples per pixel, plus the number of secondary rays). Note that frames per second gives only a very rough approximation: it might not be very stable, and it includes a lot of overhead unrelated to ray tracing.
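For example, a minimal back-of-the-envelope sketch of that calculation; every number below is a hypothetical placeholder you would replace with your own settings and measurements:

```
// Back-of-the-envelope rays/sec estimate. Every number here is a
// hypothetical placeholder; substitute your own resolution, sampling
// settings, and measured frame (or kernel) time.
#include <cstdio>

int main()
{
    const double width                     = 1920.0;
    const double height                    = 1080.0;
    const double samplesPerPixel           = 1.0;
    const double avgSecondaryRaysPerSample = 2.0;  // reflection/refraction/shadow guess
    const double frameTimeMs               = 16.0; // from FPS or, better, kernel timings

    const double primaryRays   = width * height * samplesPerPixel;
    const double totalRays     = primaryRays * ( 1.0 + avgSecondaryRaysPerSample );
    const double raysPerSecond = totalRays / ( frameTimeMs * 1.0e-3 );

    std::printf( "~%.1f Mrays/sec\n", raysPerSecond * 1.0e-6 );
    return 0;
}
```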
To get very stable numbers, this might require instrumenting your program with a little bit of extra code. The kernel launch time needs to be measured carefully, ideally using CUDA stream events placed before and after the OptiX launch. (Less ideal, but probably adequate, is to start the timer, launch OptiX, and then synchronize before stopping the timer - meaning call cudaStreamSynchronize() or cudaDeviceSynchronize().) You will also need a count of the number of rays that you cast. This can be trivial to calculate if you are casting only primary rays, or it may require some code to count the rays if you are casting reflection, refraction, or shadow rays.
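Here is a sketch of the CUDA-event approach; the pipeline, stream, sbt, Params struct, paramsBuffer, width, and height are assumed to already exist in your application:

```
// Timing one OptiX launch with CUDA events (a sketch). The pipeline, stream,
// sbt, Params struct, paramsBuffer, width, and height are assumed to already
// exist in your application.
cudaEvent_t start, stop;
cudaEventCreate( &start );
cudaEventCreate( &stop );

cudaEventRecord( start, stream );
optixLaunch( pipeline, stream, paramsBuffer, sizeof( Params ), &sbt,
             width, height, /*depth=*/1 );
cudaEventRecord( stop, stream );

cudaEventSynchronize( stop );  // wait for the launch to finish
float milliseconds = 0.0f;
cudaEventElapsedTime( &milliseconds, start, stop );

cudaEventDestroy( start );
cudaEventDestroy( stop );
```

cudaEventElapsedTime() reports milliseconds, so dividing your ray count by (milliseconds * 1e-3) gives rays per second.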
Be aware that counting optixTrace() calls can affect performance. When I do this, I normally make my simulation repeatable and compile the shaders twice, once with ray counting enabled (via an atomic counter) and once with ray counting disabled. Then I count the rays and time the performance in separate launches, using the count-enabled and count-disabled shaders respectively. This takes a little bit of effort, of course, but can give much better & more stable benchmark measurements than other methods.
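One possible way to wire up such a counter, as a sketch rather than the only way: RAY_COUNTING is a compile-time define so the shaders can be built twice, and params.rayCounter is a hypothetical device pointer you would add to your own launch parameters.

```
// One way to count optixTrace() calls with an atomic counter (a sketch).
// RAY_COUNTING is a compile-time define so the shaders can be built twice,
// with and without counting; params.rayCounter is a hypothetical
// unsigned long long* added to the launch parameters.
static __forceinline__ __device__ void countRay()
{
#if defined( RAY_COUNTING )
    atomicAdd( params.rayCounter, 1ull );
#endif
}

// Then call countRay() next to every optixTrace() call in the raygen,
// closest-hit, etc. programs, and copy the counter back to the host after
// a counting launch.
```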
I also recommend locking your GPU clocks while you measure performance. Otherwise, you might get thermal throttling, which means the clock speed will slow down and make timings difficult to reproduce. I usually do this in a script that calls nvidia-smi; see the -lgc and -rgc options.
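If you would rather keep everything in the benchmark program than in a wrapper script, a minimal sketch of the same idea might look like this; the 1350 MHz value is a hypothetical placeholder, and locking clocks usually requires administrator privileges:

```
// A sketch of locking GPU clocks around a benchmark run by shelling out to
// nvidia-smi. The 1350 MHz value is a hypothetical placeholder; pick a clock
// your GPU can sustain without throttling.
#include <cstdlib>

void runBenchmark() { /* your timed OptiX launches go here */ }

int main()
{
    std::system( "nvidia-smi -lgc 1350,1350" );  // lock graphics clocks
    runBenchmark();
    std::system( "nvidia-smi -rgc" );            // reset graphics clocks to default
    return 0;
}
```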
It is best to leverage the RTX hardware, so yes, that does mean using the built-in triangle primitive rather than any custom software intersectors. With a Titan RTX GPU, if you are getting less than 100 million rays per second with simple geometry & simple shaders, then something is probably very wrong. If you are getting more than that but less than 1 billion rays per second, there is probably plenty of room for optimization. If you are getting more than 3-5 billion rays per second, then you might already be achieving very high utilization.
–
David.