I am a newbie with OptiX but trying to learn. Please forgive if this is a dumb question.
Is there a way to programmatically control the total number of rays OptiX generates during rendering? I would like to run some low-level performance studies by increasing or decreasing the number of rays generated for a single frame. For example, a function called from main() that passes the total number of rays to generate to the underlying OptiX kernel(s), if that is possible.
Thanks to anyone for hints/help.
Hey there @rhaney, with OptiX you always have complete control over exactly how many rays you trace, and what the rays do when you trace them. With OptiX 7, for example, you always trace a ray explicitly with a call to the function optixTrace(). If you call optixTrace() exactly once in your raygen program, then you will trace exactly as many rays as your launch size. You can use the pipelineParams argument to optixLaunch() to pass parameters to your raygen or other OptiX device programs, which lets you send a different number of rays per launch. It’s common, for example, to pass a number of rays per pixel via pipelineParams in order to control super-sampling. You could start by poking around the optixPathTracer sample in the OptiX 7 SDK to see how to control the number of rays traced from the camera.
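As a rough host-side sketch of the arithmetic behind this: if the raygen program calls optixTrace() once per sample, the primary ray count of a launch is width × height × samples per pixel. The names below (LaunchParams, samples_per_pixel, total_primary_rays, samples_for_target) are hypothetical and not part of the OptiX API. A real application would copy a struct like this to device memory and hand it to optixLaunch() through pipelineParams.

```cpp
#include <cstdint>

// Hypothetical launch-parameter struct (not an OptiX type). The raygen
// program is assumed to loop samples_per_pixel times and call optixTrace()
// once per iteration.
struct LaunchParams {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t samples_per_pixel;
};

// Primary rays traced by one launch under the assumption above.
std::uint64_t total_primary_rays(const LaunchParams& p) {
    return static_cast<std::uint64_t>(p.width) * p.height * p.samples_per_pixel;
}

// Smallest samples-per-pixel value that traces at least target_rays
// for a fixed launch size (rounds up). Returns 0 for an empty launch.
std::uint32_t samples_for_target(std::uint32_t width, std::uint32_t height,
                                 std::uint64_t target_rays) {
    const std::uint64_t pixels = static_cast<std::uint64_t>(width) * height;
    if (pixels == 0) {
        return 0;
    }
    return static_cast<std::uint32_t>((target_rays + pixels - 1) / pixels);
}
```

For a performance study you could then sweep samples_per_pixel (or the launch dimensions) from main() and record the ray count for each run alongside the measured time.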
Thank you @dhart and @droettger. Good information for a newbie to OptiX.
Another newbie question, different subject, if that is okay.
I ran the SIGGRAPH OptiX 7 example2_pipelineAndRayGen on an RT card and a non-RT card separately to get an idea of the performance of different cards. I noticed that for smaller frame sizes the non-RT card outperformed the RT card, but once the frame buffer got larger the RT card performed better. Is this similar to the performance one sees with standard CUDA kernels, where the larger the matrix the better the performance?
That would need some more details about the system configurations you’ve been comparing.
That example2_pipelineAndRayGen is just filling the output buffer without shooting a single ray.
That’s effectively a two-dimensional CUDA kernel, just more complicated.
It’s more of a VRAM bandwidth and number of CUDA cores benchmark than anything else.
These examples are meant to show OptiX 7 concepts in a simple way. They are not optimized for performance.
Remember what I said about the gdt/math/vec.h classes not being suited for vectorized device memory accesses.
Also never use empty device programs in an OptiX pipeline like these examples do. Use a nullptr for the module and program name instead. Not assigning a program is faster than an empty program.
If you want to compare ray tracing performance you should look at the final examples, or use the OptiX SDK 7.1.0 optixMeshViewer for a Whitted style renderer, or use my OptiX 7 applications for path tracers which have benchmarks built-in.
Also when benchmarking anything which is measured in frames per second with display to the screen, make sure to disable vertical sync inside the NVIDIA display control panel.
Find that here: https://forums.developer.nvidia.com/t/optix-6-5-demo-performance-concern/128404/2
Thank you @droettger for information.
Dumb question but are there any example(s) with OptiX 7 that render a single frame using ray tracing? Something that is fairly straightforward for a newbie?
Sure, just work through the OptiX SDK 7.0.0 or 7.1.0 examples and you’ll see.
Look at the optixTriangle code which does everything in the main() function and either renders the image to the screen or writes it to disk.
The only simpler example is optixHello which doesn’t even shoot a ray.
This blog post, which is also a link in the top-most sticky post, explains the optixTriangle example in detail:
The SIGGRAPH OptiX 7 course examples you can find in the same sticky posts also show that example progression step-by-step.
(My OptiX 5 based examples here did the same https://github.com/nvpro-samples/optix_advanced_samples/tree/master/src/optixIntroduction
and I’ve ported the later ones over to OptiX 7 but these are more advanced. Links also in the sticky posts.)
Thank you again @droettger. The links and information are very helpful.
I will set the output file to some value (e.g. ‘triangle.ppm’) so a single image is written to disk, and put a timer around the optixLaunch call in the main() function to get an execution time for the ray tracing.
Actually, since optixLaunch() calls are asynchronous, you need to include the cudaDeviceSynchronize() call that follows it as well (it is inside the CUDA_SYNC_CHECK macro), or you only measure the time to submit the launch to CUDA, not the actual ray tracing performance.
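A minimal sketch of that timing pattern, written so it compiles without a GPU: the helper times a "launch" callable and a "synchronize" callable separately. Everything here (LaunchTiming, time_async_launch, FakeDevice, demo_timing) is a hypothetical stand-in for illustration. In the real sample the first callable would wrap optixLaunch() and the second would wrap cudaDeviceSynchronize().

```cpp
#include <chrono>
#include <thread>

// Hypothetical result type: how long the submit took vs. the whole job.
struct LaunchTiming {
    double submit_ms;  // time until the launch call returned
    double total_ms;   // time until the work had actually finished
};

// Times an asynchronous launch. `launch` only submits work; `sync` blocks
// until that work is done. Stopping the clock before sync() would report
// submit_ms only, which is the pitfall described above.
template <typename Launch, typename Sync>
LaunchTiming time_async_launch(Launch&& launch, Sync&& sync) {
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;
    const auto t0 = clock::now();
    launch();
    const auto t1 = clock::now();
    sync();
    const auto t2 = clock::now();
    return LaunchTiming{ms(t1 - t0).count(), ms(t2 - t0).count()};
}

// Stand-in for a GPU (assumption for illustration only): launch() just
// queues work and returns at once, synchronize() waits for it to finish.
struct FakeDevice {
    bool pending = false;
    void launch() { pending = true; }
    void synchronize() {
        if (pending) {
            std::this_thread::sleep_for(std::chrono::milliseconds(20));
            pending = false;
        }
    }
};

// Runs the helper against the fake device.
LaunchTiming demo_timing() {
    FakeDevice dev;
    return time_async_launch([&] { dev.launch(); },
                             [&] { dev.synchronize(); });
}
```

With the fake device, submit_ms comes out near zero while total_ms covers the simulated work, which mirrors what happens when a timer is stopped right after optixLaunch() returns.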
That example is not going to give you a lot of data with a single triangle. If you make the launch big enough to saturate the GPU, the expected result for primary rays on the top-end Turing boards is >10 GRays/second.
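To compare a measurement against that figure, the ray count and the synchronized launch time convert to GRays/second as follows. This is a small sketch; grays_per_second is a hypothetical helper name, and the ray count is assumed to be launch width × height × rays per pixel.

```cpp
#include <cstdint>

// Converts a ray count and an elapsed time in seconds into GRays/second
// (billions of rays per second). Returns 0 for a non-positive time so the
// caller never divides by zero.
double grays_per_second(std::uint64_t rays, double seconds) {
    if (seconds <= 0.0) {
        return 0.0;
    }
    return static_cast<double>(rays) / seconds / 1.0e9;
}
```

For example, a 1920 × 1080 launch with 4 rays per pixel is 8,294,400 rays; if that launch took 1 ms including the synchronize, the rate would be about 8.3 GRays/second.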
@droettger got it. Thanks again.