Hi @hkkzzxz24,
This is an interesting question!
I don’t know the answer with any certainty, since it depends on your application’s specifics, but my instinct is that OptiX should be able to help in both cases, with some assumptions and caveats. It’s true that OptiX is designed for large scenes, and some of its overheads and design choices were made for film-production renderers, which don’t necessarily represent the best you can do for small scenes or experimental rendering techniques.
For case 1 (small scenes), the triangle intersections are still hardware accelerated. The answer here depends on what kind of CUDA code you’re willing to write when not using OptiX. If the scene were small and simple enough, it’s plausible that you could save time by not using a BVH at all, and write the CUDA code so that “traversal” is hard-coded into your rendering algorithm; essentially, hand-code your acceleration structure. People do tricks like this in ShaderToy shaders, for example. In that case, you might make the CUDA code run faster simply by removing the BVH build from your workflow. That can be a rabbit hole that eats a lot of developer time, though, and it comes with serious limitations on the kinds of scenes you can handle, so it’s possible, but it doesn’t scale.
Also for case 1, it depends on what kind of shading you need. If you have only 1 shader or no shaders, and shading is very simple, the hardware acceleration is almost guaranteed to be faster than a CUDA renderer, even with very tiny scenes.
With case 2 (many rays), I would think this tends to favor OptiX, because time spent ray tracing will dominate over small overheads. However, this may depend on what you do for case 1. If you found a way to make ray tracing faster for 1 ray, then it’s possible you could scale up and make ray tracing faster for many rays.
For what it’s worth, we have spoken with some AI researchers who asked similar questions, especially about very large numbers of small scenes, and sometimes the limiting factor is the BVH builds, not the rendering at all. Creating a huge scene to encompass all the small scenes might not seem like a cost advantage when rendering, but it coalesces the overheads of launching many small BVH builds into a single BVH build and a single launch, and it’s likely to save a lot of time for that reason. If you can take advantage of the new cluster API, you might be able to save even more time that way.
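To make the batched build concrete, here is a rough host-side sketch (untested, buffer allocation elided; names like `d_vertices` and `numScenes` are placeholders) of passing one build input per small scene to a single optixAccelBuild call instead of launching one build per scene:

```cpp
// One build input per small scene; vertex data is assumed to be
// on the device already.
std::vector<OptixBuildInput> inputs(numScenes);
const unsigned int triFlags[] = { OPTIX_GEOMETRY_FLAG_NONE };
for (unsigned int i = 0; i < numScenes; ++i) {
    inputs[i] = {};
    inputs[i].type = OPTIX_BUILD_INPUT_TYPE_TRIANGLES;
    inputs[i].triangleArray.vertexFormat  = OPTIX_VERTEX_FORMAT_FLOAT3;
    inputs[i].triangleArray.vertexBuffers = &d_vertices[i];
    inputs[i].triangleArray.numVertices   = numVertices[i];
    inputs[i].triangleArray.flags         = triFlags;
    inputs[i].triangleArray.numSbtRecords = 1;
}

OptixAccelBuildOptions options = {};
options.buildFlags = OPTIX_BUILD_FLAG_PREFER_FAST_TRACE;
options.operation  = OPTIX_OPERATION_BUILD;

OptixAccelBufferSizes sizes;
optixAccelComputeMemoryUsage(context, &options, inputs.data(),
                             (unsigned int)inputs.size(), &sizes);
// ...allocate d_temp / d_output from sizes.tempSizeInBytes and
//    sizes.outputSizeInBytes...
OptixTraversableHandle gas;
optixAccelBuild(context, stream, &options, inputs.data(),
                (unsigned int)inputs.size(),
                d_temp,   sizes.tempSizeInBytes,
                d_output, sizes.outputSizeInBytes,
                &gas, nullptr, 0);
```

You pay one launch and one build instead of numScenes of each, which is where the savings I mentioned come from.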
I don’t understand the final question/scenario about threads and rays, but I’ll ramble a little more and you can tell me if I’m not answering your question, okay? There’s no real startup overhead to tracing a ray or to starting a thread, other than the nanosecond-level overheads of using the RT cores. It’s fine to cast 1 ray per thread, or 40, or even thousands of rays per thread. The main thing to do, performance-wise, is to cast the same number of rays from each thread, at least within a warp. If some threads cast more rays, then the threads with fewer rays will have to wait for the threads in the warp with more rays to finish; that introduces execution divergence in the warp and leads to inefficiency.
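In raygen code that just means keeping the loop count uniform across threads, e.g. a fixed sample count per thread rather than a per-pixel adaptive count. A sketch (payload handling elided; `SAMPLES`, `jitteredDir`, and `unpackRadiance` are made-up names):

```cpp
// A uniform loop count keeps every thread in the warp doing the same
// number of optixTrace calls, so no thread idles waiting on its warp-mates.
constexpr int SAMPLES = 16;
float3 accum = make_float3(0.0f, 0.0f, 0.0f);
for (int s = 0; s < SAMPLES; ++s) {
    unsigned int p0 = 0;  // payload register
    optixTrace(params.handle, origin, jitteredDir(s),
               0.0f, 1e16f, 0.0f, OptixVisibilityMask(255),
               OPTIX_RAY_FLAG_NONE,
               0, 1, 0,   // SBT offset, SBT stride, miss index
               p0);
    accum += unpackRadiance(p0);  // hypothetical helper
}
```

If you do want adaptive sampling, try to make the count vary per warp or per tile rather than per thread.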
The other thing to ensure when using RT cores is that the work for a ray stays on the RT core until it reaches its final destination. By this I mean that for the highest performance and lowest overheads, avoid custom intersection programs and anyhit programs. Use the built-in hardware triangles, and make sure to disable anyhit explicitly. Use the terminate-on-first-hit flag when you can, e.g., with shadow rays. If you have shaders that need to run during a ray’s traversal, that’s when RT core overheads can stack up.
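For example, a shadow ray that only needs a boolean answer can set both flags at trace time (sketch only; the payload convention, where your miss/closest-hit programs write `occluded`, is made up):

```cpp
// Occlusion query: no anyhit work, and traversal stops at the first hit,
// so the ray stays on the RT core for its whole lifetime.
unsigned int occluded = 0;
optixTrace(params.handle, hitPoint, dirToLight,
           0.001f, distToLight, 0.0f, OptixVisibilityMask(255),
           OPTIX_RAY_FLAG_DISABLE_ANYHIT
         | OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT,
           0, 1, 0,
           occluded);
```

Note that disabling anyhit here in the trace call works regardless of what the SBT records contain, which is why I suggest doing it explicitly.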
You should also look at the SER API in OptiX. For simple scenes, you can call optixTraverse instead of optixTrace and avoid having any shaders called at all. And if you have divergence issues, you can consider using optixReorder to iron them out. It has its own overhead to balance, but for people who have had bad divergence problems, reorder has in some cases improved performance by multiples.
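Roughly, the SER pattern looks like this (OptiX 8-style sketch, not tested; whether the reorder pays for itself depends on how divergent your rays actually are):

```cpp
unsigned int p0 = 0;
// Traverse only: finds the hit but does not invoke any shader programs.
optixTraverse(params.handle, origin, dir,
              0.0f, 1e16f, 0.0f, OptixVisibilityMask(255),
              OPTIX_RAY_FLAG_NONE, 0, 1, 0, p0);
// Optionally let the hardware regroup threads by their hit state
// to reduce divergence before any shading runs.
optixReorder();
// Then run the closest-hit or miss program for the stashed hit object.
optixInvoke(p0);
```

For the “no shaders at all” case, you can skip optixInvoke entirely and just query the hit object (e.g. optixHitObjectIsHit) after optixTraverse.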
Of course, if there are things we can do to improve performance, we are interested to hear about them.
–
David.