Shadow casting efficiency

Good morning,

Can anyone tell me how computationally efficient casting shadows with OptiX 7.4 is? I am trying to understand better how the new OptiX system renders shadows as opposed to something like rasterized shadows.

Thank you for your time.

Hi @picard1969,

Comparing ray tracing and rasterizing is pretty tricky, and highly dependent on the type of scene you have. It also depends heavily on which shadow rendering technique you are using with a rasterizer. I assume you are referring to shadow maps, but people sometimes use ray tracing for shadows after rasterizing the primary rays into a g-buffer.

Speaking generally and waving hands furiously, for a lot of simple scenes, ray tracing might be a bit computationally more expensive than rasterizing. But for scenes with heavy instancing and tons of geometry & lights in the view, ray tracing can be considerably faster, both for camera rays and for shadows.

It’s a bit easier to reason about ray traced shadows than about ray traced camera rays. As long as your shadow casters are opaque, you can take advantage of that fact by terminating a ray as soon as it hits anything at all, rather than exhaustively searching for the closest hit. You can do this with any ray tracer, and OptiX provides a ray flag to enable it (see OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT). You can use this flag when casting a shadow ray, and generally speaking the shadow ray will be faster with the flag than without it. Opaque, early-terminating shadow rays that don’t require any shading are among the fastest types of rays you can trace using RTX hardware & OptiX.
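
To illustrate why early termination helps, here is a minimal CPU-side sketch in C++. It is not OptiX code and does not use the OptiX API: the scene is a plain list of spheres, and the names `Sphere`, `QueryResult`, `closestHit` and `anyHit` are made up for this example. It only shows that a visibility query may stop at the first intersection it finds, while a closest-hit query has to consider every candidate.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical scene primitive for this sketch.
struct Sphere { Vec3 center; double radius; };

// Returns the nearest intersection distance in (tmin, tmax), or -1 on a miss.
// The ray direction is assumed to be normalized.
static double intersect(const Sphere& s, Vec3 origin, Vec3 dir, double tmin, double tmax) {
    const Vec3 oc = sub(origin, s.center);
    const double b = dot(oc, dir);
    const double c = dot(oc, oc) - s.radius * s.radius;
    const double disc = b * b - c;
    if (disc < 0.0) return -1.0;
    const double root = std::sqrt(disc);
    double t = -b - root;                       // near root first
    if (t <= tmin || t >= tmax) t = -b + root;  // fall back to the far root
    return (t > tmin && t < tmax) ? t : -1.0;
}

struct QueryResult {
    bool hit;            // true if any sphere was intersected
    double t;            // hit distance (tmax if nothing was hit)
    int testsPerformed;  // number of intersection tests executed
};

// Camera-ray style query: must look at every sphere to find the nearest hit.
QueryResult closestHit(const std::vector<Sphere>& scene, Vec3 o, Vec3 d, double tmin, double tmax) {
    assert(tmin >= 0.0 && tmax > tmin);
    QueryResult r{false, tmax, 0};
    for (const Sphere& s : scene) {
        ++r.testsPerformed;
        const double t = intersect(s, o, d, tmin, r.t);  // shrink the interval as we go
        if (t > 0.0) { r.hit = true; r.t = t; }
    }
    return r;
}

// Shadow-ray style query: any opaque blocker is enough, so stop at the first hit.
QueryResult anyHit(const std::vector<Sphere>& scene, Vec3 o, Vec3 d, double tmin, double tmax) {
    assert(tmin >= 0.0 && tmax > tmin);
    QueryResult r{false, tmax, 0};
    for (const Sphere& s : scene) {
        ++r.testsPerformed;
        const double t = intersect(s, o, d, tmin, tmax);
        if (t > 0.0) { r.hit = true; r.t = t; return r; }  // early termination
    }
    return r;
}
```

With three spheres lined up along the ray, `closestHit` runs three intersection tests while `anyHit` returns after the first one. A real ray tracer traverses a BVH instead of a flat list, but the same principle applies: the visibility query can skip the remaining traversal once any blocker is found.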

Finally, another way to evaluate the cost of shadow rays is to include them in your budget calculation for total rays. Each GPU model has a rough maximum ray throughput that depends on the hardware and on what your application needs, e.g., how much memory bandwidth & compute you use. For Turing GPUs, Nvidia advertised 10 billion rays per second, which is achievable with relatively complex geometry but usually only with pretty simple shading. Ampere GPUs can achieve rates quite a bit higher than that, and earlier pre-RTX models top out quite a bit lower. But you can mostly just assume that your shadow rays are part of your overall ray budget. If you cast ~2M rays from your camera at 1080p resolution, hit something 50% of the time, and cast 1 shadow ray per hit, then you will be tracing 3M rays per frame. If you need 30fps, then your budget is 3M * 30 = 90M rays per second. That would normally be trivial to achieve, but if you want to do the same thing with 100 samples per pixel, then you need 9 billion rays per second. This is realistic with an Ampere GPU, but achieving that number in practice will still likely depend more on your closest-hit shading code than on whether or not you cast shadow rays.
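
The budget arithmetic above can be written down as a small helper. This is only a sketch of the back-of-the-envelope calculation; the function name `estimateRayBudget` and its parameters are hypothetical, and it assumes one primary ray per sample and a fixed number of shadow rays per hit.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct RayBudget {
    std::uint64_t raysPerFrame;   // primary + shadow rays in one frame
    std::uint64_t raysPerSecond;  // raysPerFrame * frames per second
};

// Assumption: every sample casts one primary ray, and each primary ray that
// hits geometry casts a fixed number of shadow rays.
RayBudget estimateRayBudget(std::uint64_t pixels,
                            std::uint32_t samplesPerPixel,
                            double hitFraction,
                            std::uint32_t shadowRaysPerHit,
                            std::uint32_t framesPerSecond) {
    assert(hitFraction >= 0.0 && hitFraction <= 1.0);
    const std::uint64_t primary = pixels * samplesPerPixel;
    const std::uint64_t shadow = static_cast<std::uint64_t>(
        std::llround(static_cast<double>(primary) * hitFraction * shadowRaysPerHit));
    const std::uint64_t perFrame = primary + shadow;
    return {perFrame, perFrame * framesPerSecond};
}
```

Calling `estimateRayBudget(2000000, 1, 0.5, 1, 30)` reproduces the 3M rays per frame and 90M rays per second from the paragraph above, and raising `samplesPerPixel` to 100 gives 9 billion rays per second.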



Thank you @dhart for the response. It’s a lot of information to ingest but really good.


Related thread on the implementation method for shadow rays using OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT here:

Explanations on what not to do with that flag here:


Thanks @droettger for the links

This is a different question, but it still relates to efficiency/performance. Can you tell me how much of a performance difference there is between ray tracing with RT cores and without them? This will come in handy when targeting some older-generation devices, and I would like a ballpark figure for the difference, if that is possible.

Thank you again.

This is very application and scene dependent and cannot be answered with a single factor.
The RT cores on RTX boards handle the BVH traversal of two-level acceleration structure graphs and the intersections with the built-in triangle primitives in hardware.
Turing and Ampere also have different RT core versions, with Ampere being faster, especially for motion blur on triangles.

All device code you provide for the programmable OptiX domains will be executed on the streaming multiprocessors, so the newer and faster these are and the more you have of them, the better.
If you’re doing extremely complicated shading operations and a lot of memory accesses, the benefit of the RT cores will be reduced.
In other words, the larger the share of a program’s work that happens inside the RT cores, the higher the speedup.
It is possible to reach the >11 GRays/second advertised for high-end RTX boards on simple scenes with almost no shading. The binary decision for visibility rays discussed above is such a fast case.
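
One rough way to picture this is an Amdahl's-law style estimate. The sketch below is a simplification of mine, not a measured model: it assumes a frame's time splits cleanly into a traversal/intersection part that the RT cores accelerate and a shading part that they do not, and it ignores that a real GPU overlaps these. The function name and the numbers in the usage note are hypothetical.

```cpp
#include <cassert>

// traversalFraction: share of the original frame time spent in BVH traversal
//                    and triangle intersection (0..1), a hypothetical input.
// traversalSpeedup:  assumed speedup of that share when RT cores handle it.
// Returns the estimated overall speedup of the frame.
double estimateFrameSpeedup(double traversalFraction, double traversalSpeedup) {
    assert(traversalFraction >= 0.0 && traversalFraction <= 1.0);
    assert(traversalSpeedup > 0.0);
    const double remaining = (1.0 - traversalFraction) + traversalFraction / traversalSpeedup;
    return 1.0 / remaining;
}
```

With an assumed 10x traversal speedup, a workload that spends 90% of its time in traversal comes out roughly 5.3x faster overall, while one that spends only 20% there gains about 1.2x. This is why heavy shading shrinks the visible benefit.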

There is an older comparison here: which shows that there can be a factor of 10 difference between Pascal and Turing boards, for example. YMMV.
(Note that the mega-kernel execution strategy used by the old OptiX 6 API in that comparison has been removed in current display drivers.)

It’s definitely no fun working with non-RTX boards once you’ve experienced the ray tracing performance of RTX boards.

Great information. Thank you very much.

Last question, if that is okay.

Let’s say I simply want to determine whether a ray is blocked (e.g. a shadow ray), as primitively as possible. I think this is determined entirely within the RT cores (if available). Am I correct in this assumption?


Yes, if you implement it the way described inside the code linked to in my first reply to this thread.
Look for the code which uses the combination of ray flags:


Thanks. I thought so, I just wanted to be sure.
