What is the performance of the RTX 4090 measured in gigarays?

I saw in the Turing architecture white paper that the performance of the RTX 2080 Ti could reach over 10 gigarays.

1. Assuming a 2x performance improvement each generation, what would the rendering performance of the RTX 4090 be, in gigarays, in the aforementioned scenarios?

2. Is there a repository with the benchmark code used in the Turing white paper above? I want to run the measurement on my 4090.

Hi @hkkzzxz24,

To my knowledge, there is no official benchmark application, nor have any gigarays numbers been published since Turing. Since applications vary wildly in their peak performance, and that performance depends on many application-specific choices, we recommend measuring gigarays per second in your own application, and using your application to compare multiple GPU models.


David.

But how do I know where my application's limit is? Is there any gigarays data for an ideal scenario? That way, I could bring my application closer to the ideal. Currently, my application only achieves 10+ gigarays. The lighting is ordinary, with no secondary rays. How can I improve my performance? Do you have any suggestions?

It’s a very good question, but I might not have a satisfying answer - apologies. There is a theoretical hardware limit to the maximum number of rays per second possible on the RT cores of any given GPU, but I don’t know what that number is, and no real application can achieve it. This theoretical number would be higher than 10 gigarays/sec on Turing.

In practice, the actual performance limitations in ray tracing applications include memory bandwidth, instruction count during shading, divergence, the size and quality of your scene & models, choice of rendering algorithm, number of materials, scale of your scene, and many, many other factors.

Do note that the marketing images you posted at the top don’t have any lighting, they are showing normals, and they have non-trivial amounts of background (miss) in the images. These are perfectly suitable renders & acceptable goals for some applications, but pay attention to whether your application has different goals that might cost more.

Maybe the best way to know what the upper limits of your application are is to measure your application with as many features disabled as possible, until you are only tracing rays and doing nothing else. I recommend trying the following:

  • Render a depth buffer - save only the hit t value. (Optional: don’t save anything, to avoid the memory traffic, but profile to make sure rays are being cast and not compiled out.)
  • Ensure your geometry is set up for fast rendering
    – Use only the built-in hardware triangle intersector, and no other geometry
    – Use OPTIX_GEOMETRY_FLAG_DISABLE_ANYHIT
    – Use OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING
    – Use OPTIX_BUILD_FLAG_PREFER_FAST_TRACE
    – Disable OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS
    – Ensure all models have welded vertices and not duplicated vertices or separated triangles
    – Ensure nothing in the scene is extremely far away from the camera
  • Disable ALL lighting & shading
    – Use a minimal 1-entry SBT with only raygen
    – Provide an empty miss program
    – Don’t provide a hit program in your pipeline
    – Optional: simulate shadow rays using OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT
  • In raygen, use optixTraverse rather than optixTrace (and don’t call optixInvoke - no shading)
  • Disable any other features you might have including OMMs, motion blur, multi-level scenes, denoising, color correction, etc. Disable everything.
  • Ensure your launch is big enough to saturate the GPU. Frames that cast fewer than, say, one million rays are in danger of not saturating the GPU. It may be better to render a large image with multi-sampling (for example, 100 samples per pixel) to get a sense of peak ray throughput.
  • Use the simplest possible math for raygen samples, avoid jittering. Do as little as possible before and after casting a ray.
  • Use compile time constants, or OptiX parameter specialization (bound values), for anything you can.
  • Minimize all memory I/O in raygen, including launch params, local (stack) and global memory.
  • Be careful with measurements, and use stream events for timing. Don’t use any atomics or synchronization in between timing events.
  • Measure performance on a GPU that is not connected to a monitor.
  • Obviously, disable validation mode, and ensure your compilation is Release mode (not Debug) with all optimizations turned on.
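To make the geometry advice above concrete, here is a hedged host-side sketch of where those flags go, not code from this thread. Identifiers like `context`, `d_vertices`, `d_indices`, and the counts are assumed placeholders, and error checking is omitted.

```cpp
#include <optix.h>

// Sketch: a fast-trace triangle GAS build input plus pipeline compile options.
// Assumes an existing OptixDeviceContext and device buffers (placeholders).
unsigned int triFlags[1] = { OPTIX_GEOMETRY_FLAG_DISABLE_ANYHIT };

OptixBuildInput buildInput = {};
buildInput.type = OPTIX_BUILD_INPUT_TYPE_TRIANGLES;          // built-in triangles only
buildInput.triangleArray.vertexFormat     = OPTIX_VERTEX_FORMAT_FLOAT3;
buildInput.triangleArray.numVertices      = numVertices;     // assumed count
buildInput.triangleArray.vertexBuffers    = &d_vertices;     // assumed CUdeviceptr
buildInput.triangleArray.indexFormat      = OPTIX_INDICES_FORMAT_UNSIGNED_INT3;
buildInput.triangleArray.numIndexTriplets = numTriangles;    // welded/indexed mesh
buildInput.triangleArray.indexBuffer      = d_indices;       // assumed CUdeviceptr
buildInput.triangleArray.flags            = triFlags;
buildInput.triangleArray.numSbtRecords    = 1;

OptixAccelBuildOptions buildOptions = {};
buildOptions.buildFlags = OPTIX_BUILD_FLAG_PREFER_FAST_TRACE; // no ALLOW_RANDOM_VERTEX_ACCESS
buildOptions.operation  = OPTIX_BUILD_OPERATION_BUILD;

OptixPipelineCompileOptions pco = {};
pco.traversableGraphFlags = OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING;
pco.numPayloadValues      = 0;  // depth-only trace: no payload needed
pco.numAttributeValues    = 2;  // built-in triangles report 2 barycentrics
pco.exceptionFlags        = OPTIX_EXCEPTION_FLAG_NONE;
```

The flags map directly onto the bullets: any-hit disabled per geometry, single-level instancing in the pipeline, and a fast-trace build without random vertex access.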
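On the device side, the "traverse only, no shading" idea might look roughly like the following. This is a hedged sketch assuming the OptiX 8.x hit-object API (`optixTraverse` without `optixInvoke`) and the SDK's `vec_math.h` float3 helpers; the `Params` struct and camera field names are illustrative, not from the thread.

```cuda
#include <optix.h>
#include <sutil/vec_math.h>  // SDK helpers for float3 arithmetic (assumed available)

// Illustrative launch parameters (names are assumptions for this sketch).
struct Params
{
    OptixTraversableHandle handle;
    float3 camEye, camU, camV, camW;  // pinhole camera basis
    float* depth;                     // optional: omit the store to skip memory traffic
};
extern "C" __constant__ Params params;

extern "C" __global__ void __raygen__depth()
{
    const uint3 idx = optixGetLaunchIndex();
    const uint3 dim = optixGetLaunchDimensions();

    // Simplest possible primary-ray math: centered samples, no jitter.
    const float2 ndc = make_float2((idx.x + 0.5f) / dim.x * 2.0f - 1.0f,
                                   (idx.y + 0.5f) / dim.y * 2.0f - 1.0f);
    const float3 origin = params.camEye;
    const float3 dir = normalize(params.camW + ndc.x * params.camU + ndc.y * params.camV);

    // Traverse only: no payload and no optixInvoke, so no hit/miss shading runs.
    optixTraverse(params.handle, origin, dir,
                  0.0f, 1e16f, 0.0f,             // tmin, tmax, ray time
                  OptixVisibilityMask(255),
                  OPTIX_RAY_FLAG_DISABLE_ANYHIT,
                  0, 1, 0);                      // SBT offset, stride, miss index

    // Save only the hit t value (or delete this store entirely, but then
    // profile to make sure the trace isn't compiled out as dead code).
    const float t = optixHitObjectIsHit() ? optixHitObjectGetRayTmax() : 0.0f;
    params.depth[idx.y * dim.x + idx.x] = t;
}
```

The pipeline for this would contain only this raygen and an empty miss program, with a minimal 1-entry SBT, matching the bullets above.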
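For the timing bullet, a common pattern is CUDA stream events bracketing the launch, then converting to rays per second. A hedged sketch, assuming one ray per pixel per sample and that `pipeline`, `stream`, `d_params`, `sbt`, `width`, `height`, and `samples` already exist; error checking is omitted.

```cpp
#include <optix.h>
#include <cuda_runtime.h>
#include <cstdio>

// Sketch: time one optixLaunch with CUDA events (no atomics or extra
// synchronization between the two event records).
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, stream);
// Using the launch depth for samples per pixel is one option (an assumption
// here); looping the launch `samples` times works as well.
optixLaunch(pipeline, stream, d_params, sizeof(Params), &sbt,
            width, height, samples);
cudaEventRecord(stop, stream);
cudaEventSynchronize(stop);

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);

// One primary ray per pixel per sample (no secondary rays in this setup).
const double rays     = double(width) * double(height) * double(samples);
const double gigarays = rays / (ms * 1e-3) / 1e9;
printf("%.2f gigarays/sec\n", gigarays);
```

Run this on a GPU without a monitor attached, in a Release build with validation mode off, as noted above.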

Measuring this will tell you how fast your application runs on your scene data & camera. From there you can start re-enabling features one at a time and profiling along the way and see if you believe each feature is compromising your performance more than you expect.


David.