RTX ON/OFF Benchmark, Optix 6

Details:

  • Source: Modified Blender Classroom Scene. Link to OBJ
  • Triangles: 6 500 000
  • Objects: 2000
  • Lights: 1 (Rectangular area light)
  • Max Path Depth: 6
  • Samples per frame: 2x2 (4)
  • Resolution: 1920x1080
  • Optix Version: 6.0.0
  • Drivers Version: 418.91
  • GPU: RTX 2080 TI
  • Denoiser: No

Test Results

RTX Off

  • Average FPS 1.23
  • Sampling Rate: 10.2 Mpx/s

RTX On

  • Average FPS 5.28
  • Sampling Rate: 43.79 Mpx/s

Summary

With RTX I got a 4.30x increase in performance.
With higher scene complexity I was able to hit 4.5x and 5x increases (with modified sampling and depth values).
For scenes of around 300k - 500k triangles I get around a 3.5x - 4x increase.
Simply… AMAZING!
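
The sampling rates reported above appear to follow directly from resolution, samples per frame and FPS. A minimal sketch of that relationship, assuming "Mpx/s" means sampled pixels per second (the function name is illustrative, not taken from the benchmark code):

```python
def sampling_rate_mpx(width, height, samples_per_frame, fps):
    """Sampled pixels per second, in millions (assumed meaning of Mpx/s)."""
    return width * height * samples_per_frame * fps / 1e6


# Figures from the 1920x1080, 2x2 samples-per-frame run above.
rtx_off = sampling_rate_mpx(1920, 1080, 4, 1.23)
rtx_on = sampling_rate_mpx(1920, 1080, 4, 5.28)
speedup = rtx_on / rtx_off

print(f"RTX off: {rtx_off:.2f} Mpx/s")
print(f"RTX on:  {rtx_on:.2f} Mpx/s")
print(f"Speedup: {speedup:.2f}x")
```

Because the FPS values are rounded to two decimals, the ratio computed this way can differ from the quoted speedup in the last digit.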

Variant with lower resolution but higher per pixel sampling during single frame.

Variant Details:

  • Samples per frame: 4x4 (16)
  • Resolution: 1280x720

Test Results

RTX Off

  • Average FPS 0.65
  • Sampling Rate: 9.58 Mpx/s

RTX On

  • Average FPS 2.95
  • Sampling Rate: 43.49 Mpx/s

Summary

In this variant, with a higher per-frame sampling rate, I got a 4.53x increase with RTX on.

Variant with lower resolution, higher depth and higher per pixel sampling during single frame.

Variant Details:

  • Samples per frame: 5x5 (25)
  • Max Path Depth: 8
  • Resolution: 1024x576

Test Results

RTX Off

  • Average FPS 0.46
  • Sampling Rate: 6.78 Mpx/s

RTX On

  • Average FPS 2.28
  • Sampling Rate: 33.61 Mpx/s

Summary

In this variant, with a higher per-frame sampling rate and depth, I got a 4.96x increase with RTX on.
Almost 5.0x increase!

It looks like the RT Core improvements scale with scene size and path length (max depth). Samples per frame also seem to have a positive impact (or at least they are handled better with RTX on than off).

Impressive results!

Is this naive path tracing? And can I interpret “samples per frame” as “samples per pixel”?
If so, I can calculate the numbers in units of [rays/s] as follows.
RTX off result from your last post
(assuming the average path depth is 4):
6.78[Mpx/s] * 25[samples/px] * 4[rays/sample] = 678[Mrays/s]

Is this calculation correct?

It’s the naive path tracer from the Nvidia samples. Not much was added, other than dropping Russian roulette for early path termination and using a static depth limit.

Max ray calculation would look like this IMO.

RTX ON:
1024[screen-x] * 576[screen-y] * 25[samples/frame] * 8[depth] * 2.28[fps] = 269[Mrays/s]

RTX OFF:
1024[screen-x] * 576[screen-y] * 25[samples/frame] * 8[depth] * 0.46[fps] = 54[Mrays/s]
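
The upper-bound estimate above can be sketched as a small helper. It assumes every path runs to the full depth limit and counts one ray per bounce only; the function name is illustrative:

```python
def max_rays_per_second(width, height, samples_per_frame, max_depth, fps):
    """Upper bound on rays/s: every path is assumed to reach max_depth."""
    return width * height * samples_per_frame * max_depth * fps


# 1024x576, 5x5 samples per frame, depth 8, FPS from the last variant.
for label, fps in (("RTX on", 2.28), ("RTX off", 0.46)):
    mrays = max_rays_per_second(1024, 576, 25, 8, fps) / 1e6
    print(f"{label}: {mrays:.0f} Mrays/s")
```

Paths that leave the scene early cast fewer rays, so a measured count should land below this bound.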

It’s far from the promised 10 gigarays ;) I assume that Optix has quite a heavy abstraction layer, and that the promised value was, as always, a marketing point, i.e. RT Cores can push 10 gigarays if there is no shading and one big triangle in the scene.

I’ve found some interesting discussion over Gigarays and what people were able to achieve:
https://www.reddit.com/r/nvidia/comments/9jfjen/10_gigarays_translate_to_32_gigarays_in_real/

However, this seems a little strange, as the performance increase in my case grows with scene complexity.
Simple scenes like the Cornell Box bring only around a 1.2x - 1.4x improvement, which goes somewhat against the claims that simple scenes let you hit values close to the advertised number.

I would also love it if someone from Nvidia could contribute some of their benchmark numbers for Optix, i.e. the maximum rays/s output they achieved.

One more thing. Otoy recently released their benchmark preview with RTX, showing an improvement of 2.7x for path tracing.
https://render.otoy.com/forum/ucp.php?i=pm&f=-2&p=40293

OK, I’ve implemented proper ray counting logic (it counts the number of traces cast inside the program for each launch).

RTX On:
33.40 MPx/s
235.90 Mrays/s

RTX Off:
6.82 MPx/s
48.20 Mrays/s
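
Dividing the two counters gives the average number of traces per sample, which is a quick sanity check against the depth limit of 8. A small sketch using the figures above (the helper name is illustrative):

```python
def rays_per_sample(mrays_per_s, mpx_per_s):
    """Average number of traces per sample implied by the two counters."""
    return mrays_per_s / mpx_per_s


rtx_on = rays_per_sample(235.90, 33.40)
rtx_off = rays_per_sample(48.20, 6.82)

print(f"RTX on:  {rtx_on:.2f} rays/sample")
print(f"RTX off: {rtx_off:.2f} rays/sample")
```

Both runs come out at roughly 7 traces per sample, slightly under the limit of 8, which would be consistent with some paths ending before the maximum depth.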

Interesting fact: I used the same scene with a similar lighting scenario in Otoy Octane on a V100 GPU,
and it performed at 10 MPx/s.

New record. After removing light falloff to simulate direct sunlight, I got a 5.25x performance improvement over no RTX.

RTX On:
36.21 MPx/s
255.75 Mrays/s

Next steps: check how a GTX 1080 performs with the same code and RTX enabled.

RTX Comparison between 1080 and RTX 2080 TI.

Scenario:

  • Samples per frame: 5x5 (25)
  • Max Path Depth: 8
  • Resolution: 1024x576
  • Rtx: On

Results

GTX 1080:
3.73 MPx/s
26.35 Mrays/s

RTX 2080 TI:
36.21 MPx/s
255.75 Mrays/s

Summary

RTX 2080 TI in this scenario is 9.7x faster than GTX 1080!

Damn. I calculated the rays wrongly, as I didn’t include the ones for direct lighting.

Updated results are below. The performance ratios against RTX OFF and older GPUs should stay the same.

Scenario:

  • Samples per frame: 5x5 (25)
  • Max Path Depth: 8
  • Resolution: 1024x576
  • Rtx: On

RTX 2080 TI:
36.21 MPx/s
440 Mrays/s
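
To illustrate why including direct lighting raises the count so much, here is a toy per-path trace counter. It is only a sketch, not the actual OptiX program: it assumes one radiance ray per bounce plus one shadow ray at every surface hit, and `surface_hits` is a hypothetical input describing how many bounces hit geometry before the path misses.

```python
def count_traces(surface_hits, max_depth, shadow_rays=True):
    """Count trace calls for one path under the assumptions stated above."""
    traces = 0
    for bounce in range(max_depth):
        traces += 1                  # radiance ray for this bounce
        if bounce >= surface_hits:
            break                    # this ray missed: the path ends here
        if shadow_rays:
            traces += 1              # shadow ray towards the light
    return traces


print(count_traces(8, 8, shadow_rays=False))  # bounce rays only
print(count_traces(8, 8, shadow_rays=True))   # bounce + shadow rays
print(count_traces(3, 8, shadow_rays=True))   # path that misses early
```

For comparison, the updated figures above work out to about 12 traces per sample (440 / 36.21), between the bounce-only and the never-miss cases of this toy model.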

One more benchmark. This time a synthetic test.

Just direct lighting: 8 bounces + direct lighting (so 16 traces per pixel), without misses.

Scenario:

  • Resolution: 2559x1381
  • Scene: 12 triangles
  • Light: 1 direct area light
  • Depth: 8
  • Samples: 5x5 (25)

Result

RTX ON

1376.98 Mspx/s
11 GRays/s!

RTX OFF

9.1x slower than RTX ON! (RTX OFF pushes around 1.2 GRays/s)

This is in effect benchmarking trace execution alone against a minimal scene size.
Looks like their 10 gigarays promise was met. Now the major fight for more performance will
be limiting the cost of shader execution on hits.

Screenshot from a test with stats at the bottom:

Very impressive results! But I can’t help noticing that your example scene has only diffuse shading.

I would be very interested to see the speedup when using the RTX cores on the 2080Ti with some of the scenes provided by this Optix based renderer: https://github.com/knightcrawler25/Optix-PathTracer as the included scenes have reasonably complex shading (Disney BRDF) and lighting.

Sure, I can check it. However, the amount of improvement will decrease with increasing shading complexity.
So if shading consumes 50% of the time needed to generate a single sample, I guess the max increase will be around 2-2.5x (something similar to what Octane offers). That’s why, if the shading cost is close to zero (i.e. just a plain read of a texture or normal), we can get a 10x increase in performance. However, as soon as we use even Lambert shading, we go down to 4x. No idea how performant the Disney BRDF will be, but I guess it will be heavier than Lambert.
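
This reasoning can be put into a rough Amdahl-style model. The sketch below rests on assumptions that are not measured anywhere in this thread: frame time splits cleanly into tracing and shading, only tracing is accelerated, and the tracing speedup is about 10x (taken from the synthetic test). The function names are illustrative.

```python
def overall_speedup(shading_fraction, trace_speedup):
    """Amdahl-style estimate.

    shading_fraction is the share of the RTX-off frame time spent
    outside tracing; only the remaining share gets faster.
    """
    trace_fraction = 1.0 - shading_fraction
    return 1.0 / (shading_fraction + trace_fraction / trace_speedup)


def implied_shading_fraction(observed_speedup, trace_speedup):
    """Invert the estimate: shading share implied by a measured speedup."""
    return (1.0 / observed_speedup - 1.0 / trace_speedup) / (
        1.0 - 1.0 / trace_speedup
    )


print(f"{overall_speedup(0.5, 10.0):.2f}x")           # 50% shading share
print(f"{implied_shading_fraction(4.0, 10.0):.1%}")   # from a 4x result
```

Under these assumptions a 50% shading share gives a bit under 2x, and a measured 4x speedup would correspond to shading taking roughly a sixth of the RTX-off frame time.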

My guess is that, for now, it is mostly lightmappers outputting a GI pass that will see the maximum performance increase in a path tracing scenario, which I’m quite happy about as that is also my use case. Ambient occlusion renderers should also benefit a lot. For complex production scenes with fancy and costly shaders, I would assume an average 2.5x increase over no RTX.

Actually, it would be nice to get performance results split between time spent on tracing and time spent on shading/CUDA calculations. If anyone has a suggestion for how to do it without running the Nvidia Nsight profiler, I would be very interested. One way I can think of is to just run two passes, the first for simple GI and the second with shading, and then compare how much was gained within each pass.
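
A minimal sketch of the two-pass idea, written as host-side timing only. `render_frame` is a hypothetical callable that launches one frame and blocks until the GPU has finished (launches are typically asynchronous, so without a sync the timing would be meaningless). The split also assumes the trace-only pass does the same tracing work as the full pass, which is only approximately true.

```python
import time


def average_frame_ms(render_frame, frames=10):
    """Average wall-clock time per call of render_frame, in milliseconds."""
    start = time.perf_counter()
    for _ in range(frames):
        render_frame()
    return (time.perf_counter() - start) * 1000.0 / frames


def split_trace_shade(trace_only_ms, full_ms):
    """Attribute the extra time of the full pass to shading."""
    shade_ms = max(full_ms - trace_only_ms, 0.0)
    share = shade_ms / full_ms if full_ms > 0 else 0.0
    return {"trace_ms": trace_only_ms, "shade_ms": shade_ms, "shade_share": share}


# Hypothetical timings, for illustration only.
print(split_trace_shade(trace_only_ms=100.0, full_ms=250.0))
```

Running both passes with RTX on and off would then show how much each part gained.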

I assumed as much (less speedup with increased shading complexity). Looking forward to more of your benchmarks!

Very interesting. Thanks for sharing.

Hi, I did a simple comparison of RTX on/off using my own renderer.

https://drive.google.com/open?id=1mJcfcLQNNXIciKUD1m2OcKeXmdGdAn-g

I measured at 3 viewpoints in the Amazon Lumberyard Bistro scene: https://casual-effects.com/data/
It has diffuse/specular color, glossiness, normal and alpha textures, so the renderer uses all of these textures with a Frostbite-like BRDF (normalized Burley diffuse and GGX microfacet specular).
My renderer has a (runtime) shader node feature for flexible material evaluation, so it depends heavily on callable programs.
Therefore BRDF evaluation is a bit heavier than one might expect.

Viewport 1:


RTX ON / OFF: 290[ms] / 1000[ms]

Viewport 2:


RTX ON / OFF: 300[ms] / 1130[ms]

Viewport 3:


RTX ON / OFF: 290[ms] / 830[ms]

Video (RTX ON)

My PC has an RTX 2070.

I got around a 3x performance boost in this scene.
The performance ratio drops slightly at viewport 3. The reason might be that this viewport shows a plant for which the renderer needs to evaluate the alpha texture many times, so the share of work other than BVH traversal gets larger.

A 3x performance boost in a scene using moderately complex materials seems sufficient in my opinion.
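
For reference, the per-viewport ratios from the frame times above can be computed as follows. The geometric mean is one reasonable way to summarise ratios; that choice is mine, not part of the original measurement.

```python
from statistics import geometric_mean

# (RTX on, RTX off) frame times in milliseconds, as reported above.
frame_ms = {
    "Viewport 1": (290, 1000),
    "Viewport 2": (300, 1130),
    "Viewport 3": (290, 830),
}

speedups = {name: off / on for name, (on, off) in frame_ms.items()}
overall = geometric_mean(list(speedups.values()))

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
print(f"Geometric mean: {overall:.2f}x")
```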

By the way, this scene uses over 5GB of VRAM.
I need to try using block-compressed textures, which have been supported since OptiX 6.0.
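
As a rough idea of what block compression could save, here is a back-of-the-envelope sketch. The bytes-per-texel values are the standard sizes of the generic BC formats (4x4 texel blocks of 8 or 16 bytes); the 4096x4096 texture is a hypothetical example, not taken from the Bistro scene, and which formats a given renderer can actually use is not covered here.

```python
BYTES_PER_TEXEL = {
    "RGBA8": 4.0,  # uncompressed, 8 bits per channel
    "BC1": 0.5,    # 8 bytes per 4x4 block
    "BC3": 1.0,    # 16 bytes per 4x4 block
    "BC7": 1.0,    # 16 bytes per 4x4 block
}


def texture_bytes(width, height, fmt, mipmaps=True):
    """Approximate GPU memory for one 2D texture."""
    base = width * height * BYTES_PER_TEXEL[fmt]
    # A full mip chain adds roughly one third on top of the base level.
    return base * 4.0 / 3.0 if mipmaps else base


for fmt in ("RGBA8", "BC7", "BC1"):
    mib = texture_bytes(4096, 4096, fmt, mipmaps=False) / (1024 * 1024)
    print(f"{fmt}: {mib:.0f} MiB")
```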

Thanks for sharing! 3x is an amazing result for such a complex scene.

PS. Could you make those viewport images public? Currently they are restricted.

Oops, fixed.
Thanks for notifying :D
Can you access the files?

Works great now. Thanks!