With RTX I got a 4.30x increase in performance.
With higher scene complexity I was able to hit 4.5x and 5x increases in performance (with modified sampling and depth values).
For scenes around 300k - 500k triangles I get around a 3.5x - 4x increase. Simply… AMAZING!
In this variant with a higher per-frame sampling rate I got a 4.96x increase with RTX on.
Almost a 5.0x increase!
It looks like the RT Core improvements scale with scene size and path length (max depth). Samples per frame also seem to have a positive impact (or at least they are handled better with RTX on than off).
Is this naive path tracing? And can I interpret “samples per frame” as “samples per pixel”?
If so, I can calculate the numbers in units of [rays/s] as follows:
RTX-off result in your last post:
(assuming average path depth is 4)
6.78[Mpx/s] * 25[samples/px] * 4[rays/sample] = 678[Mrays/s]
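For reference, a tiny helper for this back-of-the-envelope conversion (the 4-ray average path depth is my assumption, as stated above, not a measured value):

```python
# Back-of-the-envelope conversion from pixel throughput to ray throughput.
# All inputs are estimates, not measured ray counts.
def mrays_per_sec(mpx_per_sec, samples_per_px, avg_path_depth):
    """Estimate Mrays/s from Mpx/s, samples per pixel, and average path depth."""
    return mpx_per_sec * samples_per_px * avg_path_depth

# 6.78 Mpx/s * 25 samples/px * 4 rays/sample:
print(mrays_per_sec(6.78, 25, 4))  # -> 678.0 Mrays/s
```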
It’s far away from the 10 gigarays promise ;) I assume that OptiX has quite a heavy abstraction layer, and the promised value itself was, as always, a marketing point, i.e. RT Cores can push 10 gigarays if there is no shading and just one big triangle in the scene.
I’ve found some interesting discussion over Gigarays and what people were able to achieve:
However this seems a little bit strange, as the performance increase in my case grows with scene complexity.
Simple scenes like the Cornell Box bring only around a 1.2x - 1.4x improvement, which goes a little bit against the claims that simple scenes allow hitting values close to the advertised number.
I would also love it if someone from NVIDIA could contribute some of their benchmark numbers for OptiX, i.e. the maximum rays/s output they achieved.
One more thing: Otoy recently released their benchmark preview with RTX, with an improvement of 2.7x for path tracing.
Just direct lighting: 8 bounces plus a direct-lighting ray per bounce (so 16 traces per pixel sample), with no misses.
Scenario:
Resolution: 2559x1381
Scene: 12 triangles
Light: 1 direct area light
Depth: 8
Samples: 5x5 (25)
Result
RTX ON
1376.98 Mspx/s 11 GRays/s!
RTX OFF
9.1x slower than RTX ON! (RTX OFF pushes around 1.2 GRays/s)
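A quick sanity check on those figures. I'm assuming here that the Mspx/s number counts samples per second and that the 11 GRays/s figure comes from multiplying by the 8-bounce depth alone (not the 16 traces with shadow rays), since that's the only way the arithmetic lines up:

```python
# Hypothetical reconstruction of the reported throughput; units are assumed.
msamples_per_sec = 1376.98  # reported "Mspx/s"
depth = 8                   # reported max depth

grays = msamples_per_sec * depth / 1000.0
print(round(grays, 2))  # -> 11.02, matching the reported "11 GRays/s"

# "9.1x slower" with RTX OFF:
rtx_off_grays = grays / 9.1
print(round(rtx_off_grays, 2))  # -> 1.21, i.e. the ~1.2 GRays/s above
```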
This is, in effect, benchmarking trace execution alone against a minimal scene.
Looks like their 10 gigaray promise was met. Now the major fight for performance will be limiting the cost of shader execution on hits.
Screenshot from a test with stats at the bottom:
Sure, I can check it. However, the amount of improvement will decrease with increasing shading complexity.
So if shading consumes 50% of the time needed to generate a single sample, I guess the max increase will be around 2-2.5x (similar to what Octane offers). That’s why, if the shading cost is close to zero (i.e. just a pure read of a texture or normal), we can get a 10x increase in performance. However, when we start to use even Lambert shading, we go down to 4x. No idea how performant the Disney BRDF will be, but I guess it will be heavier than Lambert.
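That intuition is basically Amdahl's law applied to the shading fraction. A quick sketch (the 10x trace-only speedup is my assumption for illustration, not a measured number):

```python
# Amdahl-style estimate: shading runs at the same speed, only tracing
# gets accelerated by the RT cores.
def overall_speedup(shade_fraction, trace_speedup):
    """Overall frame speedup given the fraction of time spent shading."""
    return 1.0 / (shade_fraction + (1.0 - shade_fraction) / trace_speedup)

# If shading is 50% of frame time and RT cores make tracing 10x faster:
print(round(overall_speedup(0.5, 10.0), 2))  # -> 1.82
# Even with infinitely fast tracing, 50% shading caps us at 2x:
print(round(overall_speedup(0.5, 1e9), 2))   # -> 2.0
```

This is why the ceiling sits around 2-2.5x once shading dominates, no matter how fast the traversal gets.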
My guess is that currently mostly lightmappers outputting a GI pass will see the maximum performance increase in the path-tracing scenario, which I’m quite happy about as it’s also my use case. Ambient occlusion renderers should also benefit a lot. And for complex production scenes with fancy and costly shaders, I would assume an averaged 2.5x increase over no RTX.
Actually, it would be nice to get performance results split between time spent on tracing vs. shading/CUDA calculations. If anyone has a suggestion for how to do it without running the NVIDIA Nsight profiler, I would be very interested. One way I can think of is to just run two passes: first simple GI only, then full shading, and in the end compare how much was gained within each pass.
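A rough sketch of that two-pass idea (the render functions are hypothetical stand-ins for your own passes, and wall-clock timing is a crude proxy; it assumes each pass call blocks until the GPU work is done):

```python
import time

def time_pass(render_fn, frames=10):
    """Average wall-clock time of a render pass over a few frames."""
    start = time.perf_counter()
    for _ in range(frames):
        render_fn()
    return (time.perf_counter() - start) / frames

def estimate_split(render_trace_only, render_full):
    """Attribute frame time to tracing vs. shading by differencing passes."""
    t_trace = time_pass(render_trace_only)  # dominated by BVH traversal
    t_total = time_pass(render_full)        # traversal + shading
    t_shade = max(t_total - t_trace, 0.0)   # crude attribution of the rest
    return t_trace, t_shade
```

It's only an approximation, since the trace-only pass still pays some shading cost (hit programs must run at least minimally), but it should show roughly where the RTX speedup is going.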
I measured at 3 viewpoints in Amazon Lumberyard Bistro scene: [url]https://casual-effects.com/data/[/url]
It has diffuse/specular color, glossiness, normal, and alpha textures, so the renderer uses all the textures with a Frostbite-like BRDF (normalized Burley diffuse and GGX microfacet specular).
My renderer has a (runtime) shader node feature for flexible material evaluation, so it depends heavily on callable programs.
Therefore BRDF evaluation is a bit heavier than one might expect.
Viewport 1:
RTX ON / OFF: 290[ms] / 1000[ms]
Viewport 2:
RTX ON / OFF: 300[ms] / 1130[ms]
Viewport 3:
RTX ON / OFF: 290[ms] / 830[ms]
Video (RTX ON)
My PC uses an RTX 2070.
I got around a 3x performance boost in this scene.
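For reference, computing the ratios from the timings above gives roughly 3.4x, 3.8x, and 2.9x:

```python
# RTX ON / OFF frame times in ms for the three Bistro viewpoints above.
timings = {1: (290, 1000), 2: (300, 1130), 3: (290, 830)}

for vp, (on_ms, off_ms) in timings.items():
    print(f"Viewport {vp}: {off_ms / on_ms:.2f}x")
# Viewport 1: 3.45x
# Viewport 2: 3.77x
# Viewport 3: 2.86x
```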
The performance ratio drops slightly at viewport 3. The reason might be that this viewport shows a plant, for which the renderer needs to evaluate the alpha texture many times, so the work other than BVH traversal grows.
A 3x performance boost in a scene using moderately complex materials seems sufficient, in my opinion.
By the way, this scene uses over 5GB of VRAM.
I need to try using block-compressed textures, which have been supported since OptiX 6.0.