ProtoRay - OptiX 6.0 Prime perf regression using Turing

Hi,

After compiling ProtoRay against OptiX 6.0, performance on an RTX 2080 Ti dropped so much that I initially thought the app had hung. The final image was nevertheless correct. There was no issue with a GTX 1080 Ti using OptiX 6.0, and the RTX card was fine with the older OptiX 5.1.

https://github.com/Woking-34/embree-benchmark-protoray
The search paths are set for OptiX 5.1.1; FindOptix needs some manual changes to target OptiX 6.0:
- uncomment lines 83 and 116-124
- remove lines 82 and 107-114

Any help is appreciated, thanks!

Hello,

Are you using OptiX Prime? This post might shed some light on the matter.

[url]https://devtalk.nvidia.com/default/topic/1039223/optix/-optix-optix-prime-compatibility-with-cpu-and-rtx/post/5313797/#5313797[/url]

Yes, ProtoRay uses OptiX Prime as the intersection backend. The situation:

  • GTX 1080 Ti with OptiX 5.1 or OptiX 6.0: OK, there is even a slight performance improvement with 6.0.
  • RTX 2080 Ti with OptiX 5.1: fine, nice perf delta between Pascal and Turing.
  • RTX 2080 Ti with OptiX 6.0: huge regression.

Sorry for the long log files, but they are probably the most informative part. Please look for the mray measurements; the app prints the OptiX version as well.

GTX 1080Ti

[13:35:44] Command line: protoray.exe render ../../benchmark/sanmiguel/sanmiguel.mesh -no-mtl -r diffuse -size 3840,2160 -dev cuda -a optix -spp 64
[13:35:44] CPU: Unknown
[13:35:44] CPU threads: 12
[13:35:44] CPU SIMD width: 8
[13:35:44] Device: GeForce GTX 1080 Ti
[13:35:44] Loading blob: ../../benchmark/sanmiguel/sanmiguel.mesh
[13:35:45] Acceleration structure: optix
[13:35:45] Creating OptiX Prime context
[13:35:45] OptiX Version:[5.1.1] Branch:[rel5.1] Build Number:[25109142] CUDA Version:[9.0] 64-bit 2018-10-19
[13:35:45] Building acceleration structure
[13:35:45] Build time: 90.565 ms
[13:35:45] Build speed: 115.944 Mprim/s
[13:35:45] Sampler: random
[13:35:45] Resolution: 3840x2160
[13:35:45] Offscreen mode
[13:35:45] Loading view set: ../../benchmark/sanmiguel/sanmiguel.view
[13:35:45] Active view: 0
[13:35:45] Loading POI set: ../../benchmark/sanmiguel/sanmiguel.poi
[13:35:45] Render loop
[13:35:46] renderMs=482.241 fps=2.074 mray=115.217 ray=55492825 spp=3
[13:35:48] renderMs=480.414 fps=2.082 mray=115.645 ray=55489033 spp=6
[13:35:49] renderMs=482.197 fps=2.074 mray=115.215 ray=55487580 spp=9
[13:35:51] renderMs=482.375 fps=2.073 mray=115.182 ray=55491964 spp=12
[13:35:52] renderMs=481.330 fps=2.078 mray=115.443 ray=55497708 spp=15
[13:35:54] renderMs=482.253 fps=2.074 mray=115.216 ray=55494628 spp=18
[13:35:55] renderMs=481.289 fps=2.078 mray=115.457 ray=55498469 spp=21
[13:35:57] renderMs=482.343 fps=2.073 mray=115.185 ray=55491038 spp=24
[13:35:58] renderMs=482.331 fps=2.073 mray=115.193 ray=55493231 spp=27
[13:35:59] renderMs=482.064 fps=2.074 mray=115.261 ray=55493461 spp=30
[13:36:01] renderMs=482.282 fps=2.073 mray=115.210 ray=55492750 spp=33
[13:36:02] renderMs=482.289 fps=2.073 mray=115.202 ray=55492751 spp=36
[13:36:04] renderMs=481.740 fps=2.076 mray=115.331 ray=55488798 spp=39
[13:36:05] renderMs=482.265 fps=2.074 mray=115.208 ray=55492556 spp=42
[13:36:07] renderMs=481.632 fps=2.076 mray=115.354 ray=55489825 spp=45
[13:36:08] renderMs=482.065 fps=2.074 mray=115.255 ray=55491157 spp=48
[13:36:10] renderMs=482.353 fps=2.073 mray=115.208 ray=55502002 spp=51
[13:36:11] renderMs=481.306 fps=2.078 mray=115.436 ray=55492106 spp=54
[13:36:12] renderMs=482.275 fps=2.073 mray=115.205 ray=55492461 spp=57
[13:36:14] renderMs=484.252 fps=2.065 mray=114.741 ray=55493689 spp=60
[13:36:15] renderMs=481.642 fps=2.076 mray=115.360 ray=55494201 spp=63
[13:36:16] renderMs=483.037 fps=2.069 mray=115.040 ray=55494119 spp=64
[13:36:16] Average: buildMs=90.565 buildMprim=115.944 renderMs=482.230 fps=2.073 mray=115.222 ray=55492704.484

[13:36:21] Command line: protoray.exe render ../../benchmark/sanmiguel/sanmiguel.mesh -no-mtl -r diffuse -size 3840,2160 -dev cuda -a optix -spp 64
[13:36:21] CPU: Unknown
[13:36:21] CPU threads: 12
[13:36:21] CPU SIMD width: 8
[13:36:21] Device: GeForce GTX 1080 Ti
[13:36:21] Loading blob: ../../benchmark/sanmiguel/sanmiguel.mesh
[13:36:22] Acceleration structure: optix
[13:36:22] Creating OptiX Prime context
[13:36:22] OptiX Version:[6.0.0] Branch:[rel6.0] Build Number:[25650775] CUDA Version:[10.1] 64-bit 2019-01-29
[13:36:22] Building acceleration structure
[13:36:22] Build time: 86.627 ms
[13:36:22] Build speed: 121.214 Mprim/s
[13:36:22] Sampler: random
[13:36:22] Resolution: 3840x2160
[13:36:22] Offscreen mode
[13:36:22] Loading view set: ../../benchmark/sanmiguel/sanmiguel.view
[13:36:22] Active view: 0
[13:36:22] Loading POI set: ../../benchmark/sanmiguel/sanmiguel.poi
[13:36:22] Render loop
[13:36:23] renderMs=454.797 fps=2.199 mray=122.178 ray=55492802 spp=3
[13:36:24] renderMs=453.744 fps=2.204 mray=122.458 ray=55489057 spp=6
[13:36:26] renderMs=454.539 fps=2.200 mray=122.235 ray=55487591 spp=9
[13:36:27] renderMs=454.021 fps=2.203 mray=122.385 ray=55491922 spp=12
[13:36:29] renderMs=453.655 fps=2.204 mray=122.494 ray=55497703 spp=15
[13:36:30] renderMs=454.727 fps=2.199 mray=122.199 ray=55494630 spp=18
[13:36:31] renderMs=453.957 fps=2.203 mray=122.416 ray=55498449 spp=21
[13:36:33] renderMs=454.747 fps=2.199 mray=122.190 ray=55491058 spp=24
[13:36:34] renderMs=454.664 fps=2.199 mray=122.213 ray=55493257 spp=27
[13:36:35] renderMs=454.555 fps=2.200 mray=122.242 ray=55493488 spp=30
[13:36:37] renderMs=453.666 fps=2.204 mray=122.480 ray=55492740 spp=33
[13:36:38] renderMs=454.631 fps=2.200 mray=122.219 ray=55492752 spp=36
[13:36:39] renderMs=454.519 fps=2.200 mray=122.244 ray=55488763 spp=39
[13:36:41] renderMs=454.608 fps=2.200 mray=122.227 ray=55492541 spp=42
[13:36:42] renderMs=454.459 fps=2.200 mray=122.260 ray=55489796 spp=45
[13:36:44] renderMs=454.732 fps=2.199 mray=122.189 ray=55491115 spp=48
[13:36:45] renderMs=462.176 fps=2.164 mray=120.250 ray=55502013 spp=51
[13:36:46] renderMs=458.294 fps=2.182 mray=121.244 ray=55492069 spp=54
[13:36:48] renderMs=465.477 fps=2.148 mray=119.370 ray=55492442 spp=57
[13:36:49] renderMs=454.521 fps=2.200 mray=122.253 ray=55493698 spp=60
[13:36:50] renderMs=453.921 fps=2.203 mray=122.414 ray=55494213 spp=63
[13:36:51] renderMs=457.039 fps=2.186 mray=121.584 ray=55494132 spp=64
[13:36:51] Average: buildMs=86.627 buildMprim=121.214 renderMs=455.419 fps=2.195 mray=122.016 ray=55492704.125

RTX 2080Ti

[13:50:03] Command line: protoray.exe render ../../benchmark/sanmiguel/sanmiguel.mesh -no-mtl -r diffuse -size 3840,2160 -dev cuda -a optix -spp 64
[13:50:03] CPU: Unknown
[13:50:03] CPU threads: 8
[13:50:03] CPU SIMD width: 8
[13:50:03] Device: GeForce RTX 2080 Ti
[13:50:03] Loading blob: ../../benchmark/sanmiguel/sanmiguel.mesh
[13:50:03] Acceleration structure: optix
[13:50:03] Creating OptiX Prime context
[13:50:04] OptiX Version:[5.1.1] Branch:[rel5.1] Build Number:[25109142] CUDA Version:[9.0] 64-bit 2018-10-19
[13:50:04] Building acceleration structure
[13:50:04] Build time: 60.801 ms
[13:50:04] Build speed: 172.703 Mprim/s
[13:50:04] Sampler: random
[13:50:04] Resolution: 3840x2160
[13:50:04] Offscreen mode
[13:50:04] Loading view set: ../../benchmark/sanmiguel/sanmiguel.view
[13:50:04] Active view: 0
[13:50:04] Loading POI set: ../../benchmark/sanmiguel/sanmiguel.poi
[13:50:04] Render loop
[13:50:05] renderMs=269.192 fps=3.715 mray=206.478 ray=55491897 spp=4
[13:50:06] renderMs=268.989 fps=3.718 mray=206.630 ray=55490202 spp=8
[13:50:07] renderMs=269.205 fps=3.715 mray=206.476 ray=55491929 spp=12
[13:50:08] renderMs=269.131 fps=3.716 mray=206.519 ray=55488868 spp=16
[13:50:09] renderMs=269.216 fps=3.714 mray=206.476 ray=55493104 spp=20
[13:50:10] renderMs=269.515 fps=3.710 mray=206.235 ray=55491031 spp=24
[13:50:11] renderMs=269.152 fps=3.715 mray=206.517 ray=55494131 spp=28
[13:50:12] renderMs=269.378 fps=3.712 mray=206.349 ray=55499341 spp=32
[13:50:13] renderMs=269.359 fps=3.712 mray=206.366 ray=55492733 spp=36
[13:50:15] renderMs=269.160 fps=3.715 mray=206.510 ray=55491709 spp=40
[13:50:16] renderMs=269.381 fps=3.712 mray=206.334 ray=55490578 spp=44
[13:50:17] renderMs=270.662 fps=3.695 mray=205.359 ray=55491155 spp=48
[13:50:18] renderMs=270.464 fps=3.697 mray=205.513 ray=55492824 spp=52
[13:50:19] renderMs=270.397 fps=3.698 mray=205.574 ray=55495092 spp=56
[13:50:20] renderMs=270.523 fps=3.697 mray=205.475 ray=55493695 spp=60
[13:50:21] renderMs=270.488 fps=3.697 mray=205.502 ray=55494107 spp=64
[13:50:21] Average: buildMs=60.801 buildMprim=172.703 renderMs=270.417 fps=3.697 mray=205.578 ray=55492705.750

[13:55:03] Command line: protoray.exe render ../../benchmark/sanmiguel/sanmiguel.mesh -no-mtl -r diffuse -size 3840,2160 -dev cuda -a optix -spp 64
[13:55:03] CPU: Unknown
[13:55:03] CPU threads: 8
[13:55:03] CPU SIMD width: 8
[13:55:03] Device: GeForce RTX 2080 Ti
[13:55:03] Loading blob: ../../benchmark/sanmiguel/sanmiguel.mesh
[13:55:03] Acceleration structure: optix
[13:55:03] Creating OptiX Prime context
[13:55:03] OptiX Version:[6.0.0] Branch:[rel6.0] Build Number:[25650775] CUDA Version:[10.1] 64-bit 2019-01-29
[13:55:03] Building acceleration structure
[13:55:03] Build time: 58.766 ms
[13:55:03] Build speed: 178.684 Mprim/s
[13:55:03] Sampler: random
[13:55:03] Resolution: 3840x2160
[13:55:03] Offscreen mode
[13:55:03] Loading view set: ../../benchmark/sanmiguel/sanmiguel.view
[13:55:03] Active view: 0
[13:55:03] Loading POI set: ../../benchmark/sanmiguel/sanmiguel.poi
[13:55:03] Render loop
[13:55:33] renderMs=29421.665 fps=0.034 mray=1.886 ray=55489201 spp=1
[13:56:03] renderMs=30780.382 fps=0.032 mray=1.803 ray=55491560 spp=2
[13:56:33] renderMs=29667.480 fps=0.034 mray=1.870 ray=55492807 spp=3
[13:57:03] renderMs=30094.653 fps=0.033 mray=1.844 ray=55491886 spp=4
[13:57:33] renderMs=29910.275 fps=0.033 mray=1.855 ray=55491549 spp=5
[13:58:03] renderMs=29938.626 fps=0.033 mray=1.853 ray=55489056 spp=6
[13:58:33] renderMs=29935.047 fps=0.033 mray=1.854 ray=55490087 spp=7
[13:59:03] renderMs=30102.698 fps=0.033 mray=1.843 ray=55490213 spp=8
[13:59:33] renderMs=29943.096 fps=0.033 mray=1.853 ray=55487574 spp=9
[13:59:49] Interrupted
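
For anyone who wants to compare the runs without reading the full logs, here is a minimal Python sketch that averages the per-iteration mray values from a render loop and compares them with a baseline. The function name `mean_mray` and the constants are my own for illustration, not part of ProtoRay; the sample lines are copied from the interrupted RTX 2080 Ti / OptiX 6.0 run above, and the baseline is the mray from the "Average:" line of the OptiX 5.1.1 run on the same card.

```python
import re

# Per-iteration render-loop lines look like:
# "[13:55:33] renderMs=29421.665 fps=0.034 mray=1.886 ray=55489201 spp=1"
MRAY = re.compile(r"\bmray=([0-9.]+)")

def mean_mray(log_text):
    """Mean of the per-iteration mray values; None if there are none.

    Hypothetical helper, not part of ProtoRay. "Average:" summary lines
    are skipped so they are not counted as an extra iteration.
    """
    values = []
    for line in log_text.splitlines():
        if "Average:" in line:
            continue
        match = MRAY.search(line)
        if match:
            values.append(float(match.group(1)))
    return sum(values) / len(values) if values else None

# Lines copied from the RTX 2080 Ti / OptiX 6.0.0 run (interrupted after 9 spp).
TURING_OPTIX6_LOG = """\
[13:55:33] renderMs=29421.665 fps=0.034 mray=1.886 ray=55489201 spp=1
[13:56:03] renderMs=30780.382 fps=0.032 mray=1.803 ray=55491560 spp=2
[13:56:33] renderMs=29667.480 fps=0.034 mray=1.870 ray=55492807 spp=3
[13:57:03] renderMs=30094.653 fps=0.033 mray=1.844 ray=55491886 spp=4
[13:57:33] renderMs=29910.275 fps=0.033 mray=1.855 ray=55491549 spp=5
[13:58:03] renderMs=29938.626 fps=0.033 mray=1.853 ray=55489056 spp=6
[13:58:33] renderMs=29935.047 fps=0.033 mray=1.854 ray=55490087 spp=7
[13:59:03] renderMs=30102.698 fps=0.033 mray=1.843 ray=55490213 spp=8
[13:59:33] renderMs=29943.096 fps=0.033 mray=1.853 ray=55487574 spp=9
[13:59:49] Interrupted
"""

# mray from the "Average:" line of the RTX 2080 Ti / OptiX 5.1.1 run.
TURING_OPTIX51_AVG_MRAY = 205.578

# How many times slower the OptiX 6.0 run is than the 5.1.1 baseline.
slowdown = TURING_OPTIX51_AVG_MRAY / mean_mray(TURING_OPTIX6_LOG)
print(f"OptiX 6.0 on Turing is about {slowdown:.0f}x slower than OptiX 5.1.1")
```

With the numbers above, the slowdown comes out on the order of 100x, which is why the app looked like it had hung. The same helper can be pointed at the other three logs for a side-by-side comparison.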

Thanks!

I have forwarded your log files to one of the OptiX experts for a look. Stay tuned…

Thanks,
Tom

Hi @m_nyers,

Unfortunately, we have no plans to support RTX in OptiX Prime. I don’t know why it regressed for you (some people reported minor speedups with Prime). Today I will take a quick look at your log files and also ask around internally about whether OptiX Prime regressions will be addressed. However, just to be very honest and clear: your best course of action is to port your renderer’s OptiX Prime back-end over to OptiX. I’m sure we could take a look at the code and offer some guidance about how to do that, if you would like. I skimmed your repo very briefly yesterday, and it didn’t look like very much code would need to change, perhaps only a few files? You would rather have your 2080 Ti provide the multiples in acceleration that others are seeing than have it be the same or only slightly faster than your 1080, yes?


David.

Thanks for your response! Yes, the aim is to get full RTX support, so OptiX Prime is not an option. My plan is to port the code to the new GeometryTriangles interface. I just wanted to file a bug report that could be helpful to others or maybe get fixed.

// btw Pascal has a small speedup, only Turing suffers :-)