Strange results for Whitted tracer for GTX580 vs Titan

misteryyy · May 27, 2014, 8:16am

Hi,
A few weeks ago I finished my master’s thesis (Architecture visualizer for distributed VR systems; basically I am using OptiX for multi-projection visualisation such as a CAVE), based on the OptiX framework (3.01).

I have one question, because something strange happened when I was testing the results on different graphics cards (GTX 580 vs Titan).

The thesis can be seen here: https://dip.felk.cvut.cz/browse/pdfcache/kortajos_2014dipl.pdf
The results are in the Testing chapter (Figures 4.1 and 4.5). You can see that for a low number of rays (32x32, 64x64) the GTX 580 is always faster (in FPS). Do you have any idea why this could happen? It seems kind of strange to me.

Thank you for any advice :).

Josef K.

droettger · May 27, 2014, 9:07am

The GPUs on these boards have dramatically different architectures. Many of the differences are explained in the CUDA programming topics on this forum.

The major difference in your case would be that the Kepler GPU on the Titan has a lot more CUDA cores (max. 2880) than the Fermi GPU on the GTX 580 (512). It excels at much higher loads than you produce with 32x32 or 64x64 grids.

32x32 == 1024, which means a GTX 580 with 512 cores is loaded twice. For the 2688 or 2880 cores on the Titan models, a similar load would start at grid sizes of about 75x75. Your results for bigger grids confirm that. The Titan is up to 40% faster there, and that difference seems low.
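The arithmetic above can be reproduced with a small sketch. This is only the back-of-the-envelope model used in this post (rays divided by execution units); it ignores warps, occupancy, and latency hiding, and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Rays per execution unit for a square launch of edge x edge rays.
// "units" is whatever unit count you compare against (512 for the
// GTX 580, 2688 or 2880 for the Titan models quoted above).
double raysPerUnit(int edge, int units) {
    assert(edge > 0 && units > 0);
    return static_cast<double>(edge) * edge / units;
}

// Smallest square launch edge that gives at least `target` rays per unit.
int edgeForLoad(int units, double target) {
    assert(units > 0 && target > 0.0);
    return static_cast<int>(std::ceil(std::sqrt(units * target)));
}
```

With these helpers, `raysPerUnit(32, 512)` gives 2.0, while `edgeForLoad(2688, 2.0)` and `edgeForLoad(2880, 2.0)` give 74 and 76, which brackets the "about 75x75" estimate.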

Your scene hierarchy for a static scene is sub-optimal. I would expect a shallower hierarchy to produce better results.

misteryyy · May 27, 2014, 9:48am

:). Thank you, I think it makes sense to me.

Right now I am more interested in what you said about the scene hierarchy. Could you please give me some direction on how to improve it? Right now I am using the LBVH structure for geometry nodes.

droettger · May 27, 2014, 10:29am

But your thesis said Sbvh?
Of the provided acceleration structure builders, Lbvh is the fastest to build and the slowest to render. It’s normally only used for dynamic geometry. It’s a bad choice for static geometry with respect to rendering performance.

If your scene is static there is no real need for a deep hierarchy with transform nodes.
You could for example remove transform nodes by pre-transforming all static geometry in your scene.
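A minimal sketch of what pre-transforming could look like on the host side, before the vertex buffer is uploaded. The `Vec3` and `Mat4` types and the function names are hypothetical, not OptiX API, and the matrix is assumed to be a row-major affine transform.

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };
using Mat4 = std::array<float, 16>;  // assumed row-major, affine

// Apply the affine transform to a position (w is implicitly 1).
Vec3 transformPoint(const Mat4& m, const Vec3& p) {
    return {
        m[0] * p.x + m[1] * p.y + m[2]  * p.z + m[3],
        m[4] * p.x + m[5] * p.y + m[6]  * p.z + m[7],
        m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11]
    };
}

// Bake the object-to-world transform into the vertices so the
// geometry can be attached without a transform node above it.
void bakeTransform(const Mat4& m, std::vector<Vec3>& vertices) {
    // This sketch only handles affine matrices (last row 0 0 0 1).
    assert(m[12] == 0.0f && m[13] == 0.0f && m[14] == 0.0f && m[15] == 1.0f);
    for (Vec3& v : vertices) {
        v = transformPoint(m, v);
    }
}
```

Normals are not handled here; they would need the inverse transpose of the upper 3x3 part rather than the matrix itself.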

That’s obviously not advised if you use transforms to build instances. If you build instances with identical scene geometry underneath, share the acceleration structure among all GeometryGroups which have the same geometry underneath (OptiX Programming Guide, Chapter 3.5, Figure 2). This saves a lot of memory and speeds up acceleration structure building.

I don’t know what your Whitted program looks like and how deep the recursion was, but I have seen higher geometry loads with better visuals than in your benchmark images at higher framerates in 2008.
See the "optix history" thread in the OptiX section of the NVIDIA Developer Forums.

Also, I hope the comment about reading the image from the GPU into a texture does not mean the data is actually read through the host, but rather that you use OpenGL interop and render into a shared pixel buffer object (PBO) which stays on the GPU during the final texture blit.

misteryyy · May 27, 2014, 10:48am

Sbvh, sorry :). OK, I am going to check the advice you gave me and work on it. The test scenes were computed with full specularity (it was activated everywhere) and diffuse rays + 1 shadow ray.

The texture reading is done through the host. I had an interop version, but there were some problems with sharing the data, since there are multiple GPU cards in 2 computers. I will take a look at it again.

Thank you :).
