Strange results for Whitted tracer for GTX580 vs Titan

A few weeks ago I finished my master’s thesis (an architecture visualizer for distributed VR systems; basically I am using OptiX for multi-projection visualisation such as a CAVE), based on the OptiX framework (3.01).

I have one question, because something strange happened when I was comparing the results for different graphics cards (GTX 580 vs Titan).

The thesis can be seen here:
The results are in the Testing chapter (Figures 4.1 and 4.5). You can see that for a low number of rays (32x32, 64x64) the GTX 580 is always faster (in FPS). Do you have any idea why this could happen? It seems rather strange to me.

Thank you for any advice :).

Josef K.

The GPUs on these boards have dramatically different architectures. Many of the differences are explained in the CUDA programming topics on this forum.

The major difference in your case would be that the Kepler GPU on the Titan has a lot more CUDA cores (max. 2880) than the Fermi GPU on the GTX 580 (512). It excels at much higher load than you produce with 32x32 or 64x64 grids.

32x32 == 1024 rays, which means a GTX 580 with 512 cores is loaded twice over. For the 2688 or 2880 cores on the Titan models, a similar load would start at grid sizes of about 75x75. Your results for bigger grids bear that out: the Titan is up to 40% faster there, and even that difference seems low.
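To make the load arithmetic concrete, here is a rough sketch in C++. It uses the simplified model from this post (one ray per core per "pass"), which ignores warps, occupancy and scheduling, so treat the numbers as a back-of-the-envelope estimate only. The function names raysPerCore and sideForLoad are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Rays launched per CUDA core for a square launch of side x side.
// Simplified model: every ray occupies exactly one core.
double raysPerCore(int side, int cores) {
    assert(side > 0 && cores > 0);
    return static_cast<double>(side) * side / cores;
}

// Smallest square launch side that gives at least `target` rays per core.
int sideForLoad(int cores, double target) {
    assert(cores > 0 && target > 0.0);
    return static_cast<int>(std::ceil(std::sqrt(cores * target)));
}
```

With these, raysPerCore(32, 512) gives 2.0 for the GTX 580, while sideForLoad(2688, 2.0) and sideForLoad(2880, 2.0) give 74 and 76 for the two Titan core counts, which is where the "about 75x75" figure comes from.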

Your scene hierarchy for a static scene is sub-optimal. I would expect a shallower hierarchy to produce better results.

:). Thank you, I think it makes sense to me.

Right now I am more interested in what you said about the scene hierarchy. Could you please give me some direction on how to improve it? At the moment I am using the LBVH structure for geometry nodes.

But your thesis said Sbvh (?).
Of the provided acceleration structure builders, Lbvh is the fastest to build and the slowest to render. It’s normally only used for dynamic geometry. It’s a bad choice for static geometry with respect to rendering performance.

If your scene is static there is no real need for a deep hierarchy with transform nodes.
You could for example remove transform nodes by pre-transforming all static geometry in your scene.

That’s obviously not advised if you use transforms to build instances. If you build instances with identical scene geometry underneath, share the acceleration structure among all GeometryGroups which have the same geometry underneath (OptiX Programming Guide, Chapter 3.5, Figure 2). That saves a lot of memory and speeds up acceleration structure building.
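As a minimal sketch of the pre-transforming idea for non-instanced static geometry: apply the node's matrix to the vertices once on the host, then attach the geometry without a transform node. This is plain C++ and not OptiX API code. The Vec3 and Mat4 types and the bakeTransform helper are hypothetical, and the matrix is assumed to be a row-major 4x4 affine transform (last row 0 0 0 1).

```cpp
#include <array>
#include <vector>

struct Vec3 { float x, y, z; };

// Assumption: row-major 4x4 affine matrix, last row is (0, 0, 0, 1).
using Mat4 = std::array<float, 16>;

// Transform a point by the upper 3x4 part of the matrix.
Vec3 transformPoint(const Mat4& m, const Vec3& p) {
    return {
        m[0] * p.x + m[1] * p.y + m[2]  * p.z + m[3],
        m[4] * p.x + m[5] * p.y + m[6]  * p.z + m[7],
        m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11]
    };
}

// Bake the transform into the vertex positions in place, so the
// geometry can be used directly without a transform node above it.
void bakeTransform(const Mat4& m, std::vector<Vec3>& vertices) {
    for (Vec3& v : vertices) {
        v = transformPoint(m, v);
    }
}
```

Note that this only handles positions. If the mesh carries normals, they would need the inverse transpose of the upper 3x3 part instead.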

I don’t know what your Whitted program looks like and how deep the recursion was, but I have seen higher geometry loads with better visuals than in your benchmark images at higher framerates in 2008.

Also, the comment about reading the image from the GPU into a texture hopefully does not mean actually reading the data through the host, but rather using OpenGL interop and rendering into a shared pixel buffer object (PBO) that stays on the GPU during the final texture blit.

Sbvh, sorry :). OK, I am going to check the advice you gave me and work on it. The test scenes were computed with full specularity (it was activated everywhere) and diffuse rays + 1 shadow ray.

The texture reading is done through the host. I had an interop version, but there were some problems with sharing the data since there are several GPU cards in 2 computers. I will take a look at it again.

Thank you :).