Bad Performance with GTX 980 [resolved]

Hi,

I compared the performance of a laptop GT 750M (2 GB) and a desktop GTX 980 (4 GB).

For a moderate number of primitives, the GTX 980 was, as expected, faster than the GT 750M, and by an incredible margin.

  1. Low Polygon Bunny, 157 vertices, 309 faces
    GT 750M, 600 [ms/sample]:
    https://drive.google.com/open?id=1M4JBunRxmGLU7nXKHuldJeyvcC1St0ho
    GTX 980, 70 [ms/sample]:
    https://drive.google.com/open?id=1Z0AUUwHHzajCwKwqdGY2wC0d2rFcglhB

  2. Sphere, 8066 vertices, 16128 faces
    GT 750M, 540 [ms/sample]:
    https://drive.google.com/open?id=1qSyahCFlOdc7Sl7gbtXdxsBA0j_cUl5F
    GTX 980, 100 [ms/sample]:
    https://drive.google.com/open?id=1_SaCzVax-jXItvlbPI-rFU88T9yVBZWi

However, when rendering a model with a higher number of primitives, the GTX 980's performance became quite bad.

  1. Substance Painter Man, 39679 vertices, 79082 faces
    GT 750M, 1000 [ms/sample]:
    https://drive.google.com/open?id=1jNQ-GP26s65vIMxw3D6SKWv2AV8BEmq_
    GTX 980, 8000 !!! [ms/sample]:
    https://drive.google.com/open?id=1igi5Z4m3rSxkuBZeQXs-xSUc2QilmOLV

Environment:
GT 750M: Windows 10 RS5, CPU: Core-i7 4850HQ 4 cores 2.3GHz, DDR3 16GB
GTX 980: Windows 10 RS4, CPU: Core-i7 5820K 6 cores 3.3GHz, DDR4 32GB
Both environments use OptiX 5.1.0.

What could be the cause? Is there some pitfall with the Maxwell architecture?

Thanks

The 750M is a first-generation Maxwell GPU, a low-power entry-level part, whereas the GTX 980 is a near top-of-the-line consumer Maxwell board of the second generation.
The 980 should always beat the 750 no matter what. OK, not in power savings. ;-)

I would say something is not right in your machine configuration.
Could you try updating the GTX 980 system to the newest available display driver version?
http://www.nvidia.com/Download/Find.aspx?lang=en-us

39679 vertices and 79082 faces is really small. With 4 GB of memory you should be able to render scenes with millions of triangles.
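A quick back-of-the-envelope calculation shows how small this mesh is. The Python sketch below is for illustration only and assumes positions are stored as three 32-bit floats and faces as three 32-bit indices; normals, texture coordinates, and the acceleration structure are not counted, so real memory usage is higher.

```python
# Rough size of the raw mesh buffers only (an assumption for illustration):
# positions as float3 (3 x 4 bytes) and faces as int3 index triplets (3 x 4 bytes).
# Normals, texture coordinates and the acceleration structure are NOT counted.
BYTES_PER_VERTEX = 3 * 4
BYTES_PER_FACE = 3 * 4


def mesh_buffer_bytes(num_vertices: int, num_faces: int) -> int:
    """Bytes needed for an indexed triangle mesh's position and index buffers."""
    return num_vertices * BYTES_PER_VERTEX + num_faces * BYTES_PER_FACE


size = mesh_buffer_bytes(39679, 79082)
print(f"{size} bytes = {size / 2**20:.2f} MiB")  # 1425132 bytes = 1.36 MiB
```

By the same estimate, a mesh with one million triangles and roughly 500,000 vertices needs about 18 MB for these two buffers, a tiny fraction of 4 GB.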

Did you try enabling all exceptions to see whether there are any errors in the rays or stack overflows on either board?
Do not benchmark with exceptions enabled! Since OptiX 4, that incurs a huge performance penalty.
Is the performance scaling linearly with the launch size, or does it fall off a cliff?
Can you limit the path length to test whether some paths get stuck?
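One simple way to answer the linear-scaling question is to time the same scene at several launch sizes and compare the cost per launch index. The Python helper below is a hypothetical sketch, not part of OptiX: it takes (width, height, milliseconds) measurements you collect yourself, and the 25% tolerance is an arbitrary choice.

```python
def per_index_cost(measurements):
    """measurements: list of (launch_width, launch_height, milliseconds).

    Returns (launch_size, microseconds per launch index) sorted by size.
    With linear scaling the second value stays roughly constant; a sharp
    rise at some size suggests a 'cliff' (for example, stuck paths).
    """
    rows = []
    for width, height, ms in measurements:
        size = width * height
        rows.append((size, ms * 1000.0 / size))
    return sorted(rows)


def scales_linearly(measurements, tolerance=0.25):
    """True if every launch's per-index cost is within `tolerance` (relative)
    of the smallest launch's per-index cost. The tolerance is arbitrary."""
    costs = [cost for _, cost in per_index_cost(measurements)]
    base = costs[0]
    return all(abs(cost - base) / base <= tolerance for cost in costs)


# Hypothetical timings, for illustration only (not measured data):
print(scales_linearly([(640, 360, 25.0), (1280, 720, 100.0)]))  # True
print(scales_linearly([(640, 360, 25.0), (1280, 720, 800.0)]))  # False
```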

For comparison you could also check the performance with one of my original OptiX introduction examples.
https://github.com/nvpro-samples/optix_advanced_samples

optixIntro_04 is suitable for checking the maximum display performance when zoomed out. It tests the OpenGL interop (disable VSYNC in the NVIDIA Control Panel under Manage 3D Settings!) and mostly primary rays. It should render at hundreds of frames per second on the GTX 980 at the default size.

For a more complex setup, optixIntro_07 with motion blur disabled would be a good test of base performance. Its basic primitives are all generated at run time and can be tessellated to any complexity (except for the box, which is always 12 triangles).
With that framework you could also easily build a Cornell box from five of the planes, put other objects inside, and place a parallelogram area light on the ceiling.

I think the GT 750M is a Kepler architecture GPU, not Maxwell.

  1. Updated the driver to the newest version: no effect.
  2. Enabled all exceptions: the program reports nothing. Interestingly, though, disabling the stack overflow exception makes the program a bit slower (~10%).
  3. I did a simple experiment in which the program renders only a subdivided quad while increasing the number of edges (1, 2, 4, 8, ..., 256).
    The camera is placed in front of the quad at some distance. The ray generation program generates rays across the perspective camera frustum and ends after the primary rays; the closest hit program does nothing.
    The rendering resolution was 1280x720, and rays close to the screen edges don't hit the quad.
    The scene hierarchy is Group-Transform-GeometryGroup-GeometryInstance-Geometry(Quad). The acceleration structures for the Group and the GeometryGroup are both "Trbvh" (with default settings).
    The result is https://docs.google.com/spreadsheets/d/1a8-GmTefK1KXt9ujSzhNHJ1C8297hGzbVycP_4rI8Ps/edit?usp=sharing
    For very small numbers of primitives the GTX 980 runs faster than the GT 750M, but beyond some number of primitives the order is reversed. Why? Is some tuning required for the GTX 980?
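For reference, the test geometry in experiment 3 can be sketched as a regular grid. The Python function below is a hypothetical illustration that assumes "number of edges" means subdivisions per side, so n edges give (n + 1)^2 vertices and 2n^2 triangles; the 256-edge case is then 131,072 triangles, still a small scene.

```python
def subdivided_quad(n: int, size: float = 1.0):
    """Tessellate a square of side `size` in the z = 0 plane into n x n cells,
    each split into two triangles. Returns (vertices, triangles)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    step = size / n
    half = size / 2.0
    # (n + 1) x (n + 1) grid of vertex positions, centred on the origin.
    vertices = [(-half + i * step, -half + j * step, 0.0)
                for j in range(n + 1) for i in range(n + 1)]
    triangles = []
    for j in range(n):
        for i in range(n):
            v0 = j * (n + 1) + i  # lower-left corner of the cell
            v1 = v0 + 1           # lower-right
            v2 = v0 + (n + 1)     # upper-left
            v3 = v2 + 1           # upper-right
            triangles.append((v0, v1, v3))
            triangles.append((v0, v3, v2))
    return vertices, triangles


for n in (1, 2, 4, 8, 16, 32, 64, 128, 256):
    verts, tris = subdivided_quad(n)
    print(f"{n:4d} edges -> {len(verts):6d} vertices, {len(tris):6d} triangles")
```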

I am also puzzled by the performance curve of the GT 750M.
Generally speaking, ray tracing with a BVH has logarithmic complexity in the number of primitives, but the performance curves in the result look linear.
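The logarithmic-versus-linear distinction can be made concrete with a simplified cost model (an idealization, not a measurement). With tight boxes, a ray descends roughly one root-to-leaf path and tests only a few primitives k; if the boxes are so large that every box overlaps every ray, no subtree can be culled. The second line assumes a binary BVH with one primitive per leaf, which has 2N - 1 nodes; c_node and c_prim are per-node and per-primitive test costs.

```latex
T_{\text{tight}}(N) \approx c_{\text{node}}\,\log_2 N + c_{\text{prim}}\,k
\qquad\text{vs.}\qquad
T_{\text{no culling}}(N) \approx c_{\text{node}}\,(2N - 1) + c_{\text{prim}}\,N \in O(N)
```

A per-ray cost that grows linearly with the primitive count is therefore a hint that the BVH is not culling anything.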

I finally found the cause! It was a very basic BVH bug:
my bounding box program was incorrect.
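The original post does not say what the bug was, so the following is only an illustration of why bounds matter. An OptiX bounding box program writes six floats per primitive, the min corner followed by the max corner, and for a triangle these should be the tight per-axis min/max of its three vertices. The Python sketch below computes those boxes on the CPU and counts how many boxes a ray overlaps (slab test). The "broken" variant, in which every primitive reports the whole mesh's box, is a hypothetical bug chosen for demonstration.

```python
import math


def triangle_aabb(v0, v1, v2):
    """Tight axis-aligned box of one triangle: (min_xyz, max_xyz)."""
    lo = tuple(min(a, b, c) for a, b, c in zip(v0, v1, v2))
    hi = tuple(max(a, b, c) for a, b, c in zip(v0, v1, v2))
    return lo, hi


def ray_hits_aabb(origin, direction, box, t_min=0.0, t_max=math.inf):
    """Slab test: True if the ray segment [t_min, t_max] overlaps the box."""
    lo, hi = box
    for o, d, b_lo, b_hi in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray parallel to this slab: its origin must lie inside the slab.
            if o < b_lo or o > b_hi:
                return False
            continue
        t0, t1 = (b_lo - o) / d, (b_hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_min, t_max = max(t_min, t0), min(t_max, t1)
        if t_min > t_max:
            return False
    return True


def count_overlaps(boxes, origin, direction):
    """Number of primitive boxes the ray overlaps (primitives a BVH cannot cull)."""
    return sum(ray_hits_aabb(origin, direction, box) for box in boxes)


N = 1000
# A row of N triangles in the z = 0 plane, triangle i spanning x in [i, i + 1].
tris = [((i, 0.0, 0.0), (i + 1, 0.0, 0.0), (i, 1.0, 0.0)) for i in range(N)]
tight = [triangle_aabb(*t) for t in tris]
# HYPOTHETICAL bug for illustration: every primitive reports the whole mesh's box.
scene_box = ((0.0, 0.0, 0.0), (float(N), 1.0, 0.0))
broken = [scene_box] * N

origin, direction = (2.5, 0.25, 1.0), (0.0, 0.0, -1.0)
print(count_overlaps(tight, origin, direction))   # 1
print(count_overlaps(broken, origin, direction))  # 1000
```

With tight boxes only one primitive survives the box test; with the broken boxes every primitive does, so the work per ray grows with the mesh size, which matches the linear-looking curves reported above.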

I apologize for the trouble.

Now it is incredibly fast on the GTX 980!