Bug report: GLSL performance: dead code not eliminated, and manually unrolled code 25 times more costly than a loop

Original bugreport: https://bugs.chromium.org/p/chromium/issues/detail?id=922029#c6
Original author: fabrice.neyret@gmail.com

Original content:

Example URL:
https://www.shadertoy.com/view/3slGWS

Steps to reproduce the problem:
See the example URL above.

  • Unrecognized dead code:
    • Call a costly function many times without using the result.
    • Measure performance, and compare it with calling the function in a loop.
  • Unrolling vs. loop performance:
    • Call a costly function many times, and compare performance with calling the function in a loop.

NB: if you have a fast GPU, duplicate the unrolled lines and increase the loop count accordingly, aiming for about 30 fps.
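As a starting point for a native repro, the two variants described above can be generated programmatically. This is a hedged sketch, not the original Shadertoy shader: the body of `func`, the `#version 330 core` header, the `fragColor` output declaration, and the helper name `make_shader` are all hypothetical. `calls` stands for the number of unrolled lines or loop iterations that the NB suggests scaling up, and `use_result = 0` produces the dead-code variant ending in `fragColor = vec4(0)`.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the costly function; the actual Shadertoy
   func is not reproduced in this report. */
static const char *FUNC_SRC =
    "float func(float x) {\n"
    "    float s = 0.0;\n"
    "    for (int i = 0; i < 64; i++) s += sin(x * float(i));\n"
    "    return s;\n"
    "}\n";

/* Fixed-capacity output buffer that records overflow. */
typedef struct { char *data; size_t cap; size_t len; int overflow; } Buf;

/* Appends formatted text; marks overflow instead of writing past cap. */
static void buf_printf(Buf *b, const char *fmt, ...) {
    if (b->overflow) return;
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(b->data + b->len, b->cap - b->len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= b->cap - b->len) { b->overflow = 1; return; }
    b->len += (size_t)n;
}

/* Hypothetical generator for the two repro variants.
   unrolled != 0: `calls` explicit calls to func.
   unrolled == 0: one call inside a loop of `calls` iterations.
   use_result == 0: the sum is discarded and fragColor = vec4(0),
   so every call to func is dead code.
   Returns the length written, or -1 if `cap` is too small. */
static int make_shader(char *out, size_t cap, int calls, int unrolled, int use_result) {
    if (cap == 0) return -1;
    Buf b = { out, cap, 0, 0 };
    out[0] = '\0';
    buf_printf(&b, "#version 330 core\nout vec4 fragColor;\n%s", FUNC_SRC);
    buf_printf(&b, "void main() {\n    float x = gl_FragCoord.x;\n    float acc = 0.0;\n");
    if (unrolled) {
        /* Each call gets a different argument, so no two calls are identical. */
        for (int i = 0; i < calls; i++)
            buf_printf(&b, "    acc += func(x + %d.0);\n", i);
    } else {
        /* The argument depends on the loop index, as in the report. */
        buf_printf(&b, "    for (int i = 0; i < %d; i++) acc += func(x + float(i));\n", calls);
    }
    buf_printf(&b, "%s", use_result ? "    fragColor = vec4(acc);\n"
                                    : "    fragColor = vec4(0);\n");
    buf_printf(&b, "}\n");
    return b.overflow ? -1 : (int)b.len;
}
```

For example, `make_shader(buf, sizeof buf, 30, 1, 0)` yields the unrolled dead-code variant and `make_shader(buf, sizeof buf, 800, 0, 1)` the loop variant; a native repro would pass each string to `glShaderSource` and time the draws.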

What is the expected behavior?

  • Maximum fps when no result is used (final fragColor = vec4(0)).
  • About the same performance for the loop and the manually unrolled version (at least not a performance ratio of 25).

What went wrong?

  • Dead code not recognized:
    • When the end of the shader is replaced with fragColor = vec4(0), the cost stays mostly the same (whereas the loop version reaches 60 fps).
  • Performance difference with manual unrolling:
    • On my machine, 30 explicit calls to func cost as much as a loop of 800 iterations (func does depend on the loop invariant).
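Assuming the frame cost scales linearly with the number of calls to func, this measurement gives the per-call ratio quoted in the title:

$$
\frac{\text{cost per unrolled call}}{\text{cost per looped call}} \approx \frac{800}{30} \approx 26.7
$$

that is, roughly the 25× figure.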

Did this work before? Yes, with NVIDIA driver 384.130.

Note that this is probably an NVIDIA driver bug, possibly caused by the new NVIDIA GLSL/SPIR-V compiler:

Linux:
I don’t have the bug on Linux/NVIDIA with driver 384.130.
I have the bug on Linux/NVIDIA with driver 396.54.
I have the bug on Linux/NVIDIA with driver 390.77.
I have the bug on Linux/NVIDIA with driver 410.78.

Windows:
I don’t have the bug on Windows/NVIDIA with driver 375.86, in either true OpenGL or ANGLE mode.
I have the bug on Windows/NVIDIA with true OpenGL (ANGLE off) and the latest driver.

kkinnunen@nvidia.com additions:

  • Shadertoy version reproes with Windows 10, 416.34, GTX 1080, Chrome/73.0.3672.1, --use-angle=gl
  • Repro observations with the configuration above: in Task Manager, the loop version shows 85% GPU utilization, while the non-loop version shows 100%.

Next steps for this:

  1. Create a native repro
  2. Observe the problem
  3. File an internal bug
  4. Follow up if this is something that would be fixed

This is nv internal bug 2488565.

The root cause is that the shader inlining limit changed between those driver versions. Dead code elimination currently depends on inlining, so it is not applied when the function is not inlined. For GL shaders, one can use “#pragma option inline all” to force inlining, but this cannot be triggered from WebGL.
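For native GL repros, the workaround can be applied by patching the shader source before it reaches the compiler. This sketch assumes the pragma must go after `#version`, since `#version` has to precede everything except comments and whitespace. The helper name `insert_pragma` is hypothetical, and the pragma text is supplied by the caller: the report writes it as “#pragma option inline all”, while NVIDIA's spelling is commonly seen as `#pragma optionNV(inline all)`, so check it against the driver in use.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for native GL only (the report notes the pragma
   cannot be triggered from WebGL). Returns a malloc'd copy of `src` with
   the one-line `pragma` inserted directly after the #version line, or at
   the very start if `src` has no #version line.
   Assumes "#version" is written without a space after '#' and does not
   appear inside a comment. Returns NULL on allocation failure; the
   caller frees the result. */
static char *insert_pragma(const char *src, const char *pragma) {
    size_t src_len = strlen(src);
    size_t pragma_len = strlen(pragma);
    size_t split = 0;   /* byte offset at which the pragma is inserted */
    int need_nl = 0;    /* set when the #version line has no trailing '\n' */

    const char *v = strstr(src, "#version");
    if (v != NULL) {
        const char *nl = strchr(v, '\n');
        if (nl != NULL) {
            split = (size_t)(nl - src) + 1;
        } else {
            split = src_len;
            need_nl = 1;
        }
    }

    /* Room for: source, optional '\n', pragma, its '\n', and the NUL. */
    char *out = malloc(src_len + (size_t)need_nl + pragma_len + 2);
    if (out == NULL) return NULL;
    char *p = out;
    memcpy(p, src, split);
    p += split;
    if (need_nl) *p++ = '\n';
    memcpy(p, pragma, pragma_len);
    p += pragma_len;
    *p++ = '\n';
    memcpy(p, src + split, src_len - split);
    p += src_len - split;
    *p = '\0';
    return out;
}
```

The returned string would then be passed to `glShaderSource` in place of the original source and freed afterwards.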