SDSM tightening compute shader is extremely slow with driver 347.09 WHQL but fast with the older driver 344.75

With driver version 347.09 WHQL, my SDSM tightening compute shader takes a whole 70 ms per frame, whereas with the old version 344.75 it takes under 1 ms (circa 0.7 ms) on my GTX 970.

See sdsm_reduce_tighting.glsl on GitHub (“SDSM compute shader NVIDIA driver 347.09 WHQL slowdown bug”); the $-prefixed lines are for my own preprocessor.

I’ve tracked it down to the comparison “((lLinearZ >= reduceDataPartitions[lPartitionIndex].x) && (lLinearZ <= reduceDataPartitions[lPartitionIndex].w))” (in sdsm_reduce_tighting.glsl), and really just to the comparison itself, because:

70 ms per frame:
if((lLinearZ >= reduceDataPartitions[lPartitionIndex].x) && (lLinearZ <= reduceDataPartitions[lPartitionIndex].w)){ 
  minBoundsSun[lPartitionIndex] = min(minBoundsSun[lPartitionIndex], lSunLightSpaceCoord.xyz);
  maxBoundsSun[lPartitionIndex] = max(maxBoundsSun[lPartitionIndex], lSunLightSpaceCoord.xyz); 
}
70 ms per frame:
bool b = (lLinearZ >= reduceDataPartitions[lPartitionIndex].x) && (lLinearZ <= reduceDataPartitions[lPartitionIndex].w); 
minBoundsSun[lPartitionIndex] = mix(minBoundsSun[lPartitionIndex], min(minBoundsSun[lPartitionIndex], lSunLightSpaceCoord.xyz), b); 
maxBoundsSun[lPartitionIndex] = mix(maxBoundsSun[lPartitionIndex], max(maxBoundsSun[lPartitionIndex], lSunLightSpaceCoord.xyz), b);
35 ms per frame:
bool b = (lLinearZ >= reduceDataPartitions[lPartitionIndex].x); 
minBoundsSun[lPartitionIndex] = mix(minBoundsSun[lPartitionIndex], min(minBoundsSun[lPartitionIndex], lSunLightSpaceCoord.xyz), b); 
maxBoundsSun[lPartitionIndex] = mix(maxBoundsSun[lPartitionIndex], max(maxBoundsSun[lPartitionIndex], lSunLightSpaceCoord.xyz), b);
Without any comparison checks, it is under 1 ms per frame again, but the result is then unusable:
minBoundsSun[lPartitionIndex] = min(minBoundsSun[lPartitionIndex], lSunLightSpaceCoord.xyz);
maxBoundsSun[lPartitionIndex] = max(maxBoundsSun[lPartitionIndex], lSunLightSpaceCoord.xyz);
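
For readers who want to see what the range check does outside the shader, here is a minimal CPU-side sketch in Python. It is not the shader from this thread: the function name tighten, the (near, far) partition tuples, and the sample data are assumptions made for illustration, with near and far standing in for reduceDataPartitions[lPartitionIndex].x and .w.

```python
# CPU reference model of the gated bounds reduction (a sketch, not the shader).
# Assumption: each partition is a (near, far) pair of linear depths, standing in
# for reduceDataPartitions[i].x and .w in the GLSL snippets of this thread.

INF = float("inf")


def tighten(samples, partitions, gated=True):
    """Return per-partition (min_xyz, max_xyz) over light-space sample coords.

    samples: list of (linear_z, (x, y, z)) tuples.
    gated=False mimics the fast variant that skips the range check.
    """
    bounds = []
    for near, far in partitions:
        lo = [INF, INF, INF]
        hi = [-INF, -INF, -INF]
        for linear_z, coord in samples:
            if gated and not (near <= linear_z <= far):
                continue  # sample lies outside this partition's depth range
            lo = [min(a, b) for a, b in zip(lo, coord)]
            hi = [max(a, b) for a, b in zip(hi, coord)]
        bounds.append((lo, hi))
    return bounds


# Hypothetical sample data: two near samples and one far sample.
samples = [(1.0, (0.0, 0.0, 0.0)), (5.0, (2.0, 3.0, 4.0)), (50.0, (9.0, 9.0, 9.0))]
partitions = [(0.0, 10.0), (10.0, 100.0)]

gated_bounds = tighten(samples, partitions)
ungated_bounds = tighten(samples, partitions, gated=False)
```

With the check, each partition gets its bounds only from samples whose linear depth falls inside its range. Without it, every partition ends up with the same bounds covering all samples, which is why the fast variant is unusable.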

Is this a new GLSL compiler bug in 347.09, or why else is it so slow with the new driver version?

This issue still seems to exist in the new 347.25 driver version. Why is this bug report being ignored without any response?

I think there is a specific format that gives more information, so that the issue can at least be replicated.

Hardware specs, OS, driver version (which you have already given)

Hardware specs are not so important here, since the issue is hardware-independent, but anyway:

My main desktop, where I primarily work and test: Intel Xeon E3-1245v2 quad-core workstation/server CPU, Asus P8C WS workstation/server mainboard, 16 GB unbuffered ECC RAM, EVGA GeForce GTX970 Superclocked

My main notebook: Lenovo ThinkPad X230 with an i7 dual-core CPU, 16 GB non-ECC RAM (only 8 GB when the eGPU is connected, due to a BIOS/UEFI bug), and an ExpressCard eGPU adapter with either a Zotac GTX650 or a Gigabyte GeForce GTX660OC PCIe card connected, depending on which GPU performance level I’m debugging and testing against.

My second notebook: Dell XPS L702X with a Sandy Bridge i7 2630M CPU, 16 GB non-ECC RAM, GeForce GT555M with 3 GB VRAM detected

The issue appears on all of these computers, so it is not hardware-dependent but driver-version-dependent: performance is fine up to and including 344.75, but with 347.09 and later the SDSM tightening compute shader is suddenly about 100x slower than with 344.75 and older driver versions.

On my AMD test desktop with a Radeon HD7850 and the newest AMD driver, performance is also normal, just like on NVIDIA with driver versions <= 344.75. The same of course goes for my OpenGL 4.3 capable Intel iGPU test desktops and notebooks. I have a whole series of computers, tablets, and smartphones here, primarily for GPU testing of the engine code for my indie games on different platforms.

This NVIDIA issue is almost robbing me of my sleep, since so far I have no workaround that gives a working SDSM tightening compute shader on NVIDIA driver versions >= 347.09.

Our OpenGL driver team started looking at the issue but is unable to generate a working reproducer with the given information.

Source code excerpts which require a custom pre-processor won’t help here. E.g. trying to manually pre-process your GLSL shader code into OpenGL conformant syntax results in a mostly empty shader because neither SUN nor MOON is defined.

Instead of source code excerpts, what is really required for analysis would be a minimal reproducer executable in failing state.

Below is my OpenGL and GLSL bug report checklist, which I gathered while working in the OpenGL driver team to reduce turnaround time.

File sharing sites are often blocked from inside NVIDIA offices. If your reproducer executable package is too big to be attached here, I can set up a temporary FTP site to exchange data. Send me a personal message with your contact details if you need that.

========================================
We normally need the following information to start the analysis of bug reports.
(This is the general list and might not apply to all reports.)

  1. Operating system version.
    On Linux, an nvidia-bug-report.log generated by running nvidia-bug-report.sh as root.
  2. Graphics hardware.
  3. Graphics driver version.
  4. Display Control Panel settings for screen resolution, monitor configs, and driver settings.
    Under Windows: NVIDIA Control Panel → Help → System Information → Save.
  5. Reproducer project.
    At least an executable which shows the problem. The simpler, the better.
    Make sure all necessary files to run this standalone are included (manifests, runtimes). Assume a clean test system!
    Source code in failing state highly appreciated.
    For GLSL compiler failures (C9999), the minimal set of shader sources reproducing the problem.
  6. Description of single steps to reproduce the problem.
  7. Description of the expected result (screenshots if possible).
    Performance issues require absolute measurement data and a description of how to reproduce them.
  8. If there is a crash in an NVIDIA module, the exact crash offset.

IMPORTANT: Our e-mail servers block ZIP attachments and executables, and Outlook itself blocks some more. Please rename such extensions; replacing the last character of the extension with an underscore will do.
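
The renaming rule above can be sketched as a small Python helper. The function name mail_safe_name and the BLOCKED set are hypothetical; the actual list of extensions filtered by the mail servers is not given here.

```python
# Sketch of the "replace the last character of the extension" rule.
# Assumption: BLOCKED is an illustrative set, not the real server filter list.

BLOCKED = {"zip", "exe"}


def mail_safe_name(filename):
    """Return filename with the last extension character replaced by '_'.

    Names without an extension, or with an extension not in BLOCKED,
    are returned unchanged.
    """
    stem, dot, ext = filename.rpartition(".")
    if dot and ext.lower() in BLOCKED:
        return f"{stem}.{ext[:-1]}_"
    return filename


renamed = mail_safe_name("SDSMTightingComputeShaderIssueTestcase.zip")
```

The recipient only has to rename the file back to its original extension before unpacking or running it.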

Okay, I’ll implement a minimal OpenGL 4.3 Sample Distribution Shadow Maps test case this weekend, using the preprocessed GLSL code of the SDSM partition tightening compute shader in which the issue occurs.

I’ll then upload it to my own dedicated root server, including source code and an executable binary with the SDL 2.0 DLL and no other dependencies.

SUN and MOON are defined at runtime, so most of the time only one shader variant is active, the one where only SUN is defined. The minimal test case will contain preprocessed shaders anyway.

And tomorrow I will upload my game for a first live look at the issue, until I’ve finished the minimal test case.

The minimal test case is now available for download at http://rootserver.rosseaux.net/stuff/SDSMTightingComputeShaderIssueTestcase.zip and includes a precompiled 32-bit Windows executable and source code that compiles with Delphi >= 7 and FreePascal >= 2.4.2

With driver version 344.75 on my EVGA GeForce GTX970 Superclocked in my Xeon E3-1245v2 workstation with 16 GB unbuffered ECC RAM:

And with driver version 347.25 (347.09 gives similarly bad results) on my EVGA GeForce GTX970 Superclocked in my Xeon E3-1245v2 workstation with 16 GB unbuffered ECC RAM:

  1. Operating system version.

Windows 8.1 64-bit

  2. Graphics hardware.

EVGA Geforce GTX970 Superclocked, Gigabyte GTX660OC, Zotac GTX650Ti and Geforce GT555M in my Dell notebook

  3. Graphics driver version.

The issue affects driver version 347.09 and up

  4. Display Control Panel settings for screen resolution, monitor configs, and driver settings.

From my main desktop computer with the GeForce GTX970, downgraded again to driver version 344.75:

NVIDIA System Information report created on: 01/30/2015 21:30:52
System name: BEROXEON

[Display]
Operating system:	Windows 8.1 Pro, 64-bit
DirectX version:	11.0
GPU processor:		GeForce GTX 970
Driver version:		344.75
Direct3D API version:	11.2
Direct3D feature level:	11_1
CUDA cores:		1664
Core clock:		1164 MHz
Memory data rate:	7010 MHz
Memory interface:	256-bit
Memory bandwidth:	224.32 GB/s
Total available graphics memory:	12226 MB
Dedicated video memory:	4096 MB GDDR5
System video memory:	0 MB
Shared system memory:	8130 MB
Video BIOS version:	84.04.1F.00.72
IRQ:			16
Bus:			PCI Express x16 Gen3
Device ID:		10DE 13C2 29743842
Part number:		G401 0010

[Components]

NvGFTrayPluginr.dll		17.12.8.0		NVIDIA GeForce Experience
NvGFTrayPlugin.dll		17.12.8.0		NVIDIA GeForce Experience
nvui.dll		8.17.13.4475		NVIDIA User Experience Driver Component
nvxdsync.exe		8.17.13.4475		NVIDIA User Experience Driver Component
nvxdplcy.dll		8.17.13.4475		NVIDIA User Experience Driver Component
nvxdbat.dll		8.17.13.4475		NVIDIA User Experience Driver Component
nvxdapix.dll		8.17.13.4475		NVIDIA User Experience Driver Component
NVCPL.DLL		8.17.13.4475		NVIDIA User Experience Driver Component
nvCplUIR.dll		8.0.800.0		NVIDIA Control Panel
nvCplUI.exe		8.0.800.0		NVIDIA Control Panel
nvWSSR.dll		6.14.13.4475		NVIDIA Workstation Server
nvWSS.dll		6.14.13.4475		NVIDIA Workstation Server
nvViTvSR.dll		6.14.13.4475		NVIDIA Video Server
nvViTvS.dll		6.14.13.4475		NVIDIA Video Server
NVSTVIEW.EXE		7.17.13.4475		NVIDIA 3D Vision Photo Viewer
NVSTTEST.EXE		7.17.13.4475		NVIDIA 3D Vision Test Application
NVSTRES.DLL		7.17.13.4475		NVIDIA 3D Vision Module
nvDispSR.dll		6.14.13.4475		NVIDIA Display Server
NVMCTRAY.DLL		8.17.13.4475		NVIDIA Media Center Library
nvDispS.dll		6.14.13.4475		NVIDIA Display Server
PhysX		09.14.0702		NVIDIA PhysX
NVCUDA.DLL		8.17.13.4475		NVIDIA CUDA 6.5.30 driver
nvGameSR.dll		6.14.13.4475		NVIDIA 3D Settings Server
nvGameS.dll		6.14.13.4475		NVIDIA 3D Settings Server


  5. Reproducer project.

http://rootserver.rosseaux.net/stuff/SDSMTightingComputeShaderIssueTestcase.zip

SDSMTightingComputeShaderIssueTestcase.exe is the precompiled binary and SDSMTightingComputeShaderIssueTestcase.dpr is the main Object Pascal source file for Delphi and FreePascal; the *.bin files are example data dumped from my indie game, and the *.glsl files are the compute shaders.

  6. Description of single steps to reproduce the problem.

First run the testcase from SDSMTightingComputeShaderIssueTestcase.zip with driver version 344.75 or older, then again with driver version 347.09 or newer.

  7. Description of the expected result (screenshots if possible).

With driver version 344.75 on my EVGA GeForce GTX970 Superclocked in my Xeon E3-1245v2 workstation with 16 GB unbuffered ECC RAM:

And with driver version 347.25 (347.09 gives similarly bad results) on my EVGA GeForce GTX970 Superclocked in my Xeon E3-1245v2 workstation with 16 GB unbuffered ECC RAM:

  8. If there is a crash in an NVIDIA module, the exact crash offset.

No crash

Any progress updates? Is/was my testcase helpful, or should I improve it?

Yes, thanks for your test case application!
This has been reproduced and the upcoming release 349 beta drivers will be fast again.
It’ll take a few more weeks until that becomes available on the beta driver download page.