Hi,
We are having a stuttering problem with 2 cards connected to 4 screens.
All data has been submitted for escalation. Incident: 160821-000138
We have also raised an escalation with Microsoft; they state the following:
"
The bad news: I can’t tell you how you can fix the problem in software, because it isn’t a problem that appears to have anything to do with your code and design, and it is NOT a Windows bug.
Here are some of the bits of things I was looking at via GPUView:
Type: Unknown / Multiple (32), Time: 92617409408 (9261,740.9408ms), TaskId: 0, Version: 2
StackWalk TimeStamp: 27141775722, Associated Event: 92617409348
Process: Review.exe
0 0xFFFFF80002EE59EB ntoskrnl.exe!KeSetEvent+0x81
1 0xFFFFF8800FD71057 dxgmms1.sys!VidSchiUnwaitWaitQueuePacket+0xBB
2 0xFFFFF8800FD71178 dxgmms1.sys!VidSchiCompleteSignalCommmand+0x114
3 0xFFFFF8800FD6D059 dxgmms1.sys!VidSchiProcessCompletedQueuePacketInternal+0x131
4 0xFFFFF8800FD6C7DA dxgmms1.sys!VidSchiProcessDpcCompletedPacket+0x3B6
5 0xFFFFF8800FD6BE00 dxgmms1.sys!VidSchDdiNotifyDpcWorker+0x198
6 0xFFFFF8800FD6BC4C dxgmms1.sys!VidSchDdiNotifyDpc+0x94
7 0xFFFFF8800FC721CF dxgkrnl.sys!DxgNotifyDpcCB+0x77
8 0xFFFFF8800F0E2C28 nvlddmkm.sys+0xC8C28
9 0xFFFFF8800F0E386B nvlddmkm.sys+0xC986B
10 0xFFFFF8800F0E36B2 nvlddmkm.sys+0xC96B2
11 0xFFFFF8800F141DEF nvlddmkm.sys+0x127DEF
12 0xFFFFF80002E96B1C ntoskrnl.exe!KiRetireDpcList+0x1BC
13 0xFFFFF80002E8E165 ntoskrnl.exe!KyRetireDpcList+0x5
14 0xFFFFF80002E8DF7C ntoskrnl.exe!KiDispatchInterruptContinue
15 0xFFFFF80002ED7453 ntoskrnl.exe!KiDpcInterruptBypass+0x13
16 0xFFFFF80002E87522 ntoskrnl.exe!KiInterruptDispatch+0x212
17 0x00007FFD179C9E80 ???!???
18 0x000000080C8A0064 ???!???
Thread: 2576
Type: Unknown / Multiple (32), Time: 92617410393 (9261,741.0393ms), TaskId: 0, Version: 2
StackWalk TimeStamp: 27141775877, Associated Event: 92617409895
27141798799
Process: Review.exe
0 0xFFFFF8800F230AB8 nvlddmkm.sys+0x216AB8
1 0xFFFFF8800F232913 nvlddmkm.sys+0x218913
2 0xFFFFF8800F42DDC1 nvlddmkm.sys+0x413DC1
3 0xFFFFF8800F1BE393 nvlddmkm.sys+0x1A4393
4 0xFFFFF8800F1B6301 nvlddmkm.sys+0x19C301
5 0xFFFFF8800F10938B nvlddmkm.sys+0xEF38B
6 0xFFFFF8800F109417 nvlddmkm.sys+0xEF417
7 0xFFFFF8800F0E3744 nvlddmkm.sys+0xC9744
8 0xFFFFF8800F141F72 nvlddmkm.sys+0x127F72
9 0xFFFFF80002E8747C ntoskrnl.exe!KiInterruptDispatch+0x16C
10 0x00007FFD179C9E80 ???!???
11 0x0000000804FE006F ???!???
Interesting thread stack (how do I turn this into a thread ID?)
Type: Unknown / Multiple (32), Time: 92621940580 (9262,194.0580ms), TaskId: 0, Version: 2
StackWalk TimeStamp: 27143058988, Associated Event: 92621940559
Process: Idle
0 0xFFFFF80002EE58AA ntoskrnl.exe!KiReadyThread+0x5
1 0xFFFFF80002E96F97 ntoskrnl.exe!KiProcessExpiredTimerList+0x157
2 0xFFFFF80002E96DEE ntoskrnl.exe!KiTimerExpiration+0x1BE
3 0xFFFFF80002E96BD7 ntoskrnl.exe!KiRetireDpcList+0x277
4 0xFFFFF80002E8336A ntoskrnl.exe!KiIdleLoop+0x5A
5 0x00007FFD179C9E80 ???!???
6 0x000000080C910064 ???!???
Thread 48:
Type: Unknown / Multiple (32), Time: 92618219854 (9261,821.9854ms), TaskId: 0, Version: 2
StackWalk TimeStamp: 27142005137, Associated Event: 92618219412
Process: System
0 0xFFFFF80002E459CC ntoskrnl.exe!KeDelayExecutionThread+0xE3
1 0xFFFFF8800F1BCFCE nvlddmkm.sys+0x1A2FCE
2 0xFFFFF8800F4B557E nvlddmkm.sys+0x49B57E
3 0xFFFFF8800F51ABE1 nvlddmkm.sys+0x500BE1
4 0xFFFFF8800F51A864 nvlddmkm.sys+0x500864
5 0xFFFFF8800F510769 nvlddmkm.sys+0x4F6769
6 0xFFFFF8800F5326D0 nvlddmkm.sys+0x5186D0
7 0xFFFFF8800F52EDFF nvlddmkm.sys+0x514DFF
8 0xFFFFF8800F52E71A nvlddmkm.sys+0x51471A
9 0xFFFFF8800F52F337 nvlddmkm.sys+0x515337
10 0xFFFFF8800F52E9F7 nvlddmkm.sys+0x5149F7
11 0xFFFFF8800F52F9EC nvlddmkm.sys+0x5159EC
12 0xFFFFF8800F52F823 nvlddmkm.sys+0x515823
13 0xFFFFF8800F51146E nvlddmkm.sys+0x4F746E
14 0xFFFFF8800F2AA564 nvlddmkm.sys+0x290564
15 0xFFFFF8800F39A175 nvlddmkm.sys+0x380175
16 0xFFFFF8800F39A3EA nvlddmkm.sys+0x3803EA
17 0xFFFFF8800F39AEB0 nvlddmkm.sys+0x380EB0
18 0xFFFFF8800F1B7B08 nvlddmkm.sys+0x19DB08
19 0xFFFFF80003181F4D ntoskrnl.exe!IopProcessWorkItem+0x3D
20 0xFFFFF80002E95A21 ntoskrnl.exe!ExpWorkerThread+0x111
21 0xFFFFF80003128CCE ntoskrnl.exe!PspSystemThreadStartup+0x5A
22 0xFFFFF80002E7CFE6 ntoskrnl.exe!KxStartSystemThread+0x16
23 0x00007FFD179C9E80 ???!???
24 0x000000080E51006F ???!???
//JBT note the mutex (mutant) which is being held.
Thread 48:
Type: Unknown / Multiple (32), Time: 92621961469 (9262,196.1469ms), TaskId: 0, Version: 2
StackWalk TimeStamp: 27143064899, Associated Event: 92621961430
Process: System
0 0xFFFFF80002EE58AA ntoskrnl.exe!KiReadyThread+0x5
1 0xFFFFF80002EC0710 ntoskrnl.exe!KiProcessThreadWaitList+0x60
2 0xFFFFF80002E9682A ntoskrnl.exe!KeReleaseMutant+0x2EA
3 0xFFFFF8800F21037E nvlddmkm.sys+0x1F637E
4 0xFFFFF8800F39AEBE nvlddmkm.sys+0x380EBE
5 0xFFFFF8800F1B7B08 nvlddmkm.sys+0x19DB08
6 0xFFFFF80003181F4D ntoskrnl.exe!IopProcessWorkItem+0x3D
7 0xFFFFF80002E95A21 ntoskrnl.exe!ExpWorkerThread+0x111
8 0xFFFFF80003128CCE ntoskrnl.exe!PspSystemThreadStartup+0x5A
9 0xFFFFF80002E7CFE6 ntoskrnl.exe!KxStartSystemThread+0x16
10 0x00007FFD179C9E80 ???!???
11 0x000000080CB70064 ???!???
Thread 48 is busy processing stuff via IopProcessWorkItem and the nVidia driver it calls for over 400 ms. That last stack trace shows it is releasing a mutex held by the device driver, in stack frame 2, which is called by stack frame 3, the device driver.
The last thing one of your rendering worker threads is doing is going into the nVidia device driver to handle an interrupt: that’s the second trace shown for thread #2576. Right before that, it’s calling SetEvent, so it’s the driver handling the interrupt on that thread.
Thus: it appears from the information I currently have that the nVidia driver is waiting on the same mutex, quite probably shared amongst all installed devices on the system (I would need to spend a bit more time to verify that: I also don’t have nVidia symbols available at this moment). Thus, this is why it would result in your system having all the displays running into this problem.
From using Windows Performance Analyzer, while it is stuck in that 400+ ms dead time where you are not getting updated video, one of your 24 cores is maxed out, but the others are waiting with nothing to do: it is processing all the queued up GPU commands and whatever it is doing during that time in the nVidia driver: it’s a real pity you’re not getting good utilization out of that Xeon as a result of the driver.
It is unclear to me which version of the nVidia driver this is: please run DXDiag in your System32 folder and upload that data. nVidia and Microsoft have a close working relationship, though we don’t currently have their symbols. I can get this information to nVidia, but also I’d urge you to contact them as well. This counts as a driver bug.
While that is going on, I would urge you to go and get graphics cards from an alternate vendor: your application should work fine with other GPUs, at least as far as this goes. Because I don’t know the extent as to what will happen if you get a garbage collection pause from your managed application portion because I don’t have enough information on how that is tied together, I cannot tell you whether or not, or how badly your system would be affected if there’s a long garbage collection pause.
"
R7610_DxDiag.txt (66.8 KB)
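
For anyone who wants to check the numbers in the quoted traces, here is a minimal Python sketch. It does two things: converts the raw "Time:" values into milliseconds, and parses the last Thread 48 stack to find which module called KeReleaseMutant. The helper names (ticks_to_ms, parse_frames, caller_of) are made up for this post, and the 100 ns tick unit is an assumption inferred from the trace itself printing 92617409408 as 9261,740.9408ms. The stack text and timestamps are copied from the traces above.

```python
import re

# Assumption: raw "Time:" values are 100 ns ticks, inferred from the trace
# printing 92617409408 as 9261,740.9408ms.
TICKS_PER_MS = 10_000

# Frame lines look like "<index> <address> <module>[!<symbol>][+<offset>]".
FRAME_RE = re.compile(
    r"^(?P<index>\d+)\s+(?P<addr>0x[0-9A-Fa-f]+)\s+"
    r"(?P<module>[^!+\s]+)(?:!(?P<symbol>[^+\s]+))?"
    r"(?:\+(?P<offset>0x[0-9A-Fa-f]+))?$"
)


def ticks_to_ms(ticks):
    """Convert raw trace ticks to milliseconds (under the 100 ns assumption)."""
    return ticks / TICKS_PER_MS


def parse_frames(text):
    """Parse stack frame lines into dicts; lines that do not match are skipped."""
    frames = []
    for line in text.strip().splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.groupdict())
    return frames


def caller_of(frames, symbol):
    """Return the frame listed right after the first frame with this symbol.

    Frame 0 is the innermost call, so the next frame down is its caller.
    """
    for i, frame in enumerate(frames[:-1]):
        if frame["symbol"] == symbol:
            return frames[i + 1]
    return None


# Last Thread 48 stack, copied from the quoted trace.
LAST_THREAD_48_STACK = """
0 0xFFFFF80002EE58AA ntoskrnl.exe!KiReadyThread+0x5
1 0xFFFFF80002EC0710 ntoskrnl.exe!KiProcessThreadWaitList+0x60
2 0xFFFFF80002E9682A ntoskrnl.exe!KeReleaseMutant+0x2EA
3 0xFFFFF8800F21037E nvlddmkm.sys+0x1F637E
4 0xFFFFF8800F39AEBE nvlddmkm.sys+0x380EBE
5 0xFFFFF8800F1B7B08 nvlddmkm.sys+0x19DB08
6 0xFFFFF80003181F4D ntoskrnl.exe!IopProcessWorkItem+0x3D
7 0xFFFFF80002E95A21 ntoskrnl.exe!ExpWorkerThread+0x111
8 0xFFFFF80003128CCE ntoskrnl.exe!PspSystemThreadStartup+0x5A
9 0xFFFFF80002E7CFE6 ntoskrnl.exe!KxStartSystemThread+0x16
10 0x00007FFD179C9E80 ???!???
11 0x000000080CB70064 ???!???
"""

# "Time:" values copied from the quoted trace headers.
FIRST_INTERRUPT = 92617409408   # thread 2576, KeSetEvent stack
THREAD48_DELAY = 92618219854    # thread 48, KeDelayExecutionThread stack
THREAD48_RELEASE = 92621961469  # thread 48, KeReleaseMutant stack

frames = parse_frames(LAST_THREAD_48_STACK)
caller = caller_of(frames, "KeReleaseMutant")

print("Caller of KeReleaseMutant:", caller["module"], caller["offset"])
print(f"Thread 48 delay -> release: {ticks_to_ms(THREAD48_RELEASE - THREAD48_DELAY):.1f} ms")
print(f"First interrupt -> release: {ticks_to_ms(THREAD48_RELEASE - FIRST_INTERRUPT):.1f} ms")
```

Under that tick assumption, the two Thread 48 samples are roughly 374 ms apart, and the mutex release lands roughly 455 ms after the first interrupt stack on thread 2576. That looks consistent with the 400+ ms dead time described in the quote. The caller of KeReleaseMutant resolves to nvlddmkm.sys, matching the remark about stack frames 2 and 3.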