We are facing an intermittent, hard-to-debug issue with our commercial software.
We currently support a specific set of server configurations with Maxwell-, Pascal-, and Turing-based Quadro 4000 GPUs (M4000, P4000 and RTX 4000).
We have a software version, let’s say version A, that was certified to work with driver 419.67. It works fine in all the server configurations.
We have another software version, version B, that is being certified to work with driver 442.92. It requires this driver version or higher, because we use a new feature in the NVENC API for better performance.
The issue is that version B crashes randomly, and only on servers with two Quadro M4000 GPUs.
Furthermore, we re-verified that version A does not crash on those servers with driver 419.67, but it does crash randomly once we install driver 442.92.
We see the same issue with driver 452.06.
Is there any known bug in the newer drivers with Maxwell GPUs?
The rest of the system configuration:
Windows 10 Enterprise 2015 LTSB
Intel Xeon CPU E5-1620 v3 @ 3.50 GHz
16 GB of DDR4 at 2133 MHz
Two Quadro M4000 GPUs, each connected to a full x16 PCIe 3.0 slot
Server board and enclosure -> Dell Precision Tower 5810
The configurations that work include the Dell Precision Tower 5820 with dual P4000 GPUs, and Supermicro servers with dual and triple Quadro RTX 4000 GPUs on Intel and AMD EPYC motherboards, all running Windows 10 Enterprise 2016 LTSB.
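
In case it helps anyone trying to reproduce this, here is a minimal sketch of how the GPU model and driver version could be recorded on each server so that crash reports can be matched to a configuration. It is not part of our product. It assumes `nvidia-smi` is installed with the driver and reachable on the PATH (on some Windows installs it may not be), and the helper names `parse_gpu_csv` and `query_gpus` are made up for this example.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; the order must match the parsed columns.
QUERY_FIELDS = ["index", "name", "driver_version", "pci.bus_id"]


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [
        dict(zip(QUERY_FIELDS, (cell.strip() for cell in row)))
        for row in rows
        if row  # skip blank lines
    ]


def query_gpus():
    """Run nvidia-smi (assumed to be on the PATH) and return one dict per GPU."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

Running this at application start-up and writing the result to the log would show, for every crash, which GPU and driver combination was active.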