Slow GPU performance with 440.95.01

I’m testing a driver upgrade on RHEL 7.6 from 418.40.04 to 440.95.01 on both Tesla T4s and P100s (I still need to test some K80s). Preliminary tests show a significant decrease in graphics performance, and I’m wondering if anyone else is seeing this.

I know it’s not a real benchmark, but I’m using glxgears as a quick sanity check. On my nodes with a single T4, I’m getting more than 40k frames/second with the 418.40.04 driver, but when I upgrade to 440.95.01, I get output that’s exactly 1 FPS instead, at least some of the time. If I restart the Xorg server that I’m targeting, I can sometimes get it to resume the faster speed, but it eventually drops back down. I’m testing now to see how long it takes to drop down.
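To time how long it takes for the frame rate to drop, something like the following could watch the glxgears output. This is only a rough sketch: it assumes glxgears prints its usual "N frames in S seconds = F FPS" lines, and the function names, the default `:0` display, and the 100 FPS threshold are my own placeholders, not anything from the driver or its docs.

```python
import os
import re
import subprocess
import sys
import time

# glxgears prints lines such as "201234 frames in 5.0 seconds = 40246.800 FPS".
FPS_LINE = re.compile(r"(\d+) frames in ([\d.]+) seconds = ([\d.]+) FPS")


def parse_fps(line):
    """Return the FPS value from one glxgears output line, or None."""
    match = FPS_LINE.search(line)
    return float(match.group(3)) if match else None


def first_slow_sample(lines, threshold):
    """Return the index of the first FPS sample below threshold, or None."""
    index = 0
    for line in lines:
        fps = parse_fps(line)
        if fps is None:
            continue  # skip lines that are not FPS reports
        if fps < threshold:
            return index
        index += 1
    return None


def watch(display, threshold):
    """Run glxgears on the given display; return seconds until FPS < threshold."""
    env = dict(os.environ, DISPLAY=display)
    start = time.monotonic()
    proc = subprocess.Popen(
        ["glxgears"], env=env, stdout=subprocess.PIPE, text=True
    )
    try:
        for line in proc.stdout:
            fps = parse_fps(line)
            if fps is not None and fps < threshold:
                return time.monotonic() - start
    finally:
        proc.terminate()
    return None  # glxgears exited without ever dropping below threshold


if __name__ == "__main__":
    # Hypothetical defaults: display :0 and a 100 FPS threshold.
    target = sys.argv[1] if len(sys.argv) > 1 else ":0"
    elapsed = watch(target, 100.0)
    if elapsed is None:
        print("glxgears exited without a slowdown")
    else:
        print(f"FPS dropped below threshold after {elapsed:.0f} seconds")
```

Run it with the display of the Xorg server under test as the first argument, and leave it running until it reports a drop.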

I’m seeing similar behavior on the hosts with 4 P100s.

There are a few package changes between the two host images, including a very minor kernel version change (RH’s 3.10.0-957.48.1.el7.x86_64 vs 3.10.0-957.54.1.el7.x86_64), but I don’t expect that to be significant.

Preliminarily, this appears to be the result of the “HardDPMS” option being on by default as of the 440 driver while running headless. It can be alleviated by an option in xorg.conf, as shown in these two references:

So far, with that change in place, I’ve been able to run for 2 hours with either the 440 or 450 driver with no slowdowns, whereas previously I saw slowdowns within 15 minutes or so.
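For anyone looking for the xorg.conf change itself, here is a sketch of what disabling the option might look like. I am assuming the fix is to set HardDPMS to false and that it goes in the Device section; the Identifier value is just a placeholder, so match it to whatever your existing xorg.conf uses and check the driver README for your version.

```
Section "Device"
    # "Device0" is a placeholder; keep your existing Identifier.
    Identifier "Device0"
    Driver     "nvidia"
    # Assumption: turning HardDPMS off is what avoids the headless slowdown.
    Option     "HardDPMS" "false"
EndSection
```

The Xorg server needs to be restarted for the change to take effect.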