I’m testing a driver upgrade on RHEL 7.6 from 418.40.04 to 440.95.01, on both Tesla T4s and P100s (I still need to test some K80s). Preliminary tests show a significant decrease in graphics performance, and I’m wondering whether anyone else is seeing this.
I know it’s not a real benchmark, but I’m using glxgears as a quick sanity check. On my nodes with a single T4, I get more than 40k frames/second with the 418.40.04 driver, but after upgrading to 440.95.01, glxgears reports exactly 1 FPS, at least some of the time. If I restart the Xorg server that I’m targeting, I can sometimes get it back to the faster rate, but it eventually drops again. I’m testing now to see how long it takes to drop.
I’m seeing similar behavior on the hosts with 4 P100s.
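For anyone who wants to reproduce this, here is a rough sketch of how the time-to-drop could be measured. It assumes glxgears prints its usual periodic "N frames in S seconds = X FPS" lines; the display name `:0`, the 100 FPS threshold, and the helper names (`parse_fps`, `first_drop`, `monitor`) are placeholders of mine, not part of any tool.

```python
import os
import re
import subprocess
import time

# glxgears periodically prints lines of the form
# "<frames> frames in <seconds> seconds = <fps> FPS".
FPS_LINE = re.compile(r"(\d+) frames in ([\d.]+) seconds = ([\d.]+) FPS")


def parse_fps(line):
    """Return the FPS value from one glxgears output line, or None."""
    match = FPS_LINE.search(line)
    return float(match.group(3)) if match else None


def first_drop(samples, threshold):
    """Return the index of the first sample below threshold, or None."""
    for index, fps in enumerate(samples):
        if fps < threshold:
            return index
    return None


def monitor(display=":0", threshold=100.0):
    """Run glxgears on `display` and return the seconds elapsed before the
    reported FPS first falls below `threshold` (None if glxgears exits first).

    Both defaults are assumptions; adjust them for the Xorg server under test.
    """
    env = dict(os.environ, DISPLAY=display)
    start = time.monotonic()
    proc = subprocess.Popen(
        ["glxgears"], env=env, stdout=subprocess.PIPE, universal_newlines=True
    )
    try:
        for line in proc.stdout:
            fps = parse_fps(line)
            if fps is None:
                continue  # skip any non-FPS output
            elapsed = time.monotonic() - start
            print("{:8.1f}s {:10.1f} FPS".format(elapsed, fps), flush=True)
            if fps < threshold:
                return elapsed
    finally:
        proc.terminate()
        proc.wait()
    return None
```

Calling `monitor(":0")` right after restarting the Xorg server would log each reading and return once the rate collapses, which gives a number to compare across hosts and driver versions.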
There are a few package changes between the two host images, including a very minor kernel version bump (Red Hat’s 3.10.0-957.48.1.el7.x86_64 vs. 3.10.0-957.54.1.el7.x86_64), but I don’t expect that to be significant.