We’re experiencing several OpenGL bugs in our software after upgrading to the 565.57.01 driver. Normally we wouldn’t consider regressions in a beta driver a problem; however, many Linux distributions have decided to rush a 565.57.01 release due to security issues discovered in the most recent stable Linux driver.
After a few hours of root-causing, it appears that the OpenGL driver does not always honor explicit client-side memory barriers when buffer objects are modified. This causes all sorts of problems for our software, ranging from strange visual glitches to nothing rendering at all.
Some things of interest
- Vulkan appears to be fine, in fact using Zink (OpenGL on Vulkan) no issues are present.
- Running the software under Nsight, no issues are present.
- Running the software through RenderDoc, no issues are present.
- Of course, downgrading the driver to 560.35.03, no issues are present.
[EDIT]
It appears that running under Wayland addresses some of the visual artifacts but there are still other issues present. This is beginning to look more like synchronization issues in the driver.
[EDIT EDIT]
I was able to track it down to a call to glClearNamedBufferSubData; for some reason the clear isn’t completing successfully. Introducing a glFinish after the clear seems to fix the problems on this driver. As far as I can tell, all the barriers are correct here (in fact, issuing a glMemoryBarrier(GL_ALL_BARRIER_BITS) changes nothing), so it looks like glClearNamedBufferSubData isn’t synchronizing correctly for some reason.
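For anyone who wants to try the same workaround, here is a minimal sketch in C of the clear followed by glFinish. It is not our actual code: the GlFns table, clear_range_then_finish, and the recording stubs are hypothetical names made up for illustration, and the typedefs and enum values stand in for what a real GL loader header would provide. The R32UI zero-fill is also just an assumed example of a clear.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the GL typedefs and enums so the sketch compiles without
   GL headers; a real build would take these from its GL loader. */
typedef unsigned int GLuint;
typedef unsigned int GLenum;
typedef ptrdiff_t GLintptr;
typedef ptrdiff_t GLsizeiptr;

#define GL_R32UI        0x8236
#define GL_RED_INTEGER  0x8D94
#define GL_UNSIGNED_INT 0x1405

/* Hypothetical dispatch table; the member signatures mirror the real
   glClearNamedBufferSubData and glFinish entry points. */
typedef struct GlFns {
    void (*ClearNamedBufferSubData)(GLuint buffer, GLenum internalformat,
                                    GLintptr offset, GLsizeiptr size,
                                    GLenum format, GLenum type,
                                    const void *data);
    void (*Finish)(void);
} GlFns;

/* Zero a byte range of a buffer, then block until the GL has completed all
   previously issued commands. With GL_R32UI, offset and size must be
   multiples of 4 bytes. */
static void clear_range_then_finish(const GlFns *gl, GLuint buffer,
                                    GLintptr offset, GLsizeiptr size)
{
    static const unsigned int zero = 0;
    assert(gl && gl->ClearNamedBufferSubData && gl->Finish);
    gl->ClearNamedBufferSubData(buffer, GL_R32UI, offset, size,
                                GL_RED_INTEGER, GL_UNSIGNED_INT, &zero);
    gl->Finish(); /* heavy-handed: stalls the CPU until the GPU is done */
}

/* Recording stubs (hypothetical) so the call order can be checked without
   a GL context: 1 = clear, 2 = finish. */
static int call_log[4];
static int call_count = 0;
static GLsizeiptr last_clear_size = 0;

static void stub_clear(GLuint buffer, GLenum internalformat, GLintptr offset,
                       GLsizeiptr size, GLenum format, GLenum type,
                       const void *data)
{
    (void)buffer; (void)internalformat; (void)offset;
    (void)format; (void)type; (void)data;
    last_clear_size = size;
    call_log[call_count++] = 1;
}

static void stub_finish(void)
{
    call_log[call_count++] = 2;
}
```

In a real application the table would be filled with the loader's glClearNamedBufferSubData and glFinish pointers instead of the stubs. glFinish is a full CPU/GPU sync, so this is only meant as a diagnostic or stopgap on the affected driver, not something to leave in a hot path.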
We don’t have a trivial reproducer and the software in question is proprietary so if more information is needed I’d suggest an email chain.
nvidia-bug-report.log.gz (1.8 MB)
First of all, nice work digging into this!
I’ve also been noticing this with OpenGL based applications. More specifically, I see it frequently when running Android applications in Android Studio’s emulator as well as Genymotion’s Player (another Android emulator). Both of them use QEMU/KVM and GPU pass-through with OpenGL for rendering.
I know you noted that behavior started with driver 565, but I’ll add that I started seeing this issue starting with 560. I did not see the issue in 555.
To describe what I’m seeing when this issue occurs: it looks either as if frames are being shown out of order (you see flashes of frames that were already rendered, which is especially obvious when text is involved) or as if rendering is using data it shouldn’t be (for example, a UI component replaced with distorted text characters). At worst, you get 3D graphics that flash and stretch across the screen.
Here’s a screenshot showing the issue happening, and then a screenshot showing the result once it fixes itself on a later frame. Notice in the first screenshot that the bottom row is populated by weird distorted UI elements and many of the characters are in the wrong order compared to the good screenshot.
Bad:
Good: in my second post, because as a new user I can only post one image.
The game in question here is Arknights running in Genymotion’s Player on Fedora 41 + Nvidia drivers 565.57.01 with a GTX 1080.
If repro steps would help, here’s how to do it with Android Studio’s emulator (easier than Genymotion to set up):
- Set up Android Studio
- Create an Android Virtual Device (AVD) for Android >= 11 with Play Store support
- Install Arknights from the Play Store
- Watch the main menu flicker/distort or swipe around the various menus to trigger it.
@weilercdale - Please chime in if you feel like this is very different from what you’re experiencing as I don’t want to accidentally commandeer your thread if you think this is a different issue.
A minimal reproducer (Apitrace?) would be appreciated. We’re tracking this as NVIDIA bug 4988237.