Adaptive Sync causes the screen to go blank when the refresh rate drops below a certain value

-The Problem-
While in-game with Adaptive Sync active (more specifically during fps drops, long stuttering periods, or even simple loading screens), the screen goes blank for a second or so. It then recovers immediately and Adaptive Sync keeps working fine.

I can reproduce this consistently with some games just by pausing and unpausing (or opening and closing menu windows) very quickly, which causes stutters.

-More info-
This used to happen on Windows with a lot of Freesync monitors a while ago, but it no longer does.

This might be because the Windows drivers now do frame multiplying even if the game’s fps hasn’t yet dropped below the monitor’s minimum VRR range value.

Using my monitor’s OSD, I can see that the Windows drivers do this much more aggressively than the Linux blob. The Linux drivers, in fact, multiply the frames sent to my monitor only if the fps drops below my monitor’s minimum VRR range value (doing what AMD calls “low frame rate compensation” or “LFC”).
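
To make the Linux behavior concrete, here is a simplified model of LFC as described above, assuming the driver just picks the smallest whole-number multiplier that brings the refresh rate back up to the monitor’s minimum VRR value. The function name lfc_refresh and this selection rule are assumptions for illustration only, not the actual NVIDIA or AMD driver logic.

```shell
#!/bin/bash
# Simplified LFC model (an assumption, not the real driver logic):
# if the game's fps is below the panel's minimum VRR rate, show each frame
# the smallest whole number of times that brings the refresh rate back into range.
# Usage: lfc_refresh FPS VRR_MIN VRR_MAX  -> prints "refresh_hz multiplier"
lfc_refresh() {
    local fps=$1 vrr_min=$2 vrr_max=$3 mult=1
    if (( fps <= 0 )); then
        echo "fps must be a positive whole number" >&2
        return 1
    fi
    # Increase the multiplier until fps * mult reaches the minimum VRR rate.
    while (( fps * mult < vrr_min )); do
        mult=$(( mult + 1 ))
    done
    # The resulting refresh rate must still fit under the maximum VRR rate.
    if (( fps * mult > vrr_max )); then
        echo "no multiplier fits ${fps} fps into ${vrr_min}-${vrr_max} Hz" >&2
        return 1
    fi
    echo "$(( fps * mult )) $mult"
}
```

For example, lfc_refresh 40 48 240 gives 80 Hz (each frame shown twice), while lfc_refresh 50 48 240 leaves the refresh rate at 50 Hz with no multiplication. Raising the minimum with a custom EDID changes the second case: lfc_refresh 50 57 240 already doubles to 100 Hz. The more aggressive Windows behavior is not modeled here.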

-Workarounds-
I was able to greatly mitigate this problem by using a custom EDID that raises my minimum VRR range value.
This makes LFC kick in at a higher refresh rate, so frames get multiplied and sent to my monitor much earlier, which prevents the refresh rate from dropping to the lowish values that make my screen go blank.

I don’t know if the Windows drivers have some specific optimizations for certain monitors that the Linux blob lacks (or are simply less buggy regarding Adaptive Sync), but what I know for sure is that, at least with my own monitor, this problem is currently present only on Linux, while Windows works fine when tested with the same games and test applications.

-How to reproduce-
Cross-platform software recommended:

https://github.com/Nixola/VRRTest/releases/latest

  • Download the AppImage for Linux (or the zip file for Windows) and run the application.
  • Optional: use the right arrow key to slow down the grey bar (keep it pressed for a while, especially on very high refresh rate monitors).
  • Optional: press the b key once to enable busy waiting for more stable frame times.
  • Use the down arrow key to drop the fps sufficiently below your minimum VRR range value.
  • Leave it running for a while (this usually takes from a few seconds up to a couple of minutes) and observe the screen go blank and then recover after a second or so.
  • Repeat the test on Windows and observe no blanking at all.

-Speculations-
The fact that LFC is not working properly (see https://devtalk.nvidia.com/default/topic/1059120/linux/low-frame-rate-compensation-with-adaptive-sync-fails-to-prevent-tearing-on-linux-works-fine-on-windows-10-/ ) could also contribute to this screen blanking bug.

-Hardware-
Monitor: Alienware AW2518HF (VRR range: 48-240 Hz, not validated)
GPU: GTX 1070 Ti

-Software-
Distros tested: Ubuntu 19.04 (fresh install), Kubuntu 18.04
Tested driver versions: 430.40, 430.26 and 418.56
Window managers tested: Mutter, KWin (compositing disabled), Openbox
nvidia-bug-report.log.gz (1.15 MB)

    A custom EDID doesn’t completely eliminate it; a stutter or loading screen will still cause a blank.

    Also, on Windows, where this is not a problem, I’ve noticed that LFC behaves differently: there seems to be some sort of smoothing going on, so LFC doesn’t turn off as soon as the fps gets back into the VRR range. On Linux, LFC just switches on or off at either side of the minimum vertRefresh, which is probably causing havoc with the scaler.

    Yep. That workaround is far from a solution, as it only helps with frame drops to a certain degree.

    I also noticed the exact same thing with LFC. I described it in detail in the link in “Speculations” above. What I know for sure is that this behavior is at least causing tearing (even with LFC active) but, as you said, it could also contribute to (or even directly cause) this bug.

    I know that the guys at AMD recently pushed a few patches in order to improve LFC behaviour in this exact situation with their cards.
    More info here: https://www.phoronix.com/scan.php?page=news_item&px=AMDGPU-DC-FreeSync-LFR-Better

    Regarding Nvidia… Well, we still don’t really know if they’re even aware of this…

    I also have this issue and started a thread a while ago, with no response from Nvidia:

    https://devtalk.nvidia.com/default/topic/1047827/freesync-gsync-issue/

    I have only tried lowering the min refresh rate, not raising it; I might try that. There is also a semi-related thread here:

    https://devtalk.nvidia.com/default/topic/1058018/linux/g-sync-causing-display-signal-loss/

    Given that it seems to be fixed in Windows, it is pretty disheartening that it has gone so long on Linux without even a mention or confirmation from Nvidia. I would find it hard to believe that they aren’t aware of it.

    I asked around about this and there shouldn’t be any behavior difference between Windows and Linux, at least on Pascal GPUs, since the adaptive sync code is the same for both operating systems. There also isn’t a table of known-broken adaptive sync monitors, so Windows and Linux should be automatically choosing the same minimum refresh rate and engaging and disengaging LFC at the same time.

    Would you mind trying to adjust the min refresh rate upward? That wouldn’t really explain a difference from Windows, but it would help figure out what the true minimum for this panel is.

    @aplattner

    Below is a video example of the G-SYNC display loss. I can reproduce it 100% of the time with No Man’s Sky right after it finishes loading (in this case it blanks 53 seconds in), but every game, including native ones (tested: CSGO/DXMD/Hitman/KSP), is affected. The upper right shows the screen refresh rate (using a 57-144 EDID); whether I raise or lower the minimum further, the result is the same. Without an EDID, the refresh rate goes as low as 40 before LFC kicks in; with the EDID, LFC kicks in from 57.

    https://youtu.be/fce5F0zg5SI

    Here is a shorter video of the same loading screen on Windows:

    https://youtu.be/56SnMNVX3no

    And here is the Pendulum demo on Windows; notice how LFC behaves. I am not using an EDID on Windows, yet as soon as the refresh rate drops below 53 Hz it doubles, and it stays doubled until it rises above 60 Hz:

    https://youtu.be/PwmlyhPQD3I
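
The difference described in these posts can be thought of as a single threshold versus a threshold with hysteresis. The sketch below is a hypothetical model, not driver code: lfc_states is an illustrative name, it reads one whole frame rate per line, and it uses the 53 Hz enter and 60 Hz exit points seen in the Windows Pendulum video as example parameters.

```shell
#!/bin/bash
# Hypothetical model of when frame doubling (LFC) is switched on and off.
# Reads one whole frame rate per line on stdin and prints "fps on|off".
#   ENTER: doubling starts when fps drops below this value.
#   EXIT:  doubling stops only when fps rises above this value.
# ENTER == EXIT gives a plain threshold (the Linux behavior described above);
# EXIT > ENTER adds hysteresis (the Windows behavior seen in the video).
lfc_states() {
    local enter=$1 exit_at=$2 active=0 fps
    while read -r fps; do
        if (( active == 0 && fps < enter )); then
            active=1
        elif (( active == 1 && fps > exit_at )); then
            active=0
        fi
        if (( active )); then echo "$fps on"; else echo "$fps off"; fi
    done
}
```

Feeding both variants the same made-up sequence 55, 52, 54, 58, 61 shows the difference: the plain threshold (lfc_states 53 53) switches doubling on at 52 and straight back off at 54, while the hysteresis version (lfc_states 53 60) keeps it on through 54 and 58 and only drops it at 61. If rapid on/off switching really is what upsets the scaler, the second behavior would avoid it.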

    Interesting, thanks for the video.

    I’m not sure how this particular monitor behaves, but from the ones I’ve seen, having the frame rate go below the minimum just temporarily blanks the screen. Your video looks like a full display reset, including turning the backlight off and on (twice), which makes me think this might not be directly related to VRR, or at least be some other problem that’s exacerbated by VRR.

    Do you have a second system you could use to SSH into this system while running the test? I’m curious whether “tail -F /var/log/Xorg.0.log” shows any display-related messages when the glitch occurs.
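
If a second machine is available, a small helper like the one below can stamp each log line with the time it arrives, which makes it easier to match messages against the moment the screen goes blank. This is a sketch: timestamp_lines is a hypothetical helper, user@test-machine in the usage line is a placeholder, and the %N format assumes GNU date.

```shell
#!/bin/bash
# Prefix each line read on stdin with the local time it was received,
# so log messages can be matched against the moment the screen blanks.
# %N (nanoseconds) is a GNU date extension.
timestamp_lines() {
    local line
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%H:%M:%S.%N)" "$line"
    done
}
```

Run it on the second machine as: ssh user@test-machine 'tail -F /var/log/Xorg.0.log' | timestamp_lines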

    The VRR machinery is somewhat sensitive to interrupt latency, so it might also be a worthwhile experiment to install an older kernel version to see if this regressed at some point.

    Hi Aaron and thank you for taking the time to look into this!

    I raised my minimum variable refresh rate even higher (80Hz!) and could still easily reproduce this by causing stutters: I ran my game (Ys VIII) at 60fps and quickly paused and unpaused it for a few seconds. LFC was active.

    After I unlocked the framerate and ran the game at 200fps (LFC not active) I could no longer replicate this.

    Testing this further, I reset my VRR range to default (48-240Hz) and ran a test application at 50 fps (so that it would not trigger LFC).

    On Linux, it took ~7 minutes before it would blank out.
    On Windows, I let it run for quite a long time before I eventually gave up trying to reproduce this.

    This seems to indicate that:
    1- It still only happens on Linux.
    2- It happens with and without LFC so LFC misbehaving doesn’t seem to be the direct (or at least the only) cause of this screen blanking bug.

    The lowest value that doesn’t blank my screen (with no LFC active) seems to be 56Hz.

    I think that if LFC started to behave correctly, it would be able to mitigate this problem even further, if not completely solve it. And even if it doesn’t solve it, it could still help us identify the cause much better.

    Like fl2015, I also seem to experience full display resets; even the audio cuts out if I use the DisplayPort output. I don’t recall catching any errors in /var/log/Xorg.0.log, but I guess I’m gonna try again now.

    If you need further testing, I’m currently locked indoors because of the pandemic so I have all the time in the world to help. :-P

    Thanks green_squid. I’m interested to see if there are any messages that coincide with the blanking, not just errors. In particular any “Setting Mode…” or just display detection messages.

    I have the opposite issue… I’m working from home because of the pandemic and all of the adaptive sync monitors are in the office. So my ability to test LFC issues is limited because my home test monitors are all true G-SYNC.

    I don’t see any issues then: we arrange a hardware swap, you get better testing equipment, I get a better monitor and this first world problem will be no more!

    Unfortunately, I couldn’t get anything out of /var/log/Xorg.0.log when the screen blanks, even when running startx with -logverbose 6 or even 20…

    Also, since you mentioned the kernel version, I’m currently using the ancient 4.15.0-88 from Kubuntu 18.04.

    OK, I might be onto something here:

    Since @aplattner mentioned interrupt latency and older kernels I decided to try to mess around with a -lowlatency kernel to see if I could find something interesting.

    While I was trying to assign max realtime priority to my test application just to see if it changed something (it obviously didn’t, btw) I also noticed that the “nvidia” kernel thread only ran with normal CPU priority (20, nice 0).

    So I tried giving max realtime priority to that kernel thread (totally overkill) like this:

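    #!/bin/bash
    # Set the "nvidia" kernel thread(s) to SCHED_RR at max realtime priority (overkill).
    rtprio=99
    # pgrep -x matches the exact process name, kernel threads included
    pids="$(pgrep -x nvidia)" || { echo "nvidia kernel thread not found" >&2; exit 1; }
    for pid in $pids; do
        sudo chrt -r -p "$rtprio" "$pid"
    done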
    

    And… lo and behold, I was able to run VRRTest in the 30-50fps range for an hour and 15 minutes without a single screen blank until I gave up trying to reproduce this bug. This never happened before, not even with a modified EDID (and these tests were done without any custom EDID in place).

    I then tried to restore the default priority and scheduling policy on the nvidia kernel thread, ran the test again and, as expected, I got a screen blank not even 3 minutes in.
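
For checking which policy and priority the thread currently has, and for restoring the defaults after a test like this, something along these lines should work. It is a sketch assuming util-linux chrt and the same nvidia kernel thread name used in the script above; show_sched and restore_default_sched are hypothetical helper names.

```shell
#!/bin/bash
# Print the scheduling policy and priority of each PID given.
show_sched() {
    local pid
    for pid in "$@"; do
        chrt -p "$pid"
    done
}

# Put each PID back on the default time-sharing policy (SCHED_OTHER),
# which only accepts a static priority of 0. Needs root.
restore_default_sched() {
    local pid
    for pid in "$@"; do
        sudo chrt -o -p 0 "$pid"
    done
}
```

For example, run show_sched $(pgrep -x nvidia) before and after the test, and restore_default_sched $(pgrep -x nvidia) to undo the realtime change.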

    The bad news is that LFC is still broken (still causing tearing while jumping all over the VRR range), and running my test game at 60fps while intentionally causing stutters still caused panel resets within a matter of seconds.

    This really seems to be a latency problem though and I hope this points us in the right direction…

    I tried different Manjaro kernels, including some rt kernels, and noticed some improvement. The best appeared to be 4.14, which completely eliminated blanking/resets in NMS. The Witcher 3 still reset like crazy, though, and having a browser open in the background also seems to exacerbate the blanking, which may be a sign that latency is an issue.

    I can’t use 4.14 as a daily driver myself, though, as I’m on a 2nd-gen Ryzen and 4.14 lacks some support for it (temperature sensors are broken, for example).

    Just an update: I don’t know if anything significant changed with the Vulkan beta driver, but I’ve tried 440.66.02, and while 5.x kernels show no improvement, I’ve found that 4.19 (Manjaro LTS) with the latest Vulkan dev beta is 98% fixed, with very rare panel resets. I was able to play TW3, CSGO, NMS and Elite Dangerous for hours while only having a single reset, on the loading screen of ED.

    The question, though, is why do the 5.x kernels exacerbate the problem while the 4.x kernels don’t?

    A little update: this time I took the time to run some more tests with different game and kernel combinations. Turns out -lowlatency kernels (both old and recent) only make things worse in a real world scenario: while my test application was unaffected (and even benefited from some renicing), some games that never gave me problems before became absolutely unplayable.

    I can also confirm that newer kernels do indeed exacerbate this problem, but not as much as using a -lowlatency kernel.

    Maybe another clue pointing at a latency problem somewhere…

    @aplattner, I’m also experiencing this issue, but only from kernel 5.5.

    @aplattner I’m experiencing this issue on Ubuntu 20.04 (kernel 5.4.0-21-generic) too. Drivers: 440.82

    @aplattner I’m also having this issue. Ubuntu 20.04 Kernel 5.6.3. Driver 440.64

    Also happens with 5.4/440.64 when the system runs for a few days.

    Update: I finally found a way to replicate this on Windows by “sabotaging” LFC via EDID editing.
    Since my normal VRR range is 48-240, I used CRU to set it to 30-240 so that LFC would not kick in at 48. My monitor now always blanks out at 47Hz and lower and this is true for both Windows and Linux.

    The reason why I couldn’t replicate this on Windows before is that, over there, LFC was working correctly all along.

    This seems to suggest that:

    1. The root cause of the problem is that my monitor can’t handle 47Hz and lower. Every Freesync monitor has its own lower limit and there’s no way to change this.
    2. This problem can still be completely prevented by a correctly functioning LFC.
    3. What this bug is all about is: “LFC isn’t working properly on Linux”.

    This could explain why nothing shows up in xorg logs when the monitor resets: it’s not the software that’s causing this, it’s just failing to prevent it.

    The reason these resets happen even at higher framerates might be that stutters cause frametime spikes, which in turn drop the refresh rate to 47Hz or below and blank the screen, even if only for a few hundred milliseconds or so.

    Since I started to use Mangohud to show a frametime graph over my test application, I could clearly see that all the screen resets happening at 50Hz (LFC not active) were actually caused by stutters that I couldn’t see before.
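
Some rough arithmetic shows why even small stutters matter here: a frame that takes t milliseconds corresponds to an instantaneous refresh rate of 1000 / t Hz. If the panel blanks at 47 Hz and lower and LFC is not active, every frame has to arrive within 1000 / 48 ≈ 20.8 ms, while a steady 50 fps already uses 20 ms per frame, leaving less than a millisecond of headroom. The sketch below flags offending frames in a frametime log; it assumes a plain file with one frametime in milliseconds per line (not any particular Mangohud log format), and flag_slow_frames is a hypothetical name.

```shell
#!/bin/bash
# Flag frames whose instantaneous refresh rate (1000 / frametime_ms) falls
# below MIN_HZ. Input (an assumption): one frametime in milliseconds per line.
# Output: "line_number frametime_ms refresh_hz" for each offending frame.
flag_slow_frames() {
    local min_hz=$1
    awk -v min_hz="$min_hz" '
        $1 > 0 {
            hz = 1000 / $1
            if (hz < min_hz) printf "%d %s %.1f\n", NR, $1, hz
        }'
}
```

For example, printf '20\n21\n25\n19\n' | flag_slow_frames 48 reports frame 2 (21 ms, 47.6 Hz) and frame 3 (25 ms, 40.0 Hz), while the 20 ms and 19 ms frames stay in range.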

    The fact that realtime kernels and renicing could help (or even make things worse) could be explained by their influence on frametimes, causing less (or more) stuttering depending on the game or test application.

    I don’t think I have anything else to try now. I just hope that you guys are doing well while stuck at home and can eventually fix this. @aplattner if you need further tests, I’m still alive.

    On Linux, the issue happens for me without the fps ever going lower than 144. Any time the driver “resets” (switching TTYs, logging out and back in, and sometimes even after suspend), G-SYNC Compatible mode is unusable until I restart the PC. @aplattner can you guys at NVIDIA look into this, please?