Unusable Linux text console with nvidia-drm modeset=1 or if nvidia-persistenced is loaded

This is an RTX 3080.

I use regular Legacy BIOS boot (not UEFI), GRUB set to gfxpayload=keep, Plymouth set to text theme “details”. So I get a classic 80x25 text console on kernel boot and during the whole init process, until X starts.

Now the bad part. Mid-boot, the console hangs when the nvidia driver gets loaded (presumably by nvidia-persistenced). The screen freezes, the blinking cursor is gone. You can’t do anything about it until other consoles (tty2 and so on) start.

Whether I let it get to GUI login, or start the system in multi-user.target (aka runlevel 3), I can then switch consoles and get a login prompt. However:

  • whenever the cursor reaches the bottom of the screen (cat a file, ls a directory, ps -ef, dmesg, you name it), the console freezes, and the only way to unfreeze it is to switch to another console and back (which makes you lose the Shift-PgUp/PgDn scrollback, as usual)
  • after switching back, sometimes things look normal (until you reach the bottom of the screen again, like by pressing Enter :)), but oftentimes the “viewport” is the bottom half of the screen (let’s say the bottom 12 lines, I haven’t counted) displayed in the top half, while the bottom half shows some old output
  • clear/reset don’t do much, and neither does setfont (the font changes but the breakage remains)
  • I was also able to get a small bit of colorful garbage after switching to X and back, suggesting something else corrupted that bit of screen memory
  • when the console is “frozen”, there’s no way to make it scroll or display anything, but you can still type commands (or log in blindly if you’re trying to get into rescue.target, but that also loads the driver ;)), and when you switch to another console and back, there’s at least some output. So it’s like things are still going into the buffer somewhere, just not into the part of memory that gets displayed. Again: Shift-PgDn doesn’t do anything, Scroll Lock/^Q have no effect, Alt-SysRq-R etc. don’t show anything, and dmesg in the console (which would normally fill the whole buffer) doesn’t change anything. So it’s not like being scrolled up. What you see is really detached from the actual buffer.

By trial and error I found out that:

  • if I start the system without X, switch consoles, log in as root and run systemctl stop nvidia-persistenced, the screen blinks and the problem disappears, as long as nvidia-drm.ko was loaded with modeset=0
  • if nvidia-drm.ko was loaded with modeset=1, it takes an rmmod to fix the console, even if neither X nor nvidia-persistenced is running
  • as you’d expect, even if I disable nvidia-persistenced, as soon as X starts, text consoles break

The text mode console enters a broken state after one of:

  • starting the nvidia-persistenced service
  • loading nvidia-drm.ko with modeset=1
  • starting Xorg
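
To see which of these triggers applies on a given boot, a small check along these lines might help. It is only a sketch: it assumes the nvidia-drm module exposes its parameter at /sys/module/nvidia_drm/parameters/modeset (the usual sysfs location for module parameters), that the value reads as Y/N or 1/0, and that you may need root to read it. The helper names are made up for illustration.

```python
from pathlib import Path

# Assumed sysfs location of the nvidia-drm "modeset" parameter.
MODESET_PATH = Path("/sys/module/nvidia_drm/parameters/modeset")


def parse_modeset(raw: str) -> bool:
    """Interpret the sysfs text of a boolean module parameter."""
    return raw.strip().upper() in {"Y", "1"}


def modeset_state(path: Path = MODESET_PATH) -> str:
    """Return 'not loaded', 'modeset=1' or 'modeset=0'."""
    try:
        raw = path.read_text()
    except OSError:
        # File missing or unreadable: module not loaded, or no permission.
        return "not loaded"
    return "modeset=1" if parse_modeset(raw) else "modeset=0"


if __name__ == "__main__":
    print("nvidia-drm:", modeset_state())
```

Note that "not loaded" is also what you get when the file exists but cannot be read, so run it as root before drawing conclusions.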

I’ve tried several kernel/driver combinations, everything I have available in Fedora 33 at this moment (initial release, current update and the update from updates-testing):

  • kernel 5.8.15, 5.12.14-5.12.17, 5.13.3
  • driver 455.28, 465.31, 470.57.02

Most importantly, 4 combinations of newest/oldest have been tested to rule out the possibility of this being a new bug.

The issue seems to be specific to this card, or the Ampere architecture.

The difference between this 3080 and my old 1080 Ti is that the monitor doesn’t blank out between the BIOS and GRUB screens, which I’m super happy about, but clearly this card has a different “text” mode for a 1080p display, so maybe this is connected to the problem somehow?

I wish I had access to more hardware and hadn’t immediately sold the 1080 Ti…

I’ve tested booting the system with only a TV attached over HDMI (as opposed to a monitor over DP), just in case. Unfortunately, nothing changed - the Linux console is unusable.

Finally found your post, been looking for quite a while in vain for this problem, which I have too. A couple of remarks. I thought the problem arose when I installed a 4K monitor, not when I installed the RTX 2070. I may be mistaken. Second, my consoles are not really “frozen”. If enough lines are written to them, they don’t scroll, but jump and display a number of the new lines. It is as if there is a buffer that is much larger than what is displayed. Indeed, if I want to see the new lines immediately, I just switch to another console and back. The size of that buffer is around 156 lines. I’m using “NVIDIA UNIX x86_64 Kernel Module 460.67”.

Well, my TV is 4K, but it doesn’t change a thing (DP vs HDMI, 1080p vs 2160p). I think the computer still pushed 1080p to the TV in text mode - some kind of interaction between the GPU and the BIOS decides on the “boot” resolution.

I also have a 16:10 (1920x1200) monitor, I bet that won’t change a thing. I can test for science, but right now, cba ;)

You’re absolutely right - there is a limit of lines that eventually make it redraw, but it doesn’t scroll (so it’s not a “scroll buffer”) but, just as you said, it jumps.

Digging deeper: every 180 lines printed, I get a screen refresh. So there can be 204 lines: the 25 lines I see, then another 179 lines that don’t produce any output. The next one, the 205th total (or 180th if we only count the invisible buffer), doesn’t scroll what I see. Instead, now I see lines 156-180 of the previously invisible buffer (or 181-205 if we include the previously displayed 25) and then it stops scrolling for 179 more lines. So it’s up to 204 “in the buffer” or up to 179 “out of screen”, but I totally understand why you came up with the number 156 (more accurately it would be a “buffer” of 155, as the 156th eventually shows up ;)).
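
The jumping described above can be written down as a tiny model. This is just a sketch of the observed behaviour (25 visible rows, a redraw every 180 lines), not of what the driver or the kernel actually does internally; the function name and constants are made up for illustration.

```python
from typing import Tuple

VISIBLE_ROWS = 25      # classic 80x25 text mode
REFRESH_EVERY = 180    # observed: the screen only redraws every 180 lines


def visible_window(lines_printed: int) -> Tuple[int, int]:
    """Return (first, last) line numbers on screen after `lines_printed`
    lines of output, following the behaviour described in the post."""
    if lines_printed <= VISIBLE_ROWS:
        # Nothing has scrolled yet, so everything printed is visible.
        return (1, lines_printed)
    # Number of redraws ("jumps") that have happened so far.
    jumps = (lines_printed - VISIBLE_ROWS) // REFRESH_EVERY
    last = VISIBLE_ROWS + jumps * REFRESH_EVERY
    return (last - VISIBLE_ROWS + 1, last)


if __name__ == "__main__":
    for n in (25, 204, 205, 384, 385):
        print(n, visible_window(n))
```

With 204 lines printed the model still shows lines 1-25. The 205th line makes it jump to 181-205, and it then stays there until line 385.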

Aaaaanyways, 204 lines of actual “buffer”, 80 characters wide, is 16320 characters. 205 lines would need 16400 characters. It’s not crazy to suspect there’s a 16384 character limit somewhere.
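
For anyone who wants to check the arithmetic, here is a quick sketch. The 16384 figure is only the suspected limit from this post, and the 2 bytes per cell (character code plus attribute byte) is an assumption about how VGA text mode stores a cell; nothing here is taken from driver code.

```python
COLS = 80                # characters per line in 80x25 text mode
LIMIT_CELLS = 16384      # suspected limit (2**14), a guess from the thread
BYTES_PER_CELL = 2       # assumption: character code + attribute byte


def cells_needed(lines: int, cols: int = COLS) -> int:
    """Character cells required to hold `lines` full-width lines."""
    return lines * cols


def fits(lines: int, limit: int = LIMIT_CELLS) -> bool:
    """True if `lines` lines fit below the suspected cell limit."""
    return cells_needed(lines) <= limit


if __name__ == "__main__":
    for lines in (204, 205):
        cells = cells_needed(lines)
        print(lines, "lines:", cells, "cells,",
              cells * BYTES_PER_CELL, "bytes, fits:", fits(lines))
    print("max whole lines:", LIMIT_CELLS // COLS)
```

So 204 lines is the largest whole number of 80-column lines below 16384 cells, which matches the point where the screen jumps.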

By the way, have you tried older drivers? I can’t go lower than 455.28 because of hardware :/ (more accurately, the lowest I can go would be 455.23.04 beta - I haven’t tried it and I see no reason to - 455.28 is close enough, and was the first non-beta driver)

I don’t know if a character buffer is involved, since mine skips every 180 lines, no matter if they contain 40 or 80 characters. Of course, I don’t know how they are stored internally; the code may store 80 characters no matter how many are printed on a given line. Which older driver version would you like me to try? I can try with another monitor, but that’ll take a number of days.
nvidia-bug-report.log.gz (210.3 KB)

Can someone affected by this problem please attach an nvidia-bug-report.log.gz? Also, if you’re using the old VGA 80x25 text mode, can you please try using a graphical framebuffer mode instead?

Apologies Aaron, haven’t seen your reply until today. I didn’t know if they’re still “legal” (all my old nvidia-bug-report.logs have been removed by the forum as viruses/abuse/something ;)), so here it is:
nvidia-bug-report.log.gz (1.1 MB)

(I’ve upgraded to Fedora 34 meanwhile, otherwise everything is the same and the issue remains)

ghpille, as you have a 2070, you can go as far back as 410.66 (or 430.26 if it’s a 2070 Super), as long as they still work with the oldest kernel you can easily get (Xorg we don’t care about for a text mode test ;)). If you have the time and skill to test older versions, I don’t see why not :)

I already tested the oldest non-beta driver I can go down to with this hardware (455.28), so I can’t go deeper :/

Thanks. Your bug report logs do confirm that your systems are using legacy VGA text mode:

[    0.123955] Console: colour VGA+ 80x25

Can you please try configuring your bootloader to use a graphical framebuffer console?

Hi Aaron, I have tried several different modes running the kernel with vga=ask.

  • “VGA” modes (showing text resolutions like 80x25, 80x28, 80x30 etc.) get “stuck” like we described.

  • None of the “VESA” modes (showing pixel resolutions from 640x480 to 1920x1080) have this issue (I tried 8/16/32bpp in low and high resolutions).

Of course those modes go through unaccelerated vesafb+fbcon and are unusable in their own right (17 seconds to show “dmesg” right after boot! It actually makes the whole system start slower!) So not really a suitable work-around, considering you really only use the text console in emergency scenarios when something stopped working and you need to bring it back to life fast ;) For now I just need to remember that even in the worst case (see my OP), rmmod nvidia-drm fixes the console in those situations.

I also found your post from 2017 saying Nvidia doesn’t really support anything but 80x25 outside of UEFI, which, as you can tell, we’re absolutely happy with :) (NVIDIA devs: any ETA on FBDEV (console mode setting) implementation? - #11 by aplattner) Hopefully this means we can expect 80x25 to get fixed :)

Thanks for confirming.

I think for non-UEFI mode the slow vesafb console is a result of the way the kernel maps the framebuffer console. You can generally improve it dramatically by passing video=vesafb:mtrr:3 on the kernel command line. Please give that a try to see if it makes it usable.

You’re right that the advice for a long time was to avoid vesafb due to interaction problems with the driver. Those problems should be fixed these days, so please feel free to use a framebuffer console. I’m trying to dig up the details on the 80x25 text mode problem on your GPUs but I suspect that using a framebuffer console is going to be your best workaround.

Loading the nvidiafb kernel module before nvidia’s modules didn’t improve matters. On the contrary, with it I couldn’t restore the console by switching to another one and back.
So I rebuilt my kernel with nvidiafb built in (I don’t use an initrd), which gave me a framebuffer device, and the scrolling problem was solved.
nvidia-bug-report2.log.gz (239.1 KB)
Thank you, Lamieur, for your accurate description of the problem, and thanks Aaron for the workaround.

I don’t think nvidiafb will do anything here – it only supports extremely old NVIDIA GPU architectures. However from your bug report log it looks like it’s correctly failing to initialize and falling back to vesafb, so I think this is actually working as expected anyway:

[    1.885623] nvidiafb: Device ID: 10de1f02 
[    1.885625] nvidiafb: unknown NV_ARCH
[    1.885634] vesafb: mode is 1920x1080x32, linelength=7680, pages=1
[    1.885636] vesafb: scrolling: redraw
[    1.885637] vesafb: Truecolor: size=8:8:8:8, shift=24:16:8:0
[    1.885642] vesafb: framebuffer at 0xf1000000, mapped to 0x0000000072f7edf9, using 16200k, total 16384k
[    1.909058] Console: switching to colour frame buffer device 240x67
[    1.930833] fb0: VESA VGA frame buffer device
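
The numbers in that log hang together, as this small sketch shows. It assumes an 8x16 pixel console font (not stated in the log, but consistent with the 240x67 console) and the function names are made up for illustration.

```python
from typing import Tuple


def line_length(width_px: int, bpp: int) -> int:
    """Bytes per scanline for a packed-pixel framebuffer."""
    return width_px * bpp // 8


def frame_kib(width_px: int, height_px: int, bpp: int) -> int:
    """Size of one full frame in KiB."""
    return line_length(width_px, bpp) * height_px // 1024


def console_size(width_px: int, height_px: int,
                 font_w: int = 8, font_h: int = 16) -> Tuple[int, int]:
    """Text columns and rows for a given mode, assuming an 8x16 font."""
    return (width_px // font_w, height_px // font_h)


if __name__ == "__main__":
    print("linelength:", line_length(1920, 32))       # log: linelength=7680
    print("one frame (KiB):", frame_kib(1920, 1080, 32))
    print("console:", console_size(1920, 1080))       # log: 240x67
```

One frame works out to 8100 KiB, so the "using 16200k" in the log is exactly two frames' worth of memory.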

I ran into the same problem after upgrading from a GT 640 to a GTX 1660 Super. Linux 5.18.15, Nvidia driver 515.57.

One character cell in text mode takes 2 bytes (character code and attribute), so I guess there is an 8192-byte limit somewhere.

As a workaround, the ‘no-scroll’ parameter can be passed to the kernel, but it disables the scroll buffer completely.
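
To confirm that a workaround parameter actually made it onto the running kernel’s command line, something like this can be used. It is a minimal sketch that just looks for the exact tokens mentioned in this thread in /proc/cmdline; the function names are made up for illustration.

```python
def has_kernel_param(cmdline: str, param: str) -> bool:
    """True if `param` appears as a whole token on the kernel command line."""
    return param in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the running kernel's command line, or '' if unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    cmdline = read_cmdline()
    # Parameters suggested as workarounds in this thread.
    for param in ("no-scroll", "video=vesafb:mtrr:3"):
        state = "present" if has_kernel_param(cmdline, param) else "absent"
        print(param, "is", state)
```

It matches whole tokens only, so a differently spelled video= option will show up as absent.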

Another observation. I boot the system without X server autostart and with the nvidia modules blacklisted. I log in, run sudo modprobe nvidia-drm modeset=1 and keep pressing Enter. Often (but not always) the VT console remains active and the monitor doesn’t go into powersave mode, but scrolling becomes broken. It looks like a race between the Nvidia driver and the VGA console driver over save/restore: the Nvidia driver blanks the console, but the VGA driver doesn’t know about that. The VGA driver assumes the console is not blanked and saves its register state as non-blanked. After that state is restored, the console remains blank.

Wow, this is… creepy!

I mean, soft scrolling has been removed in 5.9, but hard scrolling (in the actual text console, not fbdev or whatever), which was connected to this issue (as that actually uses video memory as the buffer and just points where to start drawing from) has also been removed around 5.17, no?

I’m on 5.19.7, so under these (unfortunate) circumstances (I really, really liked my hard scroll and wished I had it when 5.19 came out with dmraid preventing shutdowns and me trying SysRq P/T to figure out what it’s hanging on, but not being able to scroll up), because there’s no scrolling anyways, “no-scroll” should be a no-op and not needed, right?

TL;DR: yes, “no-scroll” definitely works as a work-around, but considering there’s no scrolling in the console anyways nowadays… why is that needed?

Anyways, thank you very much, it works on 5.19 and nvidia driver 515.65.01.

I wish someone would resurrect hard scrolling, or even soft scrolling for the text console, but this work-around is better than the text consoles “hanging” right on startup. Thanks again!