black screen with Mac version of GTX 680

I wonder how many of the similar failures in https://devtalk.nvidia.com/default/topic/1037997/xid-61-black-screen-on-startup-ubuntu-18-04-gtx-1060-mobile/?offset=9 are likewise due to hardware that doesn’t support loading the ipmi_si module. Is there any chance the newer nvidia drivers could be reworked to provide a legacy mode for hardware that doesn’t support ipmi?

I am really confused about the relationship of ipmi to nvidia_drm. At work, we have a System76 Leopard WS workstation with a GeForce GTX 1050/PCIe/SSE2 running nvidia-384-384.130-0ubuntu0.16.04.1 under 4.15.0-36-generic on Xenial Ubuntu. There, modules.dep shows the line…

updates/dkms/nvidia_384_drm.ko: updates/dkms/nvidia_384_modeset.ko updates/dkms/nvidia_384.ko kernel/drivers/gpu/drm/drm_kms_helper.ko kernel/drivers/gpu/drm/drm.ko kernel/drivers/video/fbdev/core/fb_sys_fops.ko kernel/drivers/video/fbdev/core/syscopyarea.ko kernel/drivers/video/fbdev/core/sysfillrect.ko kernel/drivers/video/fbdev/core/sysimgblt.ko

Their nvidia-graphics-drivers.conf in /etc/modprobe.d has…

blacklist nouveau
blacklist lbm-nouveau
blacklist nvidia-current
blacklist nvidia-173
blacklist nvidia-96
blacklist nvidia-current-updates
blacklist nvidia-173-updates
blacklist nvidia-96-updates
blacklist nvidia-384-updates
alias nvidia nvidia_384
alias nvidia-uvm nvidia_384_uvm
alias nvidia-modeset nvidia_384_modeset
alias nvidia-drm nvidia_384_drm
alias nouveau off
alias lbm-nouveau off

options nvidia_384_drm modeset=0

and a vmwgfx-fbdev.conf with

options vmwgfx enable_fbdev=1

Why would nvidia_drm get a dependency on the ipmi modules on a MacPro but not on a System76 PC?

Don’t waste time looking into ipmi; it’s not a dependency. The nvidia driver just uses it if it’s there.
https://cateee.net/lkddb/web-lkddb/IPMI_HANDLER.html

I think you’re wrong there. The ipmi support is broken on Macs. Looking at the nvidia-bug-report-340.107.log generated for a working nvidia 340.107 installation under Fedora 28, I see no uses of it…

$ grep ipmi nvidia-bug-report-340.107.log
$

but for the broken 410.57 installation, I do…

$ grep ipmi nvidia-bug-report.log-410.57-nvidia-drm.modeset_is_0-gdm_wayland_off
[ 17.235883] ipmi message handler version 39.2
[ 17.824365] ipmi device interface
ipmi_devintf 20480 0 - Live 0xffffffffc04e8000
ipmi_msghandler 69632 2 nvidia,ipmi_devintf, Live 0xffffffffc03b4000
RJznIH2nTVQ4XnsV4fi4+l+T33Z5Up888GsgzBtlGL2ph0K+pc1vrUwpowq0mGZmMs0IipmixBVP

Would simply blacklisting the ipmi* modules be sufficient to keep nvidia_drm from trying to load them?

Note that on Macs, loading ipmi_si.ko fails to create any ipmi devices. Manually creating them with the right node number doesn’t make the ipmi support work either, so it appears to be totally broken on Macs under Linux. Perhaps they assumed it might work because ipmi does work under Darwin.
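A quick way to check whether ipmi_si actually registered an interface is to look for IPMI character-device nodes. A minimal sketch, assuming the usual /dev/ipmi0-style naming (these nodes only appear when ipmi_devintf is loaded as well); the dev_dir parameter is a hypothetical convenience so the check can be pointed at another directory:

```python
import glob
import os


def ipmi_device_nodes(dev_dir="/dev"):
    """Return the sorted names of IPMI device nodes (e.g. "ipmi0") found in dev_dir."""
    pattern = os.path.join(dev_dir, "ipmi*")
    # glob returns an empty list when nothing matches or dev_dir does not exist
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if __name__ == "__main__":
    nodes = ipmi_device_nodes()
    print(nodes if nodes else "no IPMI device nodes found")
```

An empty list after loading ipmi_si and ipmi_devintf would be consistent with the behaviour described above.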

I am also a tad confused about the ipmi module. Will nvidia_drm.ko only use the ipmi support if those modules are already loaded, or will it actually load them if available? The modules.dep on my Mac lists ipmi_msghandler.ko as a dependency of nvidia_drm.ko. I am assuming the best strategy here is to

modprobe -r ipmi_msghandler

and see whether any other modules had loaded it. I assume that if I blacklist the ipmi modules in modprobe.d, I will need to regenerate the kernel modules to purge the ipmi dependencies from modules.dep, right?

ipmi_msghandler gets loaded 17 seconds after boot, the nvidia driver 21 seconds after boot.
Nvidia does not load ipmi; it just uses it if it is already loaded.
You can use something like
alias ipmi_msghandler off
to get rid of it.

I’ve taken a look at the driver glue code: in kernel/nvidia/os-interface.c, ipmi support is conditionally compiled in if the kernel supports it.
Either way, it’s just a sensor interface.

But might that not be enough to destabilize the driver? After all, drivers aren’t usually written to interface with broken modules which are going to return weird errors, no? I would assume that is an untested corner case.

The ipmi module behavior is truly bizarre. Adding the blacklisting as…

$ more /etc/modprobe.d/nvidia.conf
alias ipmi_devintf off
alias ipmi_msghandler off
alias ipmi_si off
alias ipmi_watchdog off
alias ipmi_poweroff off
alias acpi_ipmi off
alias ibmaem off
alias ibmpex off

works on the command line…

$ sudo modprobe ipmi_msghandler
modprobe: ERROR: could not find module by name=‘off’
modprobe: ERROR: could not insert ‘off’: Unknown symbol in module, or unknown parameter (see dmesg)

but when I install the rpmfusion 410.57 packages and reboot, I find that the ipmi_msghandler is loaded…

$ lsmod | grep nvidia
nvidia_drm 49152 1
nvidia_modeset 1044480 2 nvidia_drm
nvidia_uvm 925696 0
nvidia 16855040 49 nvidia_uvm,nvidia_modeset
drm_kms_helper 196608 1 nvidia_drm
drm 475136 4 drm_kms_helper,nvidia_drm
ipmi_msghandler 69632 1 nvidia

which, weirdly, looks like it was loaded by the nvidia driver. This is despite having used…

rd.driver.blacklist=ipmi_devintf rd.driver.blacklist=ipmi_msghandler modprobe.blacklist=ipmi_devintf modprobe.blacklist=ipmi_msghandler modprobe.blacklist=ipmi_si modprobe.blacklist=ipmi_watchdog modprobe.blacklist=ipmi_poweroff modprobe.blacklist=acpi_ipmi modprobe.blacklist=ibmaem modprobe.blacklist=ibmpex

in GRUB_CMDLINE_LINUX in /etc/default/grub and regenerating grub.cfg with…

sudo grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg

and then running ‘dracut -f’.

Note that I manually verified at the grub boot loader that these options were being passed.
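After rebooting, the command line the kernel actually received can be read from /proc/cmdline, and lsmod’s last column (“Used by”) lists the loaded modules holding a reference to each module; in the lsmod output above, ipmi_msghandler’s only user is nvidia. A minimal sketch that extracts both from saved text, assuming the blacklist options may be repeated or comma-separated as in the GRUB line above:

```python
def blacklisted_modules(cmdline):
    """Collect module names from modprobe.blacklist= and rd.driver.blacklist= options."""
    found = {"modprobe.blacklist": [], "rd.driver.blacklist": []}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in found:
            # Each option may be repeated and may also hold a comma-separated list
            found[key].extend(name for name in value.split(",") if name)
    return found


def module_users(lsmod_text):
    """Map each module in lsmod output to the modules listed in its "Used by" column."""
    users = {}
    for line in lsmod_text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Module":
            continue  # skip blank lines and the lsmod header row
        # Columns: name, size, reference count, optional comma-separated user list
        users[fields[0]] = [u for u in fields[3].split(",") if u] if len(fields) > 3 else []
    return users
```

On a live system the inputs would be the contents of /proc/cmdline and the output of lsmod; comparing the two shows whether a module that was blacklisted on the command line still ended up loaded and who is using it.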

I tried deleting the actual ipmi_msghandler.ko.xz and replacing it with an empty file via touch, but that prevents nvidia from loading…

more boot.log | grep nvidia

     Starting Fallback to nouveau as nvidia did not load...

[ OK ] Started Fallback to nouveau as nvidia did not load.

I keep noticing that in modules.dep there always are explicit dependencies on ipmi modules for nvidia…

$ grep nvidia modules.dep | grep ipmi
extra/nvidia/nvidia-modeset.ko: extra/nvidia/nvidia.ko kernel/drivers/char/ipmi/ipmi_msghandler.ko.xz
extra/nvidia/nvidia-drm.ko: extra/nvidia/nvidia-modeset.ko extra/nvidia/nvidia.ko kernel/drivers/gpu/drm/drm_kms_helper.ko.xz kernel/drivers/gpu/drm/drm.ko.xz kernel/drivers/char/ipmi/ipmi_msghandler.ko.xz
extra/nvidia/nvidia-uvm.ko: extra/nvidia/nvidia.ko kernel/drivers/char/ipmi/ipmi_msghandler.ko.xz
extra/nvidia/nvidia.ko: kernel/drivers/char/ipmi/ipmi_msghandler.ko.xz

The bit I don’t understand is why, on my System76 machine with nvidia-384 and the ipmi modules present, modules.dep doesn’t list any ipmi dependencies for the nvidia modules.

I tried installing the nvidia-graphics-drivers-396 packages from ppa:system76-dev/stable, and those also still show the dependency of nvidia-drm.ko on ipmi_msghandler.ko.

I plan on trying one last thing to see if I can fully decouple ipmi from the newer nvidia drivers on a MacPro: building a local custom kernel package with the ipmi kernel support disabled in its config.

I looked at the glue code for 384, and that one simply didn’t have any ipmi support; it seems to have been added later.

Okay. I dropped back to Ubuntu 16.04.5 xenial and installed its nvidia 384.130, which doesn’t produce the dependency on the ipmi modules in modules.dep. It exhibits the same black screen failure when nvidia-modeset is called, so ipmi was a red herring. The last thing I can think of is to find a distro with Linux packages for some of the intermediate nvidia releases, like 343.6 and 346.22, to identify the first driver release that regressed in supporting the MacPro. It looks like Fedora 21 has those in the rpmfusion archives.
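Since every test means installing a driver and rebooting, a binary search over the candidate releases keeps the number of installs roughly logarithmic. A minimal sketch, assuming the releases are ordered oldest to newest and that once a release fails every later one fails too; works() is a hypothetical stand-in for “install this release, reboot, and check whether the display comes up”:

```python
from typing import Callable, Optional, Sequence


def first_bad_release(releases: Sequence[str], works: Callable[[str], bool]) -> Optional[str]:
    """Return the first release for which works() is False, or None if none fails."""
    if not releases:
        return None
    if not works(releases[0]):
        return releases[0]  # even the oldest candidate fails
    lo, hi = 0, len(releases) - 1
    if works(releases[hi]):
        return None  # no regression within this range
    # Invariant: releases[lo] works and releases[hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(releases[mid]):
            lo = mid
        else:
            hi = mid
    return releases[hi]
```

If the monotonic assumption does not hold (for example, a release in the middle works again), the result is only one of the working-to-failing transitions, not necessarily the first.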

FYI, I tried saving the attached xorg.conf from Ubuntu cosmic with a functional nvidia-340 installation. This xorg.conf was then transferred to a second drive with an identical Ubuntu cosmic installation, but with nvidia 390.87. I got the same failures:

[ 92.932200] nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.

Is there anything useful to be gleaned from the nvidia-debugdump output that might explain the exact origin of these nvidia-modeset failures?
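To compare the two installations side by side, it can help to pull only the nvidia-modeset lines and their timestamps out of each kernel log. A minimal sketch, assuming dmesg’s default “[seconds] message” format as in the line above:

```python
import re

# Matches kernel log lines such as "[ 92.932200] nvidia-modeset: WARNING: ..."
MODESET_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+nvidia-modeset:\s+(.*)$")


def modeset_messages(dmesg_text):
    """Return (seconds since boot, message) for every nvidia-modeset line."""
    messages = []
    for line in dmesg_text.splitlines():
        match = MODESET_RE.match(line.strip())
        if match:
            messages.append((float(match.group(1)), match.group(2)))
    return messages
```

The input can be a saved dmesg capture or the kernel log section of an nvidia-bug-report.log.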

No. In my experience, those ‘idling display engine timed out’ / ‘Lost display notification’ errors right at startup are not fixable. Based on observation, my speculation is that the vendor of the affected GPUs has been using a feature (maybe in the VBIOS, but that is speculative) that is no longer supported by the nvidia driver starting with a certain driver date.

I guess the remaining question is whether anyone here has been able to get a GTX 680 to work with the nvidia drivers beyond the 340 release series. I would assume that if such cards have become unsupported, this should be considered a regression in the drivers (or at least properly documented as no longer supported).

I can report that I have a 2009 Mac Pro and an EVGA GTX 680 SC2 flashed with EFI firmware, and I experienced the same problem. I cannot install any NVIDIA driver newer than 340.107; I get a black screen and no login screen after boot.

I upgraded my card to an EVGA GTX 680 Classified (4 GB). I flashed it with EFI firmware to get a working boot logo and the like in macOS, but after that flash I got the black or frozen screen with the NVIDIA driver in Ubuntu (using any version newer than 340.107).

Then I found out the problem only occurs when I connect all 3 of my screens. If I connect only 2 of them, I get the Ubuntu login (but only 1 screen works: the second one is connected but receives no signal, and the third must be disconnected to avoid the black screen freeze). I will keep investigating to get the drivers working with my 3-screen setup.

OK, kind of weird but this is what I’ve found out:

If I boot Ubuntu with 2 of my 3 monitors connected, one attached to the HDMI port and the other to the lower DVI port, I can log in, but I get just one working screen (the monitor attached to the lower DVI port).

But THEN, if I connect the third monitor to the upper DVI port after login, voilà: monitors 2 and 3 start working and everything is as I need it to be (working 430 driver and 3 functional screens).

When I boot with all three monitors connected (and get the freeze/black screen), I can SSH into my Ubuntu machine and see that the Xorg process is pegged at 100% CPU usage.

Now at least I have a workaround, and since my computer is always powered on, it’s not as bad as it seems.

OK. After a year of manually connecting the upper DVI after logging in to Ubuntu to avoid the black screen issue with my EFI-flashed GTX 680 in a Mac Pro, I finally managed to fix the problem.

The issue seems to be related to the autodetection of display settings (EDID); you have to declare the screen configuration manually.
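Declaring the configuration manually means writing one Monitor section per output with the sync ranges that would otherwise come from EDID. A minimal sketch that renders such a section, assuming the usual xorg.conf units (HorizSync in kHz, VertRefresh in Hz); the output name, model, and ranges in the usage line are the Acer values from the config file later in this post:

```python
def monitor_section(identifier, model, hsync, vrefresh):
    """Render an xorg.conf Monitor section with explicit sync ranges.

    hsync is (min_kHz, max_kHz); vrefresh is (min_Hz, max_Hz).
    """
    return "\n".join([
        'Section "Monitor"',
        f'    Identifier "{identifier}"',
        f'    ModelName "{model}"',
        f"    HorizSync {hsync[0]:.1f} - {hsync[1]:.1f}",
        f"    VertRefresh {vrefresh[0]:.1f} - {vrefresh[1]:.1f}",
        '    Option "DPMS"',
        "EndSection",
    ])


print(monitor_section("DVI-D-0", "Acer V276HL", (30, 83), (50, 76)))
```

The Identifier must match the output name the NVIDIA driver uses (DVI-D-0, HDMI-0, DVI-I-1 here), otherwise the Screen section’s references will not line up.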

This is my recipe, hope it helps someone else:

  1. Log in using my workaround (disconnect the upper DVI port), and, if you like, reconnect it after login to get the extra displays working.
  2. Open the NVIDIA Settings (NVIDIA X Server Settings) tool that comes with the NVIDIA driver.
  3. Go to “nvidia-settings Configuration” and click “Save Current Configuration”. Save it somewhere you have write permission.
  4. Copy the generated xorg.conf file to /etc/X11/. In my case the file didn’t exist, but if you already have one, overwrite it.
  5. Edit the generated file and manually declare your monitors. Google the horizontal sync and vertical refresh ranges for your specific model.

This is my config file for my 3 displays:

Section "Monitor"
    # HorizSync source: edid, VertRefresh source: edid
    Identifier "DVI-D-0"
    VendorName "Unknown"
    ModelName "Acer V276HL"
    HorizSync 30.0 - 83.0
    VertRefresh 50.0 - 76.0
    Option "DPMS"
EndSection

Section "Monitor"
    # HorizSync source: edid, VertRefresh source: edid
    Identifier "HDMI-0"
    VendorName "Unknown"
    ModelName "ViewSonic VA2246"
    HorizSync 24.0 - 82.0
    VertRefresh 50.0 - 75.0
    Option "DPMS"
EndSection

Section "Monitor"
    # HorizSync source: edid, VertRefresh source: edid
    Identifier "DVI-I-1"
    VendorName "Unknown"
    ModelName "ViewSonic VA2261"
    HorizSync 24.0 - 82.0
    VertRefresh 50.0 - 75.0
    Option "DPMS"
EndSection

Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName "GeForce GTX 680"
EndSection

Section "Screen"
    Identifier "Screen0"
    Device "Device0"
    Monitor "DVI-I-1"
    DefaultDepth 24
    Option "Stereo" "0"
    Option "nvidiaXineramaInfoOrder" "DFP-0"
    Option "metamodes" "DVI-I-1: nvidia-auto-select +0+0, DVI-D-0: nvidia-auto-select +1920+0, HDMI-0: nvidia-auto-select +3840+0"
    Option "SLI" "Off"
    Option "MultiGPU" "Off"
    Option "BaseMosaic" "off"
    Option "UseDisplayDevice" "DFP"
    SubSection "Display"
        Depth 24
    EndSubSection
EndSection

  6. Add Option "UseDisplayDevice" "DFP" to the Screen section (I don’t know whether it really has an effect, but I read that it fixes DVI initialization problems that cause black screens).

  7. Reboot.

With this config I can boot my Mac Pro into Ubuntu and get all 3 screens, with no hangs or black screens, on my EFI-flashed NVIDIA GTX 680 4GB Classified!
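The metamodes offsets in the Screen section above place the three displays side by side, each starting where the previous one ends. A minimal sketch that builds that string from a left-to-right layout, assuming (as the +1920 and +3840 offsets imply) that every panel is 1920 pixels wide:

```python
def metamodes(layout):
    """Build a side-by-side metamodes string from [(output name, width in px), ...]."""
    x = 0
    parts = []
    for name, width in layout:
        # Each display starts at the x offset where the previous one ended
        parts.append(f"{name}: nvidia-auto-select +{x}+0")
        x += width
    return ", ".join(parts)


print(metamodes([("DVI-I-1", 1920), ("DVI-D-0", 1920), ("HDMI-0", 1920)]))
```

Reordering the list or changing a width is enough to regenerate the offsets for a different physical arrangement.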