Bad initialisation, snow/noise and black/blank screens with Quadro K1000M, W530, DisplayPort hub

Hello,

I have a three-monitor setup with a Club 3D MST hub connected to the DisplayPort on my Lenovo ThinkPad W530, and the monitors don’t display correctly.

The GPU seems to have a hard time initialising and maintaining a consistent connection to the screens. I have tried a multitude of Linux drivers, but none solve the problem. They only change the behaviour of the problem a bit.

When logging into X, a random combination of the three screens turn on, stay black/blank, show noise, or simply don’t turn on. xrandr reports all three screens as present and enabled, and so does the NVIDIA control panel.
I have to do one of these things:

  • Repeat logging out (sometimes reloading the NVidia module, rebooting or cold booting) and logging back in to be able to turn them all on.
  • Unplug the cable from my laptop and plug it back in twice (with the 331.20 driver)
  • Unplug the cable and plug it back in, then switch to another TTY and back to X.

I have to repeat these steps until all the screens display again without noise.

Each workaround may or may not work after being repeated for a while. When repeated too often, the system/kernel/X sometimes simply freezes or crashes, without even being able to panic or oops.

When I have succeeded in getting a desktop with all three monitors displaying, some of them can show white noise/snow, and some may turn off and on again (after seconds to minutes) or turn off and never come back. A screen seems more likely to turn off or go blank at random when it had noise on it.

Switching to a TTY and back to X changes the situation, but in most cases it only changes the state of the problem, and repeating the switch changes nothing further.

(The only driver I found able to detect hot-plugging of the screens was version 331.20, and this version is masked in portage.)

I have contacted Club 3D, and these problems point to a low-bandwidth problem, but according to the specs of my video card/laptop this should not be a problem, and the setup can work fine for quite some time. Changing the power setting in the control panel to Prefer maximum performance seems to reduce the occurrence of the problems, but they remain present.
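The bandwidth question can be sanity-checked with some rough arithmetic. The sketch below is only an estimate under assumptions that are not from this thread: each monitor runs 1920x1080 at 60 Hz with the common 148.5 MHz pixel clock and 24 bits per pixel, the link uses 4 lanes with 8b/10b coding (80% of the raw rate is payload), and MST framing overhead is ignored. Whether the W530 port and the hub actually negotiate the faster HBR2 rate is not established here.

```shell
#!/bin/sh
# Rough DisplayPort bandwidth estimate for an MST hub (a sketch, not a measurement).
# Assumptions: 148.5 MHz pixel clock per 1080p60 monitor, 24 bits per pixel,
# 4 lanes, 8b/10b coding, MST framing overhead ignored.

# required_gbps MONITORS PIXEL_CLOCK_MHZ BITS_PER_PIXEL
required_gbps() {
    awk -v n="$1" -v clk="$2" -v bpp="$3" \
        'BEGIN { printf "%.2f\n", n * clk * bpp / 1000 }'
}

# usable_gbps LANE_RATE_GBPS LANES   (8b/10b leaves 80% of the raw rate)
usable_gbps() {
    awk -v rate="$1" -v lanes="$2" \
        'BEGIN { printf "%.2f\n", rate * lanes * 0.8 }'
}

# fits REQUIRED USABLE -> prints "yes" or "no"
fits() {
    awk -v need="$1" -v have="$2" \
        'BEGIN { if (need + 0 <= have + 0) print "yes"; else print "no" }'
}

# Usage:
#   need=$(required_gbps 3 148.5 24)        # three 1080p60 monitors
#   fits "$need" "$(usable_gbps 5.4 4)"     # HBR2, 5.4 Gbit/s per lane
#   fits "$need" "$(usable_gbps 2.7 4)"     # HBR,  2.7 Gbit/s per lane
```

Under these assumptions the three streams need roughly 10.7 Gbit/s, which fits within an HBR2 link (about 17.3 Gbit/s usable) but not within an HBR link (about 8.6 Gbit/s usable). A link that trains at the lower rate would therefore be consistent with a bandwidth shortfall.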

Mode switching from a TTY to X with the NVIDIA driver takes 10-30 seconds.

The only things I can find in the syslog:

  • An ACPI warning when loading the nvidia module: “ACPI Warning: _SB_.PCI0.PEG_.VID_._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130725/nsarguments-95)”
  • A thinkpad driver issue when changing TTY: “thinkpad_acpi: asked for hotkey mask 0x0070ffbf, but firmware forced it to 0x0070ffbb”

Cheers,

leipie
nvidia-bug-report.log.gz (327 KB)

Hi,

If you want any help from Nvidia officials, you should include the nvidia-bug-report: https://devtalk.nvidia.com/default/topic/522835/linux/if-you-have-a-problem-please-read-this-first/

Also interesting to know would be your distribution, kernel, and the version of the nvidia-drivers you are using.

By the way, i have the same laptop as you have (though i don’t have three screens or a hub), and without using horrible workarounds, my screen stays entirely black, so i am actually very interested in why your laptop works at all.

While this is discussed in Another Thread, i am getting exactly the same ACPI errors as you (actually, a few more). My theory is, that the nvidia-driver simply does not work with some Optimus-hardware.

Hey,

I use Gentoo x64 with KDE. Currently I’m using the 3.12.3-tuxonice kernel and I have Optimus enabled. My current nvidia driver version is 319.76 and I have used versions up to 331.20.
But every version above 304.117 is masked in my portage package manager, because it tends to crash the kded4 daemon process of KDE every hour or so. These nvidia drivers are very unsatisfying.

I am unable to switch to the nouveau drivers, because they don’t recognise daisy-chained displays.

With my current set-up I have two scripts I wrote to use my screens, but it involves restarting X to switch between the laptop screen and the external displays.

I have included the scripts and my custom xorg configs. To switch I execute:
nohup sudo /root/switch.sh intel ; exit
nohup sudo /root/switch.sh nouveau ; exit
nohup sudo /root/switch.sh nvidia ; exit

leipie

P.S. Who wrote this stupid message? “File has an invalid extension, it should be one of avi, bmp, chm, cpp, cu, doc, gif, gsl, gz, h, htm, html, ico, jpeg, jpg, jps, mov, mp3, mpg, pdf, png, pns, ppt, pyg, rar, rib, rtf, swf, tar, tif, tiff, txt, wav, wmv, xls, xml, zip, log, xlsx, docx, pptx, 7z.” It is wrong; it should state that the extension is not accepted. The .sh extension is the only valid extension for the scripts I’m trying to include…

I seem unable to attach much to these messages. I have tried to add my archive several times now, and it either ignores my request or stops at 100%. The scripts archive is available at leipie.info/scripts.tgz

Sorry for letting you wait so long.

For your “main problem” of screens not turning on correctly, could you please give me the last lines of dmesg right before the problems appear?

I’m looking for something like

Jan 18 23:53:06 vash kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  331.38  Wed Jan  8 19:32:30 PST 2014
Jan 18 23:53:13 vash kernel: NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Jan 18 23:53:13 vash kernel: NVRM: os_pci_init_handle: invalid context!
Jan 18 23:53:13 vash kernel: NVRM: os_pci_init_handle: invalid context!
Jan 18 23:53:13 vash kernel: NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Jan 18 23:53:13 vash kernel: NVRM: os_pci_init_handle: invalid context!
Jan 18 23:53:13 vash kernel: NVRM: os_pci_init_handle: invalid context!
Jan 18 23:53:13 vash kernel: NVRM: RmInitAdapter failed! (0x25:0x28:1156)
Jan 18 23:53:13 vash kernel: NVRM: rm_init_adapter failed for device bearing minor number 0
Jan 18 23:53:13 vash kernel: NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5
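A filter along these lines can pull such messages out of the kernel log. It is only a sketch: the function name is made up for illustration, and the pattern list is an assumption based on the messages quoted in this thread, so it may need extending.

```shell
#!/bin/sh
# Sketch: keep only kernel log lines (read from stdin) that mention the NVIDIA
# kernel module (NVRM), ACPI errors/warnings, or thinkpad_acpi.
# The pattern list is an assumption based on the messages quoted in this thread.
gpu_log_filter() {
    grep -E 'NVRM:|ACPI (Error|Warning)|thinkpad_acpi:'
}

# Usage (needs permission to read the kernel log):
#   dmesg | gpu_log_filter | tail -n 40
```

Because the filter reads standard input, the same function works on a saved syslog file as well as on live dmesg output.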

The mode switch doesn’t take quite that long for me, but i mostly only use the internal 1920x1080 LCD. The nvidia driver is really slow with operations like that though, so maybe it’s “normal” for three screens?

I’m not sure whether the log messages are related. I am getting the ACPI warnings as well, but my graphics are working right now (even if it’s because of a workaround). I also get thinkpad_acpi warnings, but i am not encountering any issues with switching TTYs.

Hey!

I have none of those error messages. This is the only output in the dmesg log, I don’t think it has anything useful to add:
“[118256.307888] bbswitch: enabling discrete graphics
[118256.539653] pci 0000:01:00.0: power state changed by ACPI to D0
[118256.539655] thinkpad_acpi: EC reports that Thermal Table has changed
[118256.550987] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=none,decodes=none:owns=none
[118256.551097] [drm] Initialized nvidia-drm 0.0.0 20130102 for 0000:01:00.0 on minor 1
[118256.551100] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 331.38 Wed Jan 8 19:32:30 PST 2014
[118256.552466] nvidia 0000:01:00.0: irq 47 for MSI/MSI-X
[118259.037881] ACPI Error: Field [TBF3] at 389120 exceeds Buffer [NULL] size 368640 (bits) (20130328/dsopcode-236)
[118259.037888] ACPI Error: Method parse/execution failed [_SB_.PCI0.PEG_.VID_.GETB] (Node ffff88081a490668), AE_AML_BUFFER_LIMIT (20130328/psparse-537)
[118259.037897] ACPI Error: Method parse/execution failed [_SB_.PCI0.PEG_.VID_._ROM] (Node ffff88081a490640), AE_AML_BUFFER_LIMIT (20130328/psparse-537)
[118272.021683] thinkpad_acpi: asked for hotkey mask 0x0070ffbf, but firmware forced it to 0x0070ffbb
[119952.168357] nvidia-settings[22271]: segfault at 7f938a2959b0 ip 00007f938a2959b0 sp 00007fffd534ff58 error 15 in SYSV00000000 (deleted)[7f938a24d000+60000]”

I can only trigger a similar message when I power down the nvidia card through bbswitch and then load the nvidia module. Those messages are present in the full bug report archive, but this has nothing to do with the screen problems.

I’ll attach a new bug report too, if uploading succeeds.

The internal screen can only be driven through the intel (or modesetting) driver, because it is claimed by the intel framebuffer due to Optimus. So it doesn’t have much to do with the nvidia driver.

Thank you for your effort,

leipie
nvidia-bug-report.log.gz (432 KB)

For your information, the output of xrandr:

Screen 0: minimum 8 x 8, current 3240 x 1920, maximum 16384 x 16384
VGA-0 disconnected (normal left inverted right x axis y axis)
DP-3.1 connected primary 1080x1920+0+0 left (normal left inverted right x axis y axis) 480mm x 270mm
1920x1080 60.0*+
1680x1050 60.0
1600x900 60.0
1280x1024 75.0 60.0
1280x800 59.8
1280x720 60.0
1024x768 75.0 60.0
800x600 75.0 60.3
640x480 75.0 59.9
DP-3.2 connected 1080x1920+1080+0 left (normal left inverted right x axis y axis) 480mm x 270mm
1920x1080 60.0*+
1680x1050 60.0
1600x900 60.0
1280x1024 75.0 60.0
1280x800 59.8
1280x720 60.0
1024x768 75.0 60.0
800x600 75.0 60.3
640x480 75.0 59.9
DP-3.3 connected 1080x1920+2160+0 left (normal left inverted right x axis y axis) 480mm x 270mm
1920x1080 60.0*+
1680x1050 60.0
1600x900 60.0
1280x1024 75.0 60.0
1280x800 59.8
1280x720 60.0
1024x768 75.0 60.0
800x600 75.0 60.3
640x480 75.0 59.9
LVDS-0 disconnected (normal left inverted right x axis y axis)
DP-0 disconnected (normal left inverted right x axis y axis)
DP-1 disconnected (normal left inverted right x axis y axis)
DP-2 disconnected (normal left inverted right x axis y axis)
DP-3 disconnected (normal left inverted right x axis y axis)
DP-4 disconnected (normal left inverted right x axis y axis)
DP-5 disconnected (normal left inverted right x axis y axis)
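Output like the above can also be checked mechanically. The sketch below reads `xrandr --query` text on standard input and reports, for each connected output, whether one of its modes carries the `*` current-mode marker. The function name is hypothetical. As this thread shows, xrandr can report an output as enabled while the monitor stays blank, so this only confirms the X server's view and not the actual signal.

```shell
#!/bin/sh
# Sketch: for each connected output in `xrandr --query` text (read from stdin),
# print "<name> active" if one of its mode lines carries the `*` marker,
# otherwise "<name> inactive". Disconnected outputs are skipped.
xrandr_status() {
    awk '
        function emit() {
            if (name != "") print name, (active ? "active" : "inactive")
        }
        # An output header line: "<name> connected ..." or "<name> disconnected ..."
        $2 == "connected" || $2 == "disconnected" {
            emit()
            name = ($2 == "connected") ? $1 : ""
            active = 0
            next
        }
        # A mode line belonging to the current connected output
        name != "" && /\*/ { active = 1 }
        END { emit() }
    '
}

# Usage:
#   xrandr --query | xrandr_status
```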

Huh, strange.

Since your problem seems unrelated to mine, i’m afraid i cannot help you further…

But just as a side-note, you can “disable” Optimus in your BIOS settings (actually UEFI settings), so the Quadro Card drives the internal Screen as well. Maybe this fixes the long tty-switch-times?

I will try, but disabling Optimus is not really an option, because I would need to restart my computer all the time. It is bad enough I have to restart X every time I go off/on cable.

The nvidia driver is unable to throttle power usage in non-Optimus mode, and my battery drains within 1:30.

Hey!

If I enable only the discrete card, the switching is faster, but brightness control is broken. The only thing I can find is github.com/guillaumezin/nvidiabl, but this involves setting the brightness from the command line for now.
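Setting the brightness from the command line can be sketched through the kernel's generic sysfs backlight interface (the `brightness` and `max_brightness` files under `/sys/class/backlight/<device>`). The function names are hypothetical, and the default device name `nvidia_backlight` is an assumption about what nvidiabl registers; `ls /sys/class/backlight` shows what is actually present.

```shell
#!/bin/sh
# Sketch: set backlight brightness as a percentage via sysfs.
# The default device path is an assumption, not confirmed in this thread.

# percent_to_raw PERCENT MAX -> integer brightness value, clamped to 0..MAX
percent_to_raw() {
    pct=$1
    max=$2
    if [ "$pct" -lt 0 ]; then pct=0; fi
    if [ "$pct" -gt 100 ]; then pct=100; fi
    echo $(( pct * max / 100 ))
}

# set_backlight PERCENT [DEVICE_DIR]
set_backlight() {
    dev=${2:-/sys/class/backlight/nvidia_backlight}
    max=$(cat "$dev/max_brightness") || return 1
    percent_to_raw "$1" "$max" > "$dev/brightness"
}

# Usage (writing to sysfs normally requires root):
#   set_backlight 50
```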

The connection to the screens seems more stable, but I have already seen one of my screens blink, so it doesn’t seem to completely solve the display issues. Going on/off cable can now be done without restarting X, of course.
The nvidia card still drains a lot of power, and I am unable to force and keep the card in a low power state. Every time I move the cursor the driver seems to ramp up the power.

Cheers,

leipie

Backlight can also be set with xbacklight.

I cannot help you with the power issues here. However, my machine usually lasts 4 hours on battery with the nvidia card enabled.

xbacklight doesn’t seem to work for me. Do I need to set something up first?

No idea, i didn’t have to do anything. Maybe different compile options?

By the way, what is your kernel’s CONFIG_HZ?
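For reference, the value can be read from the kernel configuration. A minimal sketch, with a hypothetical function name, that extracts it from a config file on standard input; where the config of the running kernel lives depends on the setup, so both usage lines are assumptions about a typical system:

```shell
#!/bin/sh
# Sketch: print the CONFIG_HZ value from a kernel config read on stdin.
# Only the numeric "CONFIG_HZ=<n>" line matches, not CONFIG_HZ_250 and friends.
config_hz() {
    sed -n 's/^CONFIG_HZ=\([0-9][0-9]*\)$/\1/p'
}

# Usage, depending on where the kernel config is available:
#   zcat /proc/config.gz | config_hz       # needs CONFIG_IKCONFIG_PROC=y
#   config_hz < /usr/src/linux/.config     # configured source tree, may differ
#                                          # from the running kernel
```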