Two identical Quadro P1000 cards, but Ubuntu installs with only one of them.

We tested two NVIDIA Quadro P1000 cards (both have device ID 10de:1cb1) on a Lenovo ThinkStation P330 Tiny with Ubuntu 16.04. Ubuntu installs successfully with one card (the "good" card) but not with the other (the "bad" card).

The details are:

  1. Ubuntu 16.04 installs successfully with the good card.
  2. Ubuntu 16.04 does NOT install with the bad card; the system hangs at Ubuntu’s Plymouth boot screen.
  3. After installing Ubuntu 16.04 with the good card, we installed the nvidia-390 driver, powered off the computer, and swapped the good card for the bad card. After powering on again, the system seems to work fine, as before.

I ran nvidia-bug-report.sh in both cases and attached the logs (nvidia-bug-report.good.log.gz and nvidia-bug-report.bad.log.gz).

Could someone help find the differences between these two cards?

Thanks.

nvidia-bug-report.good.log.gz (110 KB)
nvidia-bug-report.bad.log.gz (108 KB)

Apart from the serial number, both cards are identical. During install, the nouveau driver is probably active, so it’s a question why nouveau reacts differently.

Yes, the nouveau driver is active during the install. Below is a summary of our investigation:

We compared the lspci output for both cards; it is the same.
We compared the VBIOS versions; they are the same.
During installation, the system uses the open-source nouveau driver.
After adding “nomodeset”, both the good card and the bad card hang.
If we let the system enter the initrd, neither the good card nor the bad card hangs.
With the bad card, the system hangs when it starts Xorg (which loads xserver-xorg-video-nouveau).
If we install the system with the good card, install the NVIDIA graphics driver, and then replace the good card with the bad card, the system no longer hangs.
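
For anyone who wants to repeat the comparison step, here is a minimal sketch of how two captured text dumps (for example, saved lspci output from each card) could be diffed. The function name `diff_reports` and the labels "good" and "bad" are only illustrative, and this is not necessarily how the comparison above was done.

```python
import difflib


def diff_reports(good_text, bad_text):
    """Return unified-diff lines between two text dumps (empty list if identical)."""
    return list(
        difflib.unified_diff(
            good_text.splitlines(),
            bad_text.splitlines(),
            fromfile="good",  # illustrative label, not a real file path
            tofile="bad",     # illustrative label, not a real file path
            lineterm="",
        )
    )
```

An empty result means the two dumps match line for line. Any lines starting with `-` or `+` mark where the good and bad captures differ.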

So it looks like the nouveau driver for Xorg hangs. With the same driver and the same hardware and software environment, why does the bad card hang while the good card does not? It is highly likely that something differs between the good card and the bad card.

Is there a way for us to find out what the difference between these two cards is?

S/N of the good card: 0421218028832

S/N of the bad card: 0421218029751

Thanks for reporting this issue.

Even when two graphics cards have identical capabilities and VBIOSes, there may be configuration differences between them. For example, both cards may have, say, 3 of a particular computational unit, but one card might enumerate them as 0, 1, 3 and the other as 0, 2, 3 (the indexing may be non-contiguous, even though both cards have the same number of units).

It appears that the Nouveau driver wasn’t expecting this particular instance of non-contiguous numbering on some Quadro P1000 cards. Thankfully, the problem is already fixed in more recent versions of Nouveau: in my testing with Linux kernel 4.18.5, the issue appears to be resolved (I didn’t search further to find the oldest kernel version that works in this case).
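
To make the enumeration point concrete, here is a purely illustrative sketch. It is not Nouveau's code, and the masks and function names are hypothetical. It shows how software that assumes contiguous unit indices reaches a different answer from software that reads the actual enable mask.

```python
def units_assuming_contiguous(count):
    """Hypothetical naive approach: assume units are numbered 0..count-1."""
    return list(range(count))


def units_from_mask(mask):
    """Read the actual unit indices from a bitmask of enabled units."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Hypothetical masks: both cards have 3 units enabled, at different indices.
card_a_mask = 0b1011  # units 0, 1, 3
card_b_mask = 0b1101  # units 0, 2, 3

for name, mask in (("card A", card_a_mask), ("card B", card_b_mask)):
    actual = units_from_mask(mask)
    assumed = units_assuming_contiguous(len(actual))
    print(name, "actual:", actual, "assumed:", assumed)
```

In this sketch both cards report three units, yet the assumed indices 0, 1, 2 match neither card's real layout, so code built on that assumption would address a unit that is not there.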

Unfortunately, the only workaround that I’m aware of is to avoid using problematic versions of Nouveau with graphics cards that violate its assumptions. If you encounter a problem as described in post #1:

(1) Either temporarily swap in a different graphics card during distro installation, or install the distro in non-graphical mode.
(2) After distro installation, before enabling X11: either upgrade to a recent kernel (e.g., 4.18.5) or install the NVIDIA proprietary driver.

I’m sorry there isn’t a more convenient workaround in this case.

If you test Linux kernel 4.18.5 and find that the problem is not resolved, please let me know.

Thanks for the explanation.

I tried several Linux kernel versions. I can reproduce this issue with Linux 4.17.19, and the first version that fixes it appears to be Linux 4.18.0.
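
As a rough aid, the check below compares a kernel release string against 4.18.0, the first fixed version observed in this thread. The helper names are hypothetical, and this is only a heuristic on the version number: a distro kernel with an older number may carry backported fixes, which this sketch cannot detect.

```python
import platform
import re

# First kernel version observed in this thread to work with the affected card.
FIRST_FIXED = (4, 18, 0)


def parse_kernel_release(release):
    """Extract (major, minor, patch) from a string such as '4.15.0-29-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def at_least_first_fixed(release):
    """True if the release number is 4.18.0 or newer (version heuristic only)."""
    return parse_kernel_release(release) >= FIRST_FIXED


if __name__ == "__main__":
    running = platform.release()
    print(running, "->", "4.18.0 or newer" if at_least_first_fixed(running) else "older than 4.18.0")
```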