Nvidia Driver 418 not loading on Ubuntu 18.04.3

I had previously installed the NVIDIA drivers on my Alienware laptop (GTX 1070) running Ubuntu 18.04, and suddenly they stopped working. I then did a fresh install of Ubuntu and installed the NVIDIA drivers like this:
sudo apt purge nvidia-*
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-driver-418

After that I couldn’t log in. I reinstalled again with the same procedure, but additionally did the following:

  • added the ‘nogpumanager’ kernel parameter

  • created /etc/X11/xorg.conf with the following content:

    Section "Device"
        Identifier "intel"
        Driver "modesetting"
        BusID "PCI:0:2:0"
    EndSection

Still no luck.

Then I ran:
sudo grep nvidia /etc/modprobe.d/* /lib/modprobe.d/*
and removed a line blacklisting nvidia from blacklist-framebuffer.conf.
Still no luck.
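For anyone repeating the grep step, a small script can list only the active blacklist lines instead of every line that mentions nvidia. This is a hypothetical helper, not part of Ubuntu or the driver; it assumes the same two directories as the grep command and reads only `*.conf` files, since modprobe.d(5) documents that only files with that suffix are used. Note that it matches any module name starting with "nvidia" (for example nvidiafb, a framebuffer driver), so check the module name before deleting a line.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Same directories as the grep command above.
MODPROBE_DIRS = ("/etc/modprobe.d", "/lib/modprobe.d")


def nvidia_blacklist_entries(text: str) -> List[Tuple[int, str]]:
    """Return (line number, module name) for each active 'blacklist nvidia*' line."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # blank line or comment line
        if len(words) >= 2 and words[0] == "blacklist" and words[1].startswith("nvidia"):
            entries.append((lineno, words[1]))
    return entries


def scan_modprobe_dirs(dirs: Iterable[str] = MODPROBE_DIRS) -> List[Tuple[str, int, str]]:
    """Scan the *.conf files in each directory for nvidia blacklist lines."""
    hits = []
    for d in dirs:
        base = Path(d)
        if not base.is_dir():
            continue  # directory missing on this system
        for conf in sorted(base.glob("*.conf")):
            try:
                text = conf.read_text(errors="replace")
            except OSError:
                continue  # unreadable file; skip it
            for lineno, module in nvidia_blacklist_entries(text):
                hits.append((str(conf), lineno, module))
    return hits


if __name__ == "__main__":
    for path, lineno, module in scan_modprobe_dirs():
        print(f"{path}:{lineno}: blacklist {module}")
```

Each hit is printed as `file:line: blacklist module`, which makes it easy to see which file and line to edit.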

I then generated an error report, which I’m attaching here: https://drive.google.com/file/d/16LjY53ggVhpagBPMO9XkmFJcrDzUG5D2/view?usp=sharing

This is how I’m checking:
nvidia-smi
which prints:
No devices were found

Please let me know what has happened and how I can fix it. Thanks so much.
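The nvidia-smi message alone does not say whether the kernel module loaded. One way to check that separately is to read /proc/modules, the file that lsmod formats. The sketch below is a hypothetical helper, not a driver tool; it lists loaded modules whose names start with "nvidia". If nvidia is listed but nvidia-smi still finds no devices, the module loaded and the problem lies in initializing the GPU, so the kernel log (dmesg) is the next place to look.

```python
from pathlib import Path
from typing import Set

PROC_MODULES = Path("/proc/modules")


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Module names from /proc/modules text (the first field of each line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def nvidia_modules(proc_modules_text: str) -> Set[str]:
    """Loaded modules whose name starts with 'nvidia' (nvidia, nvidia_drm, ...)."""
    return {name for name in loaded_modules(proc_modules_text) if name.startswith("nvidia")}


if __name__ == "__main__" and PROC_MODULES.exists():
    found = nvidia_modules(PROC_MODULES.read_text())
    print("nvidia modules loaded:", ", ".join(sorted(found)) if found else "none")
```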

Not looking good:

[   18.476334] ACPI Error: Field [TMPB] at bit offset/length 1572864/32768 exceeds size of target Buffer (262144 bits) (20181213/dsopcode-201)
[   18.476340] 
               Initialized Local Variables for Method [_ROM]:
[   18.476340]   Local0: 000000007c8f9811 <Obj>           Integer 0000000000030000
[   18.476344]   Local1: 00000000a34eed6d <Obj>           Integer 0000000000001000
[   18.476345]   Local2: 000000005e0402a0 <Obj>           Integer 0000000000180000
[   18.476347]   Local3: 000000007da1c11f <Obj>           Integer 0000000000008000
[   18.476349] Initialized Arguments for Method [_ROM]:  (2 arguments defined for method invocation)
[   18.476349]   Arg0:   00000000ba19d376 <Obj>           Integer 0000000000030000
[   18.476351]   Arg1:   00000000c950ad6d <Obj>           Integer 0000000000001000
[   18.476353] ACPI Error: Method parse/execution failed \_SB.PCI0.PEG0.PEGP._ROM, AE_AML_BUFFER_LIMIT (20181213/psparse-531)
[   18.476365] NVRM: GPU 0000:01:00.0: Failed to copy vbios to system memory.
[   18.476497] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x30:0xffff:707)
[   18.476575] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

This might be defective hardware or a broken BIOS. Since your BIOS is quite old, please start by updating it.
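The relevant part of a long dmesg or bug-report log can be pulled out with a small filter like the one below. It is a hypothetical helper, not an NVIDIA tool: it keeps only lines containing "NVRM:" or "ACPI Error" (the markers in the log above) and strips the dmesg timestamp. The file name passed on the command line is an assumption; save the log first, e.g. with `sudo dmesg > dmesg.txt`.

```python
import re
import sys
from typing import Iterable, List

# Substrings that mark the relevant lines in the log above.
MARKERS = ("NVRM:", "ACPI Error")
# dmesg prefixes lines with a timestamp such as "[   18.476365] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def relevant_lines(lines: Iterable[str]) -> List[str]:
    """Keep lines containing a marker, with the dmesg timestamp stripped."""
    hits = []
    for line in lines:
        if any(marker in line for marker in MARKERS):
            hits.append(TIMESTAMP.sub("", line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 scan_log.py dmesg.txt
    with open(sys.argv[1], errors="replace") as log:
        for hit in relevant_lines(log):
            print(hit)
```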

Hi, thanks for the answer. I’ll try updating the BIOS. Meanwhile, is there any way to check the integrity of the hardware? When you say the hardware could be defective, do you mean the GPU? I suspect that might be the case, since in my previous installation everything was running smoothly until one day the GPU was no longer detected. Please let me know how I can check the hardware.

Thanks so much for your time and insights.

In your specific case, this might be a defective system BIOS flash ROM, or just some discharged cells, which a reflash should fix. For such low-level issues there is no software to check with, since any such software needs the drivers to be loaded, and that doesn’t work when the failure is this low-level.

Thank you, I understand. I did upgrade the BIOS, but still no luck. I followed the same procedure as before and am attaching the new log. If it’s a hardware defect, any idea what I can do?

I’m not really sure anymore; it might also just be a BIOS bug that has surfaced with a newer kernel. Can you check whether downgrading the kernel to e.g. 4.15 and the driver to 390 gives the same result?
Please also run
sudo acpidump > acpidump.txt
and attach the output file.

Hi,

I am writing here because it is related to this topic, and didn’t want to duplicate thread. Maybe this is related also to: https://devtalk.nvidia.com/default/topic/1042561/-dev-sdb1-clean-640729-122388848-hellip-and-keyboard-is-not-working/, how message is same.

I have a 2080 Ti and just did a fresh install of Ubuntu 18.04 (kernel 5.0.0-27-generic). After I installed the NVIDIA driver downloaded from the NVIDIA website (NVIDIA-Linux-x86_64-430.50.run), Ubuntu stopped working. After restarting, I get a black screen with the message “/dev/sdb2: clean … files …” and never reach the login screen.

I also tried following this tutorial for installing the NVIDIA drivers and CUDA: https://www.tensorflow.org/install/gpu, but again, after installing the NVIDIA driver (418 in this case) and rebooting, Ubuntu got stuck with the same message as above.

Is there some incompatibility between the newest NVIDIA drivers and a fresh Ubuntu install? Either way, I can’t find a solution other than removing the NVIDIA drivers, but what then?

@info3p8ha Did downgrading the kernel to 4.15 help?

Thanks

@generix,
Thanks for your suggestions and efforts to help. I haven’t tried downgrading the kernel or installing driver 390 yet, since I needed the machine for other work. So I reinstalled plain Ubuntu and ran acpidump. I’m not sure whether it helps the diagnosis, but I’m sharing it anyway.

https://drive.google.com/file/d/1Yrq0KE9284YQYFm-6Eqc6hgitTu6eIlA/view?usp=sharing

@gruja90,
I understand the frustration… Did you try this?
From https://devtalk.nvidia.com/default/topic/1043405/linux/ubuntu-18-04-headless_390-intel-igpu-after-prime-select-intel-lost-contact-to-geforce-1050ti/

  • sudo prime-select nvidia

  • add the ‘nogpumanager’ kernel parameter

  • create /etc/X11/xorg.conf with the following content:

    Section "Device"
        Identifier "intel"
        Driver "modesetting"
        BusID "PCI:0:2:0"
    EndSection

  • reboot

@generix,
On a fresh installation of Ubuntu, running
nvidia-detector
prints
none
but lspci shows
01:00.0 VGA compatible controller: NVIDIA Corporation GP104M [GeForce GTX 1070 Mobile] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)

Also, when I ran the hardware diagnostics at BIOS startup, they reported the “Video Card” as fine. You mentioned there is no way to check whether the hardware is intact, but the hardware is now my main worry, so I’m asking again: is there any way I can check whether it is intact? I bought this laptop less than a year ago, so it is still under warranty. Any pointers would be helpful.
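The lspci output shows that the GPU is still enumerated on the PCI bus. The same information is exposed in sysfs: each device under /sys/bus/pci/devices has `vendor` and `class` files, and 0x10de is NVIDIA’s PCI vendor ID. The sketch below is a hypothetical helper that lists NVIDIA devices from there; on this laptop it would presumably show entries matching the two lspci lines above. Enumeration only shows that the card answers on the bus; it does not show that its VBIOS can be read, which is where the driver fails in the log earlier in this thread.

```python
from pathlib import Path
from typing import List, Tuple

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID assigned to NVIDIA

# First two bytes of the sysfs 'class' value (base class + subclass).
CLASS_NAMES = {
    "0x0300": "VGA compatible controller",
    "0x0302": "3D controller",
    "0x0403": "Audio device",
}


def class_name(pci_class: str) -> str:
    """Map a sysfs class string such as '0x030000' to an lspci-style name."""
    return CLASS_NAMES.get(pci_class.strip().lower()[:6], "other")


def nvidia_pci_devices(root: str = "/sys/bus/pci/devices") -> List[Tuple[str, str]]:
    """Return (PCI address, class name) for every device with NVIDIA's vendor ID."""
    found = []
    base = Path(root)
    if not base.is_dir():
        return found  # no sysfs (not Linux, or unusual mount)
    for dev in sorted(base.iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip().lower()
            pci_class = (dev / "class").read_text()
        except OSError:
            continue  # device without readable attributes; skip it
        if vendor == NVIDIA_VENDOR_ID:
            found.append((dev.name, class_name(pci_class)))
    return found


if __name__ == "__main__":
    for address, kind in nvidia_pci_devices():
        print(address, kind)
```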

The easiest method is to install Windows to rule out a kernel/driver bug. For an RMA, the manufacturer will probably request that anyway.

Thanks, generix. I tried Ubuntu 16.04 LTS, doing the following:
sudo apt purge nvidia-*
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-384

After that, running nvidia-smi produces the same error: ACPI Error: Field [TMPB] at bit offset/length 1572864/32768 exceeds size of target Buffer…

Can we now conclude that it is a hardware issue?

Looking at the related ACPI code, it only expects Arg0 values either <0x30000 or >0x30000. In your case, it is called with Arg0=0x30000, so it fails. The question remains where the value 0x30000 comes from. Either the GPU is broken or the mainboard has to be reset, as in this case: https://devtalk.nvidia.com/default/topic/1062791/linux/mx150-graphics-clock-suddenly-stuck-at-427mhz-with-drivers-390-87-and-430-26-/post/5382072/#5382072
A kernel/driver bug can now be safely ruled out, since the 16.04 setup shows the same issue.
Ultimately, you will have to install Windows to test, since Alienware doesn’t support Linux and so will not open a support case based on it.
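The numbers in the ACPI error can be cross-checked against the _ROM arguments from the kernel log with some quick arithmetic. The sketch below only converts between bytes and bits; reading Arg0/Arg1 as a byte offset and length, and Local3 as the buffer size in bytes, is an inference from the matching values, not something the log states.

```python
# Values copied from the kernel log quoted earlier in this thread.
ARG0 = 0x30000     # Arg0 of _ROM; read here as a byte offset (assumption)
ARG1 = 0x1000      # Arg1 of _ROM; read here as a byte count (assumption)
LOCAL2 = 0x180000  # Local2 of _ROM
LOCAL3 = 0x8000    # Local3 of _ROM; read here as the buffer size in bytes (assumption)

# Values from "Field [TMPB] at bit offset/length 1572864/32768 exceeds size
# of target Buffer (262144 bits)".
FIELD_BIT_OFFSET = 1572864
FIELD_BIT_LENGTH = 32768
BUFFER_BITS = 262144


def bits(n_bytes: int) -> int:
    """Convert a byte count to bits."""
    return n_bytes * 8


def field_fits(bit_offset: int, bit_length: int, buffer_bits: int) -> bool:
    """True if the field [offset, offset + length) lies inside the buffer."""
    return bit_offset + bit_length <= buffer_bits


if __name__ == "__main__":
    print(f"field starts at byte {FIELD_BIT_OFFSET // 8:#x}, needs {FIELD_BIT_LENGTH // 8:#x} bytes")
    print(f"buffer holds {BUFFER_BITS // 8:#x} bytes")
    print("fits:", field_fits(FIELD_BIT_OFFSET, FIELD_BIT_LENGTH, BUFFER_BITS))
```

Under these assumptions, the firmware tries to read a 0x1000-byte chunk starting 0x30000 bytes in, far past the end of a 0x8000-byte buffer, which is consistent with the AE_AML_BUFFER_LIMIT failure and the driver’s “Failed to copy vbios to system memory”.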

Hi,

After spending 1.5 working days researching how to fix the problem with Ubuntu and the NVIDIA drivers, and trying everything I found without success, I finally found a solution: I installed Ubuntu 18.04 LTS instead of Ubuntu 18.04.3 LTS, and now everything works fine. It looks like this newest release of Ubuntu has some problem with the NVIDIA drivers. I think the reason is that 18.04.3 LTS comes with some kind of open-source NVIDIA driver while the earlier 18.04 LTS does not, so in 18.04 LTS there is nothing to confuse the system about which driver to use. Or it could be something in the kernel: as far as I can see, 18.04.3 LTS uses kernel 5.0.0-27 while 18.04 LTS uses kernel 4.15.0.

I followed several tutorials on how to disable these open-source drivers, at these links:

Thanks @generix and @info3p8ha anyway!