Hi all,
I need some suggestions. I’m stuck on what to check next on a system that has dropped its graphics card, and I would rather not reinstall the OS, which is about all I’ve got left.
The system runs Ubuntu 20.04 LTS and worked fine for two weeks, but it now seems to have lost the card.
dmesg reports the following, again and again:
[ 2242.792893] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
[ 2242.793850] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=none
[ 2242.793867] NVRM: The NVIDIA GPU 0000:01:00.0
NVRM: (PCI ID: 10de:1b06) installed in this system has
NVRM: fallen off the bus and is not responding to commands.
[ 2242.793907] nvidia: probe of 0000:01:00.0 failed with error -1
[ 2242.793917] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 2242.793917] NVRM: None of the NVIDIA devices were initialized.
[ 2242.794110] nvidia-nvlink: Unregistered the Nvlink Core, major device number 509
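To put a number on "again and again", something like the following could be used. It is only a sketch: count_bus_drops is a name I made up, and it assumes the kernel log text is piped in on stdin.

```shell
#!/bin/sh
# count_bus_drops (hypothetical helper): read kernel log text on stdin and
# print how many lines contain the "fallen off the bus" message.
count_bus_drops() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # there are no matches, so "|| true" keeps the function's status at 0.
    grep -c 'fallen off the bus' || true
}
```

Typical use would be: sudo dmesg | count_bus_drops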
The lspci output looks sensible but does NOT show a kernel driver in use:
01:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. GP102 [GeForce GTX 1080 Ti]
Flags: fast devsel, IRQ 16
Memory at a3000000 (32-bit, non-prefetchable) [virtual] [size=16M]
Memory at 90000000 (64-bit, prefetchable) [virtual] [size=256M]
Memory at a0000000 (64-bit, prefetchable) [virtual] [size=32M]
I/O ports at 4000 [virtual] [size=128]
Expansion ROM at a4000000 [virtual] [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [128] Power Budgeting
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024
Capabilities: [900] Secondary PCI Express
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
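The thing to notice above is that there is a "Kernel modules" line but no "Kernel driver in use" line. A rough way to check for that from a script is sketched below; driver_in_use is a hypothetical helper of my own, and it assumes verbose lspci output (with the kernel driver lines) is piped in on stdin.

```shell
#!/bin/sh
# driver_in_use (hypothetical helper): read lspci output on stdin and print
# the bound kernel driver for each device, or "none" if no device reports one.
driver_in_use() {
    awk -F': ' '
        /Kernel driver in use/ { print $2; found = 1 }
        END { if (!found) print "none" }
    '
}
```

Typical use would be: lspci -k -s 01:00.0 | driver_in_use
For the output pasted above it would print "none", since only the "Kernel modules" line is present.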
nvidia-detector says the 510 driver is appropriate:
nvidia-driver-510
I’ve tried setting the driver through the Additional Drivers menu, which claims to be using 510.
I’ve checked for blacklist entries and found only the framebuffer driver listed:
grep -ri nvidia /etc/modprobe.d/
/etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb
yet modprobe says:
modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': No such device
It’s been rebooted, booted and punched, but it still doesn’t want to claim the card.
All suggestions gratefully received,
M