nvidia-smi "No devices were found" error

I'm having the same problem with a GTX 1650. I tried to run it on Debian 10 and on Ubuntu 20.
On Debian 10 I tried both the standard and the experimental kernel, with the 440 and 450 drivers. No luck.
Now I am trying to get it working on Ubuntu with the 450 driver from the Ubuntu repos. Still no luck. Everything was done on fresh installs. Below is the output of some commands:

lspci -vvv
22:00.0 VGA compatible controller: NVIDIA Corporation TU117 [GeForce GTX 1650] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Micro-Star International Co., Ltd. [MSI] TU117 [GeForce GTX 1650]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 175
        NUMA node: 1
        Region 0: Memory at d9000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 3c000000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at 3c010000000 (64-bit, prefetchable) [size=32M]
        Region 5: I/O ports at 6000 [size=128]
        Expansion ROM at daf00000 [virtual] [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+, NROPrPrP-, LTR-
                         10BitTagComp-, 10BitTagReq-, OBFF Via message, ExtFmt-, EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [258 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=255us PortTPowerOnTime=10us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=10us
        Capabilities: [128 v1] Power Budgeting <?>
        Capabilities: [420 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
                LaneErrStat: 0
        Capabilities: [bb0 v1] Resizable BAR <?>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

dmesg | grep NVRM

[ 5.865440] NVRM : loading NVIDIA UNIX x86_64 Kernel Module 450.66 Wed Aug 12 19:42:48 UTC 2020
[ 101.738202] NVRM : GPU 0000:22:00.0: RmInitAdapter failed! (0x26:0xffff:1266)
[ 101.738300] NVRM : GPU 0000:22:00.0: rm_init_adapter failed, device minor number 0
[ 885.973227] NVRM : GPU 0000:22:00.0: RmInitAdapter failed! (0x26:0xffff:1266)
[ 885.973319] NVRM : GPU 0000:22:00.0: rm_init_adapter failed, device minor number 0
[ 942.341436] NVRM : GPU 0000:22:00.0: RmInitAdapter failed! (0x26:0xffff:1266)
[ 942.341481] NVRM : GPU 0000:22:00.0: rm_init_adapter failed, device minor number 0
[ 1403.988729] NVRM : GPU 0000:22:00.0: RmInitAdapter failed! (0x26:0xffff:1266)
[ 1403.988775] NVRM : GPU 0000:22:00.0: rm_init_adapter failed, device minor number 0
[ 1419.358018] NVRM : GPU 0000:22:00.0: RmInitAdapter failed! (0x26:0xffff:1266)
[ 1419.358105] NVRM : GPU 0000:22:00.0: rm_init_adapter failed, device minor number 0
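
For anyone comparing logs across machines, the repeated RmInitAdapter failures can be tallied per PCI address. This is only a rough sketch: the function name count_rm_init_failures is made up for illustration, and it assumes the kernel log lines look like the ones posted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: tally "RmInitAdapter failed!" lines per PCI address.
# Reads kernel log text on stdin, so it works on live dmesg output or a saved log.
count_rm_init_failures() {
    # Accept both "NVRM:" and "NVRM :" because both spellings appear in this thread.
    grep -E 'NVRM ?: GPU [0-9a-fA-F:.]+: RmInitAdapter failed!' |
        sed -E 's/.*GPU ([0-9a-fA-F:.]+): RmInitAdapter failed!.*/\1/' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (reading dmesg may require root on some systems):
#   dmesg | count_rm_init_failures
```

Each output line is a PCI address followed by the number of failed initialisation attempts. A count that grows every time nvidia-smi is run suggests the driver retries and fails on each access rather than only once at boot.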

lsmod | grep nvidia

nvidia_uvm 1007616 0
nvidia_drm 53248 0
nvidia_modeset 1183744 1 nvidia_drm
nvidia 19701760 2 nvidia_uvm,nvidia_modeset
drm_kms_helper 184320 4 mgag200,nvidia_drm
drm 491520 6 drm_kms_helper,drm_vram_helper,mgag200,nvidia_drm,ttm

debian_dell_440.log.gz (554.8 KB)
ubuntu_dell_450.log.gz (108.6 KB)

Well, I kind of fixed it by tapping on the back of the laptop.

I suddenly started having this issue as well, after a reboot about a week ago, on Ubuntu 20.04.

Things I’ve tried so far:

  • Upgrading Ubuntu 20.04
  • Upgrading/Downgrading NVIDIA driver via run file (450.66 / 455.38 / 440.100)
  • Blacklisting the Nouveau driver and regenerating the initramfs so the kernel picks up the change.

I then purged everything, e.g.:

sudo apt-get remove nvidia-* xserver-xorg-* && sudo apt-get purge nvidia-* xserver-xorg-* && sudo apt-get autoclean && sudo apt-get autoremove

I shut down and tested the card in a different system, where it ran perfectly: I loaded it to 100% for 2 hours and it completed its tasks fine. I then put the card back into my Ubuntu server and reinstalled, this time via apt-get instead of the run file:

sudo apt-get install dkms build-essential linux-headers-$(uname -r)
sudo nano /etc/modprobe.d/blacklist.conf   # add the next three lines to the file, then save
blacklist nouveau
blacklist nvidiafb
alias nouveau off
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo apt-get install nvidia-headless-450-server nvidia-utils-450-server nvidia-container-runtime nvidia-container-toolkit nvidia-docker2
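
The blacklist step above can also be done without an interactive editor. The following is a minimal sketch, not the exact procedure used in this post: the function name write_nouveau_blacklist and the file name blacklist-nouveau.conf are assumptions, and it puts all four lines into one file instead of splitting them between blacklist.conf and nouveau-kms.conf.

```shell
#!/bin/sh
# Hypothetical helper: write the nouveau blacklist entries to the file given as $1.
# Note that it overwrites the target file, so point it at a dedicated file.
write_nouveau_blacklist() {
    target=$1
    cat > "$target" <<'EOF'
blacklist nouveau
blacklist nvidiafb
alias nouveau off
options nouveau modeset=0
EOF
}

# Usage as root (file name is an assumption; modprobe reads any *.conf in /etc/modprobe.d):
#   write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf
#   update-initramfs -u    # rebuild the initramfs so the blacklist applies at early boot
#   reboot
```

Using a dedicated file keeps the change easy to undo: deleting that one file and rebuilding the initramfs again restores the previous behaviour.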

I still have the same issue:

$ lspci | grep NVIDIA
07:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] (rev a1)
07:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)

$ dmesg | grep NVRM
[ 1.569623] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 450.80.02 Wed Sep 23 01:13:39 UTC 2020
[ 65.390214] NVRM: GPU 0000:07:00.0: RmInitAdapter failed! (0x26:0xffff:1266)
[ 65.390403] NVRM: GPU 0000:07:00.0: rm_init_adapter failed, device minor number 0

$ dmesg | grep NVIDIA
[ 1.450276] nvidia: module license 'NVIDIA' taints kernel.
[ 1.569623] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 450.80.02 Wed Sep 23 01:13:39 UTC 2020
[ 1.572949] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 450.80.02 Wed Sep 23 00:48:09 UTC 2020

$ cat /var/log/kern.log | grep taint
Nov 1 14:13:49 mediabox kernel: [ 1.450269] nvidia: loading out-of-tree module taints kernel.
Nov 1 14:13:49 mediabox kernel: [ 1.450276] nvidia: module license 'NVIDIA' taints kernel.
Nov 1 14:13:49 mediabox kernel: [ 1.450278] Disabling lock debugging due to kernel taint
Nov 1 14:13:49 mediabox kernel: [ 1.460693] nvidia: module verification failed: signature and/or required key missing - tainting kernel

$ sudo nvidia-smi
No devices were found

Dear All

I have encountered the same problem with Ubuntu 20.04, an RTX 2060, CUDA 11.2, and NVIDIA driver 460.32.03.

I have also tried CUDA 10.1, with the same result.

$ sudo nvidia-smi
No devices were found

The output of sudo lspci -vvv indicates that the nvidia driver is in use.

In the UEFI settings, the graphics card is set to discrete.

I want to install cuDNN, but I would rather wait until CUDA works properly.

BTW, the following works properly:

$ nvcc --version

Best regards!

EDIT ONE DAY LATER:

Dear All
I decided to reinstall Ubuntu 20.04.
Right after the reinstall I checked nvidia-smi, and it was OK!
The very first thing I installed on the ‘naked’ Ubuntu was CUDA 11.2 and a compatible cuDNN.
After that, nvidia-smi still behaved well!
Only then did I install the other packages that rely on CUDA and cuDNN.
Everything works fine and I’m able to use accelerated computing :)

Conclusion: installation order is crucial for success. Updating an old CUDA install or repairing a broken one can often leave corrupted dependencies behind, even if you purge everything (as I had done before).
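
If installation order matters, it helps to re-check GPU visibility after every install step so the step that breaks things is obvious. This is a sketch under assumptions: the function name count_listed_gpus is invented here, and it relies on nvidia-smi -L printing one line of the form "GPU <index>: <name> (UUID: ...)" per device.

```shell
#!/bin/sh
# Hypothetical helper: count GPUs in `nvidia-smi -L` output read from stdin.
# Prints 0 for output such as "No devices were found".
count_listed_gpus() {
    # grep -c still prints 0 when nothing matches, but exits non-zero; ignore that status.
    grep -cE '^GPU [0-9]+:' || true
}

# Usage: run after each install step and stop at the first one that loses the GPU.
#   n=$(nvidia-smi -L 2>&1 | count_listed_gpus)
#   [ "$n" -gt 0 ] || echo "GPU no longer visible after this step" >&2
```

Reading the listing from stdin keeps the check independent of the driver, so the same function can be pointed at saved output from another machine.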