nvidia-smi "No devices were found" Ubuntu 18.04

Ubuntu 18.04.02 LTS
GeForce GTX 1050 Ti Mobile
Kernel: 4.18.7-041807-generic
Driver: 418.56

My card and driver were working, but after one boot the card stopped working and I have not been able to get it back. I purged all NVIDIA packages from my machine and reinstalled, but to no avail.

I’ve attached the output of nvidia-bug-report.sh to this post.

Here are the symptoms I’m seeing:

$ sudo prime-select query
nvidia
$ sudo nvidia-smi
No devices were found

Note that I have disabled Nouveau, and installed the graphics drivers from here: https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa

Any help is appreciated.
nvidia-bug-report.log.gz (610 KB)

Please run

grep nvidia /etc/modprobe.d/* /lib/modprobe.d/*

to find a file containing

blacklist nvidia

and remove it,
then run

sudo update-initramfs -u

and reboot.
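
The plain grep also matches unrelated entries such as nvidiafb. A minimal sketch of a narrower search, assuming a POSIX shell; find_nvidia_blacklists is a hypothetical helper name, not part of nvidia-prime or any other package. It only looks at *.conf files, since those are the ones modprobe reads.

```shell
# Hypothetical helper: print each modprobe config file that blacklists
# the core "nvidia" module. The pattern requires the line to end after
# "nvidia", so "blacklist nvidiafb" and "blacklist nvidia-drm" do not match.
find_nvidia_blacklists() {
    for dir in "$@"; do
        # Skip directories that do not exist on this system.
        [ -d "$dir" ] || continue
        # -l prints only the names of files containing a match.
        grep -lE '^[[:space:]]*blacklist[[:space:]]+nvidia[[:space:]]*$' \
            "$dir"/*.conf 2>/dev/null || true
    done
    return 0
}

# Usage, with the directories from the command above:
# find_nvidia_blacklists /etc/modprobe.d /lib/modprobe.d
```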

@generix - Thanks for the reply.

Here’s what I get when I run the command you mentioned:

$ sudo grep nvidia /etc/modprobe.d/* /lib/modprobe.d/*
/etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb
/lib/modprobe.d/blacklist-nvidia.conf:# This file was generated by nvidia-prime
/lib/modprobe.d/blacklist-nvidia.conf:blacklist nvidia
/lib/modprobe.d/blacklist-nvidia.conf:blacklist nvidia-drm
/lib/modprobe.d/blacklist-nvidia.conf:blacklist nvidia-modeset
/lib/modprobe.d/blacklist-nvidia.conf:alias nvidia off
/lib/modprobe.d/blacklist-nvidia.conf:alias nvidia-drm off
/lib/modprobe.d/blacklist-nvidia.conf:alias nvidia-modeset off
/lib/modprobe.d/nvidia-kms.conf:# This file was generated by nvidia-prime
/lib/modprobe.d/nvidia-kms.conf:options nvidia-drm modeset=1

Should I remove the entire blacklist-nvidia.conf file? Why does nvidia-prime create this file?

Yes, please remove /lib/modprobe.d/blacklist-nvidia.conf completely but no other file
and run
sudo update-initramfs -u
afterwards.
This is an Ubuntu bug: the file is needed when prime-select is switched to intel, but during a driver update it is often created erroneously and then left behind.
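
A cautious variant of this step moves the file aside instead of deleting it, so it can be put back from a recovery shell if the next boot misbehaves. This is only a sketch: stash_modprobe_conf is a hypothetical helper, and the backup directory in the usage comment is an assumption, not a path any tool expects.

```shell
# Hypothetical helper: move a modprobe config file into a backup directory
# rather than deleting it. Returns non-zero if the file does not exist.
stash_modprobe_conf() {
    conf="$1"
    backup_dir="$2"
    if [ ! -f "$conf" ]; then
        echo "not found: $conf" >&2
        return 1
    fi
    mkdir -p "$backup_dir" && mv -- "$conf" "$backup_dir/"
}

# Usage (as root; /root/modprobe-backup is an assumed location):
# stash_modprobe_conf /lib/modprobe.d/blacklist-nvidia.conf /root/modprobe-backup \
#     && update-initramfs -u
```

To undo, move the file back to /lib/modprobe.d/ and run update-initramfs -u again.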

After removing that file, running the update command, and rebooting, I was unable to log in to Ubuntu: as soon as I logged in, the session hung indefinitely. I fixed this by using a recovery-mode shell to restore the file and re-run the update command.

I suspect this is because I set up a service to select the Intel GPU when the machine shuts down, as detailed in “Step 2” of the accepted answer here: https://askubuntu.com/questions/1057853/nvidia-card-for-cuda-and-intel-integrated-card-for-display-on-ubuntu-16-04-dell.

My objective is to use the Intel card for my displays, and use my NVIDIA card for machine learning with CUDA. What I had working before was to have the Intel card selected initially. Then, after log-in I would use prime-select to select NVIDIA. My displays would still be powered by the Intel card. Running the following would then activate my NVIDIA card:

$ sudo prime-select nvidia
$ sudo prime-switch

This worked for a couple days, but then seemingly randomly quit working. Note that I’m not installing CUDA directly on my machine, but rather using nvidia-docker2.
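
One way to tell whether the prime-select sequence actually loaded the kernel module, separately from what nvidia-smi reports, is to look at /proc/modules. A minimal sketch; module_loaded is a hypothetical helper, and its optional second argument exists only so the function can be tried against a saved copy of the file.

```shell
# Hypothetical helper: succeed if the named kernel module is listed in
# /proc/modules (or in the file given as the second argument).
# Note that /proc/modules spells names with underscores, e.g. nvidia_drm.
module_loaded() {
    name="$1"
    modules_file="${2:-/proc/modules}"
    # The module name is the first whitespace-separated field of each line.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Usage:
# if module_loaded nvidia; then echo "nvidia module is loaded"; else echo "nvidia module is NOT loaded"; fi
```

If the module is not loaded, the problem lies with the blacklist or module loading; if it is loaded and nvidia-smi still finds no device, the kernel log is the next place to look.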

What can I do to work around these issues?

I’m now getting some bizarre but reproducible behavior. Some contextual information:

  • My machine is a Dell XPS-15 9570 laptop.
  • At home, I use a Thunderbolt 3 dock with two HDMI outputs to run a pair of monitors.
  • In my office, I have a USB 3 dock to run two monitors, for which I had to install a special driver from DisplayLink.
  • As the machine is a laptop, it of course has a built-in display.

Now for the weird behavior:

  • If I boot/login to Ubuntu with my Thunderbolt 3 dock plugged in, I can successfully select nvidia with prime-select and nvidia-smi finds the device and returns information (see bottom of post).
  • If I boot/login to Ubuntu without any dock plugged in, my sequence of using prime-select to switch to nvidia doesn’t work, and nvidia-smi gives the “No devices were found” message.
  • If I boot/login without any dock plugged in, and then subsequently plug in the Thunderbolt 3 dock, I get the “No devices were found” issue.
  • I’m not in the office right now, but that’s where I was having problems yesterday, with and without my dock plugged in.
  • It’s worth noting that this will probably never work with my USB 3 dock. DisplayLink notes that “Closed source AMD/NVIDIA drivers are incompatible with DisplayLink driver. Please use open-source AMD/NVIDIA drivers instead.” Source: https://support.displaylink.com/knowledgebase/articles/641668-known-issues-with-displaylink-ubuntu-support

At this point, my primary question is: Why doesn’t this work when running my laptop’s built-in display alone? I can replace my USB 3 dock with a Thunderbolt 3 dock, but it would be nice to be able to use my GPU for computation when I’m not docked via Thunderbolt 3.

I’ve attached a fresh output from nvidia-bug-report.sh in case it contains useful information related to my sequence of docked/undocked tests I described above.

Output of nvidia-smi when it’s working:

Mon Apr 29 08:19:15 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   45C    P0    N/A /  N/A |      4MiB /  4042MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

nvidia-bug-report.log.gz (1.14 MB)

A clean setup to use your iGPU for display and the dGPU for compute only:
https://devtalk.nvidia.com/default/topic/1043405/linux/ubuntu-18-04-headless_390-intel-igpu-after-prime-select-intel-lost-contact-to-geforce-1050ti/post/5293003/#5293003
Furthermore, you’re sometimes running into:

Apr 29 08:17:13 brandon-XPS-15-9570 kernel: NVRM: RmInitAdapter failed! (0x26:0xffff:1106)
Apr 29 08:17:13 brandon-XPS-15-9570 kernel: NVRM: rm_init_adapter failed for device bearing minor number 0

This looks like an ACPI bug, like:
https://devtalk.nvidia.com/default/topic/1050398/linux/418-56-gtx-1050-ti-mobile-dell-xps-4-19-34-1-lts-rminitadapter-failed-/post/5331543/#5331543
Please check for a BIOS update.
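
To see how often this failure shows up, for example when comparing docked and undocked boots, the kernel log can be filtered for the message quoted above. A minimal sketch; count_rminit_failures is a hypothetical helper that reads log text on standard input.

```shell
# Hypothetical helper: read kernel log text on stdin and print how many
# lines report an RmInitAdapter failure. grep -c exits non-zero when the
# count is 0, so "|| true" keeps the function's exit status at 0.
count_rminit_failures() {
    grep -c 'NVRM: RmInitAdapter failed' || true
}

# Usage (kernel messages of the current boot):
# journalctl -k -b | count_rminit_failures
# or, without systemd's journal:
# dmesg | count_rminit_failures
```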

generix,

Thanks for your prompt assistance on this. I’ll look into your solutions soon (I’m out of time for this problem today), and will accept your answer if it does the trick for me and post updates.

Side note: My BIOS is only one revision behind (the most recent version came out a few days ago). I’ve pasted the release notes from the next version which I haven’t installed, and I doubt it’s going to help with this issue (though I’m no expert, so please correct me if I’m wrong).

Dell XPS-15 9570, BIOS 1.9.1
Fixes:
- Fixed an issue with Secure Boot Option ROM Signature Verification.
- Firmware updates to address security advisories INTEL-SA-00191 (CVE-2018-12201, CVE-2018-12202, CVE-2018-12203, CVE-2018-12205)
- Firmware updates to address security advisory INTEL-SA-00185 (CVE-2018-12188 CVE-2018-12190 CVE-2018-12191 CVE-2018-12192 CVE-2018-12199 CVE-2018-12198 CVE-2018-12200 CVE-2018-12187 CVE-2018-12196 CVE-2018-12185)
- Fixed the issue where the system cannot set hard drive password with Dell Client Configuration Toolkit.
- Fixed the issue where the mouse lags when the Dell TB16 dock is unplugged or plugged in.
- Fixed the issue where incorrect logon message appears in Windows login screen when Secure sign-in option is enabled in Windows.
- Fixed the issue where the BIOS boot mode cannot be set as Legacy mode.

Enhancements:
- Added BIOS Setup configuration of Intel(R) Active Management Technology.
- Removed the option Always Allow Dell Docks from BIOS settings. Dell dock connection and the port behavior will be controlled via the USB and Thunderbolt Adapter configuration settings under operating system environment.
- Added the option Always, Except Internal HDD&PXE in UEFI Boot Path Security feature. This option is added to skip the admin password prompt while booting to internal hard disk drive or Pre-boot Execution Environment (PXE).