nvidia-smi stops working after reboot on Ubuntu 18.04

I have Ubuntu 18.04 installed on an ASUS laptop with a GeForce 940MX GPU. I have tried everything, both the proprietary drivers and the NVIDIA run file, to install the CUDA drivers. Finally, I was able to install the NVIDIA and CUDA drivers using the CUDA run file.

OpenGL and nvidia-xconfig were not installed during this installation.
Also, secure boot was disabled prior to installation.

Now, nvidia-smi works after this installation, but whenever I reboot the system it gives the error: “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”

It would be really helpful if experts could comment on how to overcome this issue. Thanks in advance.

This is probably an Optimus laptop, so you shouldn’t use the .run installer. Use the driver from the Ubuntu graphics-drivers PPA, then download the CUDA .deb and install cuda-toolkit.

Is there a better answer to this?

I had to use the run-installer to prevent the installation of “drm” (direct rendering manager?), which apparently allows the Intel onboard graphics card to be used by the graphical user interface (gdm? lightdm? xorg?).
The GPU memory should be untouched by the GUI, since the NVIDIA GPU is used for deep learning and is needed at its full capacity.

Therefore I had to

sudo ./NVIDIA-Linux-x86_64-430.40.run --no-opengl-files --no-drm

There most likely is.
But since I know nothing about either your hardware or your distro, I’ll have to use my psychic powers again and say this:
https://devtalk.nvidia.com/default/topic/1043405/linux/ubuntu-18-04-headless_390-intel-igpu-after-prime-select-intel-lost-contact-to-geforce-1050ti/post/5293003/#5293003

Looks good, now I just need to find out how to write such an /etc/X11/xorg.conf for my case with two monitors.

(same config as above mentioned: 18.04, Intel onboard for display, Nvidia RTX 2080 for Cuda)

It’s 2019; you haven’t needed to specify monitors in xorg.conf since 2007, except in very specific cases. Just setting the device to use should be enough, which is what that snippet does.
Just connect the monitors and use GNOME’s control center to arrange them, unless you have set ‘nomodeset’ and killed the Intel GPU.
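
For reference, here is a rough sketch of what “just setting the device” can look like. This is an illustration under my own assumptions, not the content of the linked post: it assumes the Intel GPU drives X through the generic modesetting driver, and that you read its PCI address from lspci. Xorg’s BusID wants decimal numbers while lspci prints hex, so the made-up helper lspci_to_busid converts between the two. It does not handle addresses that carry a PCI domain prefix (such as 0000:00:02.0).

```shell
#!/bin/sh
# Convert an lspci-style PCI address (hex "bus:device.function", e.g. "00:02.0")
# into the decimal "PCI:bus:device:function" form that xorg.conf expects.
# lspci_to_busid is a hypothetical helper name, not part of any Xorg or NVIDIA tool.
lspci_to_busid() {
    addr=$1
    bus=${addr%%:*}        # text before the first ':'
    rest=${addr#*:}        # text after the first ':'
    dev=${rest%%.*}        # text before the '.'
    fn=${rest#*.}          # text after the '.'
    # printf parses the "0x" prefix as hexadecimal and prints decimal.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

# Print a Device section that pins X to the GPU at the given lspci address.
# The identifier "intel" and the "modesetting" driver are assumptions.
print_device_section() {
    printf 'Section "Device"\n'
    printf '    Identifier "intel"\n'
    printf '    Driver     "modesetting"\n'
    printf '    BusID      "%s"\n' "$(lspci_to_busid "$1")"
    printf 'EndSection\n'
}

# 00:02.0 is a common address for Intel onboard graphics; check yours with lspci.
print_device_section "00:02.0"
```

The printed section would go into /etc/X11/xorg.conf; no Monitor or Screen sections are needed for the reasons given above.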

Hmm, the snippet is working (I can log in to the GUI after rebooting), but nvidia-smi is still not working (same error as before: “NVIDIA-SMI has failed because it couldn’t communicate …”).

Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/

I did add these lines to my /etc/modprobe.d/blacklist.conf before installing via nvidia-runfile:

blacklist nouveau
blacklist lbm-nouveau
alias nouveau off
alias lbm-nouveau off
options nouveau modeset=0

is that a problem?
nvidia-bug-report.log.gz (147 KB)

There’s no kernel driver installed.
Did you already uninstall the .run installer driver using the --uninstall option?
Afterwards, you should install the NVIDIA driver from the repo, either through the Software & Updates application (Additional Drivers tab) or with
sudo apt install nvidia-driver-430
Of course, the post I linked to only works once that has been done.
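
If you want to check the kernel driver state yourself after a reboot, something along these lines should do. It is only a sketch, assuming a Linux system that exposes /proc/modules; module_loaded is a made-up helper name. When the nvidia module is missing, nvidia-smi typically fails with the “couldn’t communicate with the NVIDIA driver” message.

```shell
#!/bin/sh
# Report whether a kernel module appears in a /proc/modules-style listing.
# module_loaded is a hypothetical helper; the optional second argument points it
# at another file and defaults to the live /proc/modules.
module_loaded() {
    name=$1
    list=${2:-/proc/modules}
    # The first field of each line in /proc/modules is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$list"
}

# nvidia should be loaded for nvidia-smi to work; nouveau should not be.
for mod in nvidia nouveau; do
    if module_loaded "$mod"; then
        echo "$mod: loaded"
    else
        echo "$mod: not loaded"
    fi
done
```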

As I mentioned before, installing the driver via sudo apt install nvidia-driver-430 results in the NVIDIA GPU being used for the GUI / X server.

I can’t have that.

My configuration (before rebooting) was:

  • Intel onboard graphics for GUI
  • Nvidia RTX 2080 for CUDA computations only

Therefore I had to install it via the runfile like this:

sudo ./NVIDIA-Linux-x86_64-430.40.run --no-opengl-files --no-drm

where the options --no-opengl-files and --no-drm prevent the system from allocating memory on the Nvidia GPU for the X server (a problem I had before).

Is there a way to install the PPA drivers with the above-mentioned options (no OpenGL and no DRM)?

As far as I understand it, DRM (direct rendering manager?) is especially guilty of this behaviour of borrowing memory from the Nvidia GPU for the X server, so it needs to be suppressed.

Are you kidding me?

I’m not. I’ve done the driver installation before, with the described (undesired) effect.

However I just found a solution here: https://gist.github.com/wangruohui/bc7b9f424e3d5deb0c0b8bba990b1bc5

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt install nvidia-headless-435 nvidia-utils-435

"IMPORTANT:

The nvidia-headless-435 contains only the driver, while the full nvidia-driver-435 package contain everything including display component such OpenGL libraries. If you hope to connect the display to a NVIDIA display card, install the full package, otherwise, install only the driver.

The nvidia-utils-435 package provide utilities such as nvidia-smi.

Reboot. If the installation is successful, command nvidia-smi will show all NVIDIA GPUs."
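
One way to confirm after the reboot that the card is visible and that nothing is holding its memory, sketched under the assumption that nvidia-smi is on the PATH. The --query-gpu and --format options are regular nvidia-smi options; the helper name report_gpu_memory is made up for this example.

```shell
#!/bin/sh
# Read "name, memory.used" CSV lines (as printed by nvidia-smi with
# --format=csv,noheader,nounits) from stdin and print one summary line per GPU.
# report_gpu_memory is a hypothetical helper name.
report_gpu_memory() {
    while IFS=, read -r name used; do
        used=${used# }     # drop the space nvidia-smi puts after the comma
        echo "$name: $used MiB in use"
    done
}

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.used --format=csv,noheader,nounits | report_gpu_memory
else
    echo "nvidia-smi not found; is nvidia-utils installed?"
fi
```

If the X server is not touching the card, the figure at an idle desktop should stay close to zero.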

Still, thanks for your quick replies.