Can't install new driver, cannot unload module

The installer says ‘An NVIDIA kernel module ‘nvidia-uvm’ appears to already be loaded in your kernel.’
I tried to unload it but got ‘operation is not permitted’ (as root, via su).

The output of lsmod | grep nvidia is:

nvidia_drm 45056 0
drm_kms_helper 151552 1 nvidia_drm
drm 352256 3 nvidia_drm,drm_kms_helper
nvidia_modeset 790528 1 nvidia_drm
nvidia_uvm 647168 0
nvidia 12304384 2 nvidia_modeset,nvidia_uvm

I am using CentOS 7, running in a virtual machine. I can't restart the server, since everyone uses the same server.

The problem started when I accidentally installed a new NVIDIA driver while installing CUDA.
I can't run nvidia-smi.

I have already uninstalled CUDA and the NVIDIA driver.

But I cannot install the new driver.

What should I do?

Stop any X server, stop nvidia-persistenced, then unload the modules using sudo modprobe -r nvidia

You may also want to run update-initramfs to make sure nothing is being added at boot from the initramfs.
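A minimal sketch of working out an unload order from the lsmod output posted above. modprobe -r nvidia is normally refused while nvidia_modeset or nvidia_uvm still depend on nvidia, so the dependents have to go first. The helper name nvidia_unload_order is made up for illustration; it only reads text and prints module names, it does not unload anything.

```shell
#!/bin/sh
# Sketch: print nvidia* modules from lsmod-style text in an order in which
# they could be unloaded (a module comes after everything in its "Used by" list).
# nvidia_unload_order is a hypothetical helper name, not an NVIDIA tool.
nvidia_unload_order() {
    awk '
        # Remember each nvidia module in input order, with its "Used by" list.
        $1 ~ /^nvidia/ { n++; name[n] = $1; users[$1] = $4; pending[$1] = 1 }
        END {
            left = n
            while (left > 0) {
                progressed = 0
                for (i = 1; i <= n; i++) {
                    m = name[i]
                    if (!(m in pending)) continue
                    blocked = 0
                    cnt = split(users[m], u, ",")
                    for (j = 1; j <= cnt; j++)
                        if (u[j] in pending) blocked = 1
                    if (!blocked) { print m; delete pending[m]; left--; progressed = 1 }
                }
                if (!progressed) break   # nothing could be unblocked, stop
            }
        }
    '
}

# Demo with the lsmod lines from this thread.
# On a live system this would be: lsmod | nvidia_unload_order
nvidia_unload_order <<'EOF'
nvidia_drm 45056 0
drm_kms_helper 151552 1 nvidia_drm
drm 352256 3 nvidia_drm,drm_kms_helper
nvidia_modeset 790528 1 nvidia_drm
nvidia_uvm 647168 0
nvidia 12304384 2 nvidia_modeset,nvidia_uvm
EOF
```

For the output quoted in this thread the order comes out as nvidia_drm, nvidia_modeset, nvidia_uvm, nvidia. Each name could then be passed to sudo modprobe -r in that order, assuming nothing in user space is still holding the device.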

How can I list the X servers that I am using?

I tried
‘service kdm stop’
‘service gdm stop’
‘service lightdm stop’

but I always get
‘failed to stop, service is not loaded’

I access this server through JupyterHub, which is installed on the server, from my browser, and I use the terminal that JupyterHub provides. So if I stop the X server, do I need to SSH into the server, or can I keep using JupyterHub?

Sorry, I’m new to this OS :)
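On CentOS 7 the service command is redirected to systemctl, so ‘not loaded’ most likely means that no unit with that name is installed. Rather than guessing names, a rough sketch for listing display-manager services that are actually running is shown here. The helper name running_display_managers and the list of names it checks (gdm, kdm, lightdm, sddm, xdm) are assumptions for illustration, and the sample lines in the demo are made up.

```shell
#!/bin/sh
# Sketch: from `systemctl list-units --type=service --plain --no-legend` text
# (columns: UNIT LOAD ACTIVE SUB DESCRIPTION), print the display-manager
# services whose SUB state is "running".
# running_display_managers is a hypothetical helper name.
running_display_managers() {
    awk '$1 ~ /^(gdm|kdm|lightdm|sddm|xdm)\.service$/ && $4 == "running" { print $1 }'
}

# Demo with made-up sample lines.
# On a live system this would be:
#   systemctl list-units --type=service --plain --no-legend | running_display_managers
running_display_managers <<'EOF'
gdm.service loaded active running GNOME Display Manager
sshd.service loaded active running OpenSSH server daemon
EOF
```

If this prints nothing on the real machine, no display manager from that list is running, which would match the ‘not loaded’ errors.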

Use
ps a | grep X
to check for running X servers.
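One catch with grepping ps output for X is that the grep process itself shows up in the result. A small sketch that filters on the command column instead, assuming the usual ps axu layout where the command starts at field 11. The helper name list_x_servers and the set of binary names it matches (X, Xorg, Xwayland) are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: keep only `ps axu` lines whose command (field 11) is an X server
# binary such as X, /usr/bin/X or /usr/libexec/Xorg. The grep process that a
# plain "grep X" would match is dropped because its command is "grep".
# list_x_servers is a hypothetical helper name.
list_x_servers() {
    awk '$11 ~ /(^|\/)(X|Xorg|Xwayland)$/'
}

# Demo: a line of the kind `ps axu | grep X` returns when only grep matches.
# On a live system this would be: ps axu | list_x_servers
list_x_servers <<'EOF'
root 756 0.0 0.0 9036 892 pts/3 S+ 08:33 0:00 grep --color=auto X
EOF
```

The demo prints nothing, because the only matching process is grep itself and no X server is running.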

The output of ‘ps axu |grep X’ is
root 756 0.0 0.0 9036 892 pts/3 S+ 08:33 0:00 grep --color=auto X

What X server do I use?

None.
So something else is keeping the modules from being unloaded. Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post will reveal a paperclip icon.

I already uninstalled the NVIDIA driver and cannot install it again. Can I download the script separately and run it on my machine?

What if I blacklist the nvidia module, install the new driver, and then remove the module from the blacklist again?

So in the end I did a fresh install of my VM with Ubuntu (I used CentOS before). But when I run
lsmod | grep nvidia

nvidia_uvm, nvidia_drm and nvidia are still loaded.

Why would they load, even though I’m using a freshly installed OS?
Please help, I’m stuck on this.

My server situation:

  • no one is using the GPU except me
  • there are several VMs on the server, so I cannot restart the server

Should I unload the modules from the main computer, or is restarting the server unavoidable?
Or is there another solution?

I have the same problem as him… and I’m still struggling with it…

Have you updated the initramfs?

It’s possible that module unloading has not been compiled into your kernel. Do you have the config file in /proc? It would be /proc/config.gz - check it to see if it has module unloading selected…
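A minimal sketch of that check. The kernel option in question is CONFIG_MODULE_UNLOAD; the helper name module_unload_status is made up for illustration. It reads kernel config text on stdin, so it works with /proc/config.gz where that file exists, or with /boot/config-$(uname -r), which most distributions ship instead.

```shell
#!/bin/sh
# Sketch: report whether a kernel config enables module unloading.
# Reads kernel config text on stdin and prints "enabled" or "disabled".
# module_unload_status is a hypothetical helper name.
module_unload_status() {
    if grep -q '^CONFIG_MODULE_UNLOAD=y'; then
        echo enabled
    else
        echo disabled
    fi
}

# Usage on a live system (pick whichever file exists):
#   zcat /proc/config.gz | module_unload_status
#   module_unload_status < "/boot/config-$(uname -r)"

# Demo with a two-line config fragment.
printf '%s\n' 'CONFIG_MODULES=y' 'CONFIG_MODULE_UNLOAD=y' | module_unload_status
```

If this reports enabled, the kernel itself allows unloading and the cause of the failure has to be something else.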

I solved this problem by disabling the GUI, rebooting, logging in and installing the driver, re-enabling the GUI, and rebooting again.

Please make sure you know your username and password!!!

Open a terminal and run:

sudo systemctl set-default multi-user.target
sudo reboot 0

Now log in and you’ll get to a terminal directly. Install the driver. Note that I am installing 440.44 here, so you need to change the file name to match your driver version.

sudo ./NVIDIA-Linux-x86_64-440.44.run

After installing the driver, enable the GUI and reboot:

sudo systemctl set-default graphical.target
sudo reboot 0

You should be done.

In my case, nvidia-smi reported the new version 440.44, while the Additional Drivers tab of the Ubuntu 18.04 Software & Updates utility shows 435!! Another NVIDIA mystery, but heck, my new docker works!!!
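To see which version the loaded kernel module actually is, independent of what a GUI tool lists, one option is to read /proc/driver/nvidia/version. A rough sketch, assuming the first line of that file starts with "NVRM version:" and contains the version number. The helper name loaded_driver_version is made up, and the sample line in the demo is illustrative rather than copied from a real system.

```shell
#!/bin/sh
# Sketch: extract the driver version from /proc/driver/nvidia/version text.
# Takes the first dotted number on the "NVRM version:" line.
# loaded_driver_version is a hypothetical helper name.
loaded_driver_version() {
    grep '^NVRM version:' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Usage on a live system:
#   loaded_driver_version < /proc/driver/nvidia/version
#   modinfo -F version nvidia    # version of the module file on disk, for comparison

# Demo with an illustrative sample line.
printf '%s\n' 'NVRM version: NVIDIA UNIX x86_64 Kernel Module  440.44  (sample line)' | loaded_driver_version
```

If the loaded version and the on-disk version differ, a driver from a different install method (for example a distribution package next to the .run installer) may still be present.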


Hi, could you take a look at a bit of my logs as well, please:

The NVIDIA probe routine failed for 1 device(s).
Jun 13 15:18:12 maxx kernel: [ 2318.716005] NVRM: None of the NVIDIA devices were initialized.
Jun 13 15:18:12 maxx kernel: [ 2318.716203] nvidia-nvlink: Unregistered the Nvlink Core, major device number 234
Jun 13 15:18:13 maxx kernel: [ 2319.082988] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
Jun 13 15:18:13 maxx kernel: [ 2319.083639] NVRM: This is a 64-bit BAR mapped above 4GB by the system
Jun 13 15:18:13 maxx kernel: [ 2319.083639] NVRM: BIOS or the Linux kernel, but the PCI bridge
Jun 13 15:18:13 maxx kernel: [ 2319.083639] NVRM: immediately upstream of this GPU does not define
Jun 13 15:18:13 maxx kernel: [ 2319.083639] NVRM: a matching prefetchable memory window.
Jun 13 15:18:13 maxx kernel: [ 2319.083640] NVRM: This may be due to a known Linux kernel bug. Please
Jun 13 15:18:13 maxx kernel: [ 2319.083640] NVRM: see the README section on 64-bit BARs for additional
Jun 13 15:18:13 maxx kernel: [ 2319.083640] NVRM: information.