I am experiencing issues with my NVIDIA driver installation on Ubuntu 22.04. Despite multiple attempts to resolve the issue, the driver fails to initialize my GPUs. Here’s what I have tried so far:
- Removed and purged all NVIDIA drivers using:
sudo apt purge '^nvidia-.*'
sudo apt autoremove --purge
sudo apt clean
- Reinstalled the driver using both the CUDA repository (version 12.8) and direct driver downloads from NVIDIA.
- Verified the presence of my GPUs using lspci, which lists them correctly:
lspci | grep -i nvidia
1b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
1c:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
1d:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
1e:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
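Since lspci only shows that the devices are enumerated, a further check is whether any kernel driver is bound to them. The following is a minimal sketch, not something from the original attempts; `nvidia_pci_addrs` is a hypothetical helper name, and it assumes the pciutils package is installed.

```shell
#!/usr/bin/env bash
# Print the PCI address (first column) of every NVIDIA device
# found in lspci-style text read from stdin.
nvidia_pci_addrs() {
    grep -i 'nvidia' | awk '{print $1}'
}

# Usage on the affected machine:
#   lspci | nvidia_pci_addrs | while read -r addr; do
#       lspci -nnk -s "$addr"   # -k prints the kernel driver and modules
#   done
# A "Kernel driver in use:" line, if present, names the bound driver.
```

If no "Kernel driver in use:" line appears for the four V100 entries, the nvidia module has not bound to them, which would match the nvidia-smi failure.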
Here is one way I tried installing the drivers:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda-repo-ubuntu2204-12-8-local_12.8.0-570.86.10-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-8-local_12.8.0-570.86.10-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-8
sudo reboot
ls
nvidia-smi
nvcc --version
sudo apt install nvidia-cuda-toolkit
nvcc --version
nvidia-smi
sudo apt-get install -y cuda-drivers
Here is the output:
nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
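Note that nvcc reports release 11.5 rather than 12.8. On Ubuntu 22.04 the distribution package nvidia-cuda-toolkit ships CUDA 11.5, so this nvcc most likely comes from that package and not from cuda-toolkit-12-8. A small sketch to confirm which binary is picked up; `nvcc_release` is a hypothetical helper name, and `/usr/local/cuda-12.8` is assumed to be the install prefix used by the NVIDIA packages.

```shell
#!/usr/bin/env bash
# Extract the release number (for example "11.5") from
# `nvcc --version` text read from stdin.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage:
#   command -v nvcc                                          # nvcc found first on PATH
#   nvcc --version | nvcc_release                            # release of that binary
#   /usr/local/cuda-12.8/bin/nvcc --version | nvcc_release   # assumed 12.8 location
```

This only concerns the toolkit mismatch; it is separate from the driver failing to initialize.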
I cannot work out what is going wrong with my driver installation. Additional steps I took:
- Checked kernel logs using:
sudo dmesg | grep -i nvidia
Key error messages include:
NVRM: This PCI I/O region assigned to your NVIDIA device is invalid.
nvidia: probe of [device_id] failed with error -1.
NVRM: None of the NVIDIA devices were initialized.
- Manually attempted to load the NVIDIA kernel modules using modprobe, which did not resolve the issue.
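The "PCI I/O region ... is invalid" message suggests that a memory region (BAR) of the GPUs may not have been assigned an address, which would also explain why loading the module does not help. The following is a rough sketch for checking both points, under the assumption that lspci marks such regions as `<unassigned>` or `<ignored>`; `unassigned_bars` and `module_listed` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print "Region N:" lines that lspci marks as unassigned or ignored
# (expects `lspci -vv` text on stdin).
unassigned_bars() {
    grep -E 'Region [0-9]+: .*<(unassigned|ignored)>'
}

# Succeed if the module named in $1 appears in the first column
# of lsmod-style text read from stdin.
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Usage on the affected machine:
#   sudo lspci -vv -s 1b:00.0 | unassigned_bars
#   sudo modprobe nvidia; echo "modprobe exit status: $?"
#   lsmod | module_listed nvidia && echo "nvidia loaded" || echo "nvidia not loaded"
#   sudo dmesg | tail -n 20     # messages produced by the modprobe attempt
```

If a region shows up as unassigned, the cause is more likely in the firmware or kernel PCI resource allocation than in the driver packages themselves.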
I have attached nvidia-bug-report.log.gz to this ticket.
nvidia-bug-report.log.gz (11.8 MB)