NVIDIA Driver Fails to Initialize GPUs on Ubuntu 22.04: "NVRM: This PCI I/O region assigned to your NVIDIA device is invalid"

I am experiencing issues with my NVIDIA driver installation on Ubuntu 22.04. Despite multiple attempts to resolve the issue, the driver fails to initialize my GPUs. Here’s what I have tried so far:

  1. Removed and purged all NVIDIA drivers using:
    sudo apt purge '^nvidia-.*'
    sudo apt autoremove --purge
    sudo apt clean
  2. Reinstalled the driver using both the CUDA repository (version 12.8) and direct driver downloads from NVIDIA.
  3. Verified the presence of my GPUs using lspci, which lists them correctly:
    lspci | grep -i nvidia
    1b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
    1c:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
    1d:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
    1e:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
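One way to confirm that the purge step really removed everything is to ask dpkg which NVIDIA-related packages it still knows about. The apt pattern '^nvidia-.*' only matches package names that begin with "nvidia-", so packages such as libnvidia-* can survive it unless autoremove picks them up. The sketch below is a hypothetical helper (the function name is made up for illustration) that filters dpkg-query output down to packages that are still installed ("ii") or removed with configuration files left behind ("rc"); after a clean purge it should print nothing.

```shell
#!/usr/bin/env bash
# Sketch: filter dpkg-query output down to NVIDIA-related packages that are
# still installed ("ii") or removed with config files left behind ("rc").
# Expected stdin format (one package per line), e.g. produced by:
#   dpkg-query -W -f='${db:Status-Abbrev} ${Package} ${Version}\n' '*nvidia*'
leftover_nvidia_packages() {
  # $1 = status abbreviation, $2 = package name, $3 = version
  awk '$2 ~ /nvidia/ && ($1 ~ /^ii/ || $1 ~ /^rc/) { print $1, $2, $3 }'
}

# Usage:
#   dpkg-query -W -f='${db:Status-Abbrev} ${Package} ${Version}\n' '*nvidia*' \
#     | leftover_nvidia_packages
```

Packages reported as "rc" can be cleared with a further "sudo apt purge <package>".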

Here is one of the ways I tried installing the drivers:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda-repo-ubuntu2204-12-8-local_12.8.0-570.86.10-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-8-local_12.8.0-570.86.10-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-8
sudo reboot
nvcc --version
sudo apt install nvidia-cuda-toolkit
nvcc --version
sudo apt-get install -y cuda-drivers

Here is the output:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
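Note that nvcc reports release 11.5, not 12.8. On Ubuntu 22.04, "sudo apt install nvidia-cuda-toolkit" installs Ubuntu's own packaged toolkit (CUDA 11.5, with nvcc in /usr/bin), while the cuda-toolkit-12-8 package from NVIDIA's repository installs under /usr/local/cuda-12.8 and is not added to PATH automatically. Also, nvcc is only the compiler and runs without the kernel driver, so a working nvcc says nothing about whether the driver loaded. The sketch below lists every nvcc it can find and the release each one reports; the /usr/local/cuda-12.8 path is an assumption based on the default install prefix, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report the CUDA release of every nvcc binary we can find.
# Assumption: NVIDIA's cuda-toolkit-12-8 uses its default prefix
# /usr/local/cuda-12.8; Ubuntu's nvidia-cuda-toolkit puts nvcc in /usr/bin.

# Extract "X.Y" from the "Cuda compilation tools, release X.Y, ..." line on stdin.
nvcc_release() {
  sed -n 's/.*Cuda compilation tools, release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Print "<path>: <release>" once for each candidate nvcc that is executable.
report_nvcc_versions() {
  local candidate seen=" "
  for candidate in "$(command -v nvcc)" /usr/bin/nvcc /usr/local/cuda-12.8/bin/nvcc; do
    [ -n "$candidate" ] && [ -x "$candidate" ] || continue
    # Skip paths already reported (e.g. when PATH's nvcc is /usr/bin/nvcc).
    case "$seen" in *" $candidate "*) continue ;; esac
    seen="$seen$candidate "
    printf '%s: %s\n' "$candidate" "$("$candidate" --version | nvcc_release)"
  done
}

# Usage: report_nvcc_versions
```

If the 12.8 toolkit is present, the CUDA installation guide's post-installation steps add it to PATH, e.g. export PATH=/usr/local/cuda-12.8/bin${PATH:+:${PATH}}.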

I can't figure out what is going wrong with my driver installation. Here is what else I checked:

  • Checked the kernel logs using:
    sudo dmesg | grep -i nvidia
    Key error messages include:
      • NVRM: This PCI I/O region assigned to your NVIDIA device is invalid.
      • nvidia: probe of [device_id] failed with error -1.
      • NVRM: None of the NVIDIA devices were initialized.
  • Manually attempted to load the NVIDIA kernel modules using modprobe, which did not resolve the issue.

I have uploaded nvidia-bug-report.log.gz (11.8 MB) with the ticket.
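The first NVRM message refers to a PCI region (BAR) that the kernel assigned to the GPU. One way to see what was actually assigned is the device's sysfs file /sys/bus/pci/devices/<domain:bus:device.function>/resource, which holds one line per region (start, end, and flags in hex); regions that are unused or were not given an address read as zeros. The sketch below is a hypothetical helper that summarises that file; it only shows what was assigned and cannot tell why the driver rejects a region. The bus numbers come from the lspci output above, and PCI domain 0000 is an assumption (plain lspci omits the domain; lspci -D shows it).

```shell
#!/usr/bin/env bash
# Sketch: summarise a device's PCI regions from its sysfs "resource" file.
# Each stdin line is "start end flags" in hex; all-zero lines mean the region
# is unused or was not assigned an address.
describe_regions() {
  local idx=0 start end flags
  while read -r start end flags; do
    if [ $((start)) -eq 0 ] && [ $((end)) -eq 0 ]; then
      printf 'region %d: empty\n' "$idx"
    else
      # Size in KiB; bash arithmetic understands the 0x-prefixed values.
      printf 'region %d: %s-%s (%d KiB)\n' "$idx" "$start" "$end" \
        $(( (end - start + 1) / 1024 ))
    fi
    idx=$((idx + 1))
  done
}

# Usage (PCI domain 0000 assumed; bus numbers from the lspci output above):
#   for bus in 1b 1c 1d 1e; do
#     echo "== 0000:${bus}:00.0"
#     describe_regions < "/sys/bus/pci/devices/0000:${bus}:00.0/resource"
#   done
```

Comparing this output with the "Region" lines of "sudo lspci -vv -s 1b:00.0" gives a second view of the same assignment.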

Please read this post: »»»»»»»»»» If you have a problem, PLEASE read this first ««««««««««


Thanks for the guidance. I have uploaded the required file.

I don’t think I can be of much help here, as I am not a server expert.

First of all, here is a thread that discusses the PCI I/O region assignment issue:

I can also see that you somehow installed driver version 570.86.10 but still have residual files from 550.144.03.
Try purging the driver again, and REBOOT!
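The NVIDIA user-space libraries carry the full driver version in their file names (for example libnvidia-ml.so.550.144.03), so one rough way to spot leftovers from an older driver is to group those files by version suffix: after a clean reinstall only one version should appear. The sketch below is a hypothetical helper; it assumes Ubuntu's multiarch library directory /usr/lib/x86_64-linux-gnu and only treats suffixes with a three-digit major number as driver versions.

```shell
#!/usr/bin/env bash
# Sketch: count NVIDIA user-space libraries per driver version, using the
# version suffix in file names such as libnvidia-ml.so.570.86.10.
# stdin: one file name per line. stdout: "<version> <count>" per version.
driver_versions_from_libs() {
  # Only suffixes with a 3-digit major version count as driver versions,
  # so e.g. libnvidia-ml.so.1 or libnvidia-egl-wayland.so.1.1.13 are skipped.
  sed -n 's/^.*libnvidia-[A-Za-z0-9_-]*\.so\.\([0-9]\{3\}\.[0-9][0-9]*\(\.[0-9][0-9]*\)\{0,1\}\)$/\1/p' \
    | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (assumes Ubuntu's multiarch library directory):
#   ls /usr/lib/x86_64-linux-gnu | driver_versions_from_libs
```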

Then reinstall, following the CUDA installation instructions to the letter. The NVIDIA GPU driver installation step in particular might differ from what you did above.

If none of the above helps, I recommend contacting your server's service provider or posting in the dedicated Linux forum category.
