Couldn't communicate with the NVIDIA driver

I started using an Azure VM created from the nvidia-gpu-optimized-vmi-a10 image.

However, there are no NVIDIA drivers in that VM, and I am unable to install them.

Here are some commands and their outputs, which may help in debugging the problem:

Command : nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Command : sudo apt install nvidia-driver-495
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package nvidia-driver-495 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'nvidia-driver-495' has no installation candidate

Command : sudo apt install nvidia-495

Reading package lists... Done
Building dependency tree       
Reading state information... Done
E: Unable to locate package nvidia-495

The lspci command produces no output at all.

Some other commands I tried:
sudo add-apt-repository ppa:graphics-drivers/pp
sudo apt update

Output of uname -a:
Linux sanath 5.15.0-1020-azure #25~20.04.1-Ubuntu SMP Thu Sep 1 19:20:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

It’s a vGPU VM, meaning you can’t use the normal graphics driver and have to use the GRID driver instead. To install it, use the extension:
https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/hpccompute-gpu-linux
or download and install it manually:
https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup
https://github.com/Azure/azhpc-extensions/blob/master/NvidiaGPU/resources.json
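For reference, the extension route can also be scripted with the Azure CLI instead of the portal. The following is only a sketch: the function name `install_nvidia_extension` is hypothetical, the resource group and VM name are placeholders, and it assumes the extension name `NvidiaGpuDriverLinux` and publisher `Microsoft.HpcCompute` described on the extension page linked above. It only helps if the VM size actually has an NVIDIA GPU.

```shell
# Hypothetical helper: add the NVIDIA GPU driver extension to an existing VM.
# $1 = resource group name, $2 = VM name (both are placeholders you supply).
# Assumes the Azure CLI (az) is installed and you are logged in.
install_nvidia_extension() {
    az vm extension set \
        --resource-group "$1" \
        --vm-name "$2" \
        --name NvidiaGpuDriverLinux \
        --publisher Microsoft.HpcCompute
}
```

It would be called as `install_nvidia_extension myResourceGroup myVM`, followed by a reboot and another `nvidia-smi` run to see whether the driver loads.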


I added the extension as described in the link and rebooted the VM, but nothing changed @generix

Did you uninstall the repo driver beforehand?
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

Do you mean the ppa:graphic-drivers/ppa repo?

By the way, I'm attaching the file:
nvidia-bug-report.log.gz (32.3 KB)

That looks like a very basic VM without any GPU.

This is the image I chose for this VM

Then something went awry when creating it: it’s a simple VM with two Xeon cores, 4GB RAM, 8GB SSD and no supplemental devices. Please check sudo lspci to see.
Just delete it and try to recreate it properly.

There is no output from the sudo lspci command.

I recreated the VM, but I think nothing has changed.

Here is the report:
nvidia-bug-report.log.gz (32.2 KB)

More info (this is the message I get after opening the VM terminal through ssh):

Welcome to the NVIDIA GPU Cloud image. This image provides an optimized
environment for running the deep learning and HPC containers from the
NVIDIA GPU Cloud Container Registry. Many NGC containers are freely
available. However, some NGC containers require that you log in with
a valid NGC API key in order to access them. This is indicated by a
“pull access denied for xyz …” or “Get xyz: unauthorized: …” error
message from the daemon.

Documentation on using this image and accessing the NVIDIA GPU Cloud
Container Registry can be found at
NVIDIA NGC - NVIDIA Docs

Last login: Sun Oct 2 17:05:18 2022 from 49.37.156.151

Warning: Unsupported instance type for NVIDIA GPU Cloud Machine Image.
Please use an ND, NCv2, or NCv3 instance for optimal performance and reliability.

Any suggestions, @generix?

It seems you only reapplied the OS image but didn’t change the Azure instance size. You need to create the VM with a correct GPU instance size first:
https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-gpu
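To see which size a running VM actually has, you can ask the Azure Instance Metadata Service from inside the VM. The sketch below is illustrative only: the function names `query_vm_size` and `classify_vm_size` are made up for this example, the `api-version` value is an assumption, and the name patterns are a rough heuristic based on the sizes mentioned in this thread, not a complete list. Check the size against the sizes-gpu page above before relying on it.

```shell
# Ask the Azure Instance Metadata Service for this VM's size.
# Only reachable from inside the VM; the api-version value is an assumption.
query_vm_size() {
    curl -s -H Metadata:true --noproxy "*" \
        "http://169.254.169.254/metadata/instance/compute/vmSize?api-version=2021-02-01&format=text"
}

# Rough heuristic (assumption): classify a size name such as Standard_NC6s_v3.
classify_vm_size() {
    case "$1" in
        Standard_NV*as_v4) echo "amd-gpu" ;;   # NVv4-series sizes carry an AMD GPU
        Standard_N*)       echo "n-series" ;;  # N-family size; confirm the GPU vendor in the docs
        *)                 echo "no-gpu" ;;    # not an N-family size, so no GPU is expected
    esac
}
```

On the VM, `classify_vm_size "$(query_vm_size)"` prints one of the three labels. A result of `no-gpu` would match the symptom seen here, where lspci shows no devices and the login banner warns about an unsupported instance type.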

I had the exact same problem, and it turned out I was using the wrong VM size.

I used the Nvidia-GPU-Optimized-VM image, and I also installed the Nvidia-GPU-Driver extension in the Azure portal, BUT I still got the same error.

Then I used ‘sudo lspci -v | less’ to check the GPU, and it turned out to be a fu*king AMD GPU, because I was using an NVv4-series size.

You need a size that actually has an NVIDIA GPU, such as NCv3-series or NC T4_v3-series. Reference: https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-gpu?toc=%2Fazure%2Fvirtual-machines%2Flinux%2Ftoc.json
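The lspci check described above can be turned into a small filter so you don't have to read the whole listing. This is a minimal sketch: the function name `gpu_vendor_from_lspci` is hypothetical, and it assumes the GPU shows up in lspci as a "VGA compatible controller", "3D controller", or "Display controller" line.

```shell
# Read lspci output on stdin and print the GPU vendor: nvidia, amd, or none.
gpu_vendor_from_lspci() {
    # Keep only display-class devices so CPU or chipset entries are ignored.
    gpu_lines=$(grep -iE 'VGA compatible controller|3D controller|Display controller' || true)
    if printf '%s\n' "$gpu_lines" | grep -qi 'nvidia'; then
        echo nvidia
    elif printf '%s\n' "$gpu_lines" | grep -qiE 'advanced micro devices|amd/ati'; then
        echo amd
    else
        echo none
    fi
}
```

Run it as `sudo lspci | gpu_vendor_from_lspci`. Anything other than `nvidia` means the NVIDIA driver has no device to talk to, regardless of which image or extension is installed.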

I may be wrong, because I am new to Azure, but Azure is so fu*king confusing.
