Installing NVIDIA Drivers and CUDA on Azure NVadsA10_v5 VM (Ubuntu 22.04)

Hello NVIDIA Community,

I am working with an Azure NVadsA10_v5 VM running Ubuntu 22.04, and I keep running into problems installing the NVIDIA drivers, CUDA packages, and cuDNN needed to enable the GPU.

Despite following all the recommended steps, including:

  • Disabling Secure Boot,
  • Ensuring kernel compatibility,
  • Reinstalling different NVIDIA driver versions (nvidia-driver-535, nvidia-driver-550, nvidia-driver-535-server, etc.),

I still face the same issue when I run the nvidia-smi command:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

What I’ve Tried

  1. Installed the recommended driver versions for CUDA compatibility.
  2. Verified kernel versions and rebuilt DKMS modules.
  3. Disabled Secure Boot.
  4. Followed blogs and documentation for reinstalling NVIDIA drivers and CUDA packages.
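
When nvidia-smi reports that it cannot communicate with the driver, one common cause is that the nvidia kernel module is simply not loaded. The sketch below checks for it by reading /proc/modules. The helper name module_loaded and its optional file argument are illustrative assumptions, not part of any NVIDIA or Azure tooling.

```shell
# module_loaded NAME [FILE]: succeeds if NAME appears as a loaded module.
# FILE defaults to /proc/modules; the parameter exists only so the
# function can be tried against a saved copy of that file.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

if module_loaded nvidia; then
    echo "nvidia kernel module is loaded"
else
    echo "nvidia kernel module is NOT loaded"
fi
```

If the module is not loaded, `dkms status` and `sudo dmesg | grep -i nvidia` are reasonable next places to look for the reason.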

Environment Details

  • VM Type: Azure NV6ads A10 v5 (6 vCPUs, 55 GiB memory)
  • OS: Ubuntu 22.04
  • Kernel Version: 6.8.0-1020-azure
  • CUDA Version: Build cuda_11.5.r11.5/compiler.30672275_0
  • GPU Model: A10
  • Driver Versions Tried: 535, 535-server, 550, 525

Request for Support

Could someone guide me on:

  1. The correct and recommended procedure to install NVIDIA drivers and CUDA for this specific setup?
  2. How to resolve the nvidia-smi communication issue despite drivers being installed?
  3. Any specific configurations or settings needed for Azure NV6ads A10 v5 VMs?

I am having the same issue. I have tried everything and followed all the steps in the documentation, yet nvidia-smi still doesn’t work.

Were you able to find any solutions for this?

This is how I got it working for Standard_NV72ads_A10_v5:

Installing the NVIDIA Driver on an Azure VM

Prerequisites

  • This guide is specifically for Azure VMs using GRID drivers for Azure.
  • The VM must be created in Standard mode, that is, with Trusted Launch disabled.

1. Connect to Your VM

  • Use SSH to connect to your Azure VM.
ssh your-username@your-vm-ip-address

2. Update the Package List

  • Before installing new packages, update the package list.
sudo apt-get update

3. Install Necessary Packages

  • Install the required packages for building the NVIDIA driver.
sudo apt-get install -y build-essential

4. Blacklist Nouveau Drivers

  • Ensure that the Nouveau drivers are blacklisted to prevent conflicts.
echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
echo "options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf
sudo update-initramfs -u

5. Reboot the VM

  • Reboot your VM to apply changes.
sudo reboot
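
After the reboot, it may be worth confirming that the blacklist from step 4 took effect before running the installer. This is a minimal sketch: the helper name nouveau_blacklisted is hypothetical, and the default path is the file written in step 4.

```shell
# nouveau_blacklisted [FILE]: succeeds if FILE contains the blacklist line
# written in step 4. FILE defaults to the path used in that step.
nouveau_blacklisted() {
    grep -qx 'blacklist nouveau' "${1:-/etc/modprobe.d/blacklist-nouveau.conf}" 2>/dev/null
}

if nouveau_blacklisted; then
    echo "blacklist entry found"
else
    echo "blacklist entry missing"
fi

# nouveau should no longer show up among the loaded modules.
if grep -q '^nouveau ' /proc/modules 2>/dev/null; then
    echo "nouveau is still loaded"
else
    echo "nouveau is not loaded"
fi
```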

6. Download the Driver File and Make It Executable

  • Download the GRID driver file for Azure and change its permissions to make it executable.
wget -P /tmp https://download.microsoft.com/download/8/d/a/8da4fb8e-3a9b-4e6a-bc9a-72ff64d7a13c/NVIDIA-Linux-x86_64-535.161.08-grid-azure.run
chmod +x /tmp/NVIDIA-Linux-x86_64-535.161.08-grid-azure.run

7. Run the NVIDIA Installer

  • Execute the installer script.
sudo /tmp/NVIDIA-Linux-x86_64-535.161.08-grid-azure.run --silent

8. Verify the Installation

  • After installation, verify that the driver is installed correctly.
nvidia-smi
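
For scripted setups, the verification can be reduced to an exit-status check. The sketch below assumes that a zero exit status from nvidia-smi means the driver is responding. The helper name driver_responds and its optional command argument are made up for illustration; `--query-gpu` and `--format` are existing nvidia-smi options.

```shell
# driver_responds [CMD]: succeeds if CMD (default: nvidia-smi) is on PATH
# and exits with status 0. The CMD parameter is only there for illustration.
driver_responds() {
    cmd=${1:-nvidia-smi}
    command -v "$cmd" >/dev/null 2>&1 && "$cmd" >/dev/null 2>&1
}

if driver_responds; then
    # Print just the GPU name and driver version.
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
else
    echo "nvidia-smi is missing or cannot talk to the driver" >&2
fi
```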

9. Clean Up

  • Optionally, you can remove the installer file as it is no longer needed.
rm /tmp/NVIDIA-Linux-x86_64-535.161.08-grid-azure.run

Notes

Kernel Headers

  • If the installer complains about missing kernel headers, you may need to install them using:
sudo apt-get install linux-headers-$(uname -r)
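
To see up front whether headers for the running kernel are present, one option is to check for the build directory under /lib/modules, which is where kernel module builds normally look. This is a sketch under that assumption; the helper name headers_present and its optional root argument are hypothetical.

```shell
# headers_present RELEASE [ROOT]: succeeds if the build directory for
# kernel RELEASE exists under ROOT (default: the real filesystem root).
headers_present() {
    [ -d "${2:-}/lib/modules/$1/build" ]
}

release=$(uname -r)
if headers_present "$release"; then
    echo "headers for $release are installed"
else
    echo "headers for $release are missing; try: sudo apt-get install linux-headers-$release"
fi
```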

No Display Manager

  • Since your VM is headless, there’s no need to stop any display manager.

Hello, Simon! Your comment is one of the most recent and relevant to my issue. I’m stuck on step 7. When I try to run the NVIDIA installer, I get the following error:

ERROR: An error occurred while performing the step: “Building kernel modules”. See /var/log/nvidia-installer.log for details.

ERROR: An error occurred while performing the step: “Checking to see whether the NVIDIA kernel module was successfully built”. See /var/log/nvidia-installer.log for details.

ERROR: The NVIDIA kernel module was not created.

ERROR: Installation has failed. Please see the file ‘/var/log/nvidia-installer.log’ for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

Running this command didn’t help:

sudo apt-get install linux-headers-$(uname -r)

Could you share what might be causing this?
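
The installer messages above all point at /var/log/nvidia-installer.log, and the earliest error lines in that log are usually the most informative. A minimal sketch for pulling them out follows. The helper name first_errors is hypothetical, and matching on the word "error" is only a rough filter.

```shell
# first_errors [LOG] [COUNT]: print the first COUNT (default 20) lines of
# LOG that mention "error" (case-insensitive), prefixed with line numbers.
first_errors() {
    grep -n -i 'error' "${1:-/var/log/nvidia-installer.log}" 2>/dev/null | head -n "${2:-20}"
}

if [ -r /var/log/nvidia-installer.log ]; then
    first_errors
else
    echo "no readable installer log found"
fi
```

Posting those lines together with the output of `uname -r` generally makes it easier for others to tell a missing-headers problem from a kernel incompatibility.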

Hi Anna,

Are you using Linux with kernel 6.11 (Ubuntu 24.04, for example)?
If so, there are some well-known issues…

Since my last post, I have started using the Azure NVIDIA GPU Driver Extension to install the NVIDIA driver.
I provision the VM in Standard mode (Trusted Launch disabled).
Then I install the extension using the driver version specified in Point 2.
If your VM is using kernel 6.11, you’ll have to consider Point 1.
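
To check quickly whether a VM is on kernel 6.11, the release string from `uname -r` can be compared against that version. The sketch below treats "6.11 or newer" as the affected range, which is an assumption on top of what is said above, and the helper name kernel_at_least is hypothetical.

```shell
# kernel_at_least RELEASE MAJOR.MINOR: succeeds if RELEASE (as printed by
# `uname -r`, e.g. 6.8.0-1020-azure) is at or above MAJOR.MINOR.
kernel_at_least() {
    want_major=${2%%.*}
    want_minor=${2#*.}
    have_major=${1%%.*}
    rest=${1#*.}
    have_minor=${rest%%.*}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

if kernel_at_least "$(uname -r)" 6.11; then
    echo "kernel is 6.11 or newer"
else
    echo "kernel is older than 6.11"
fi
```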

Hope it helps 😊

Thanks @simon.renuart. After a couple of days dealing with this problem, I was finally able to solve it. The underlying issue was the kernel, just as you pointed out.