Ubuntu 18 - cuda-drivers-515 - “No devices were found” for Tesla V100

Hey guys! After installing the Nvidia 515 drivers on Ubuntu 18.04 with a V100 on board, nvidia-smi stopped recognizing the GPU.


#:~$ nvidia-smi

No devices were found

  • Drivers were not upgraded, but installed on a fresh Ubuntu 18.04 machine (on AWS)

  • Drivers were installed from the Nvidia apt repository http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/

  • apt-get upgrade was run before installing the drivers

  • With cuda-drivers-510 the GPU is visible, but I can’t install CUDA through apt-get since the cuda metapackage depends on the latest driver version (see the sketch after this list)
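
As a possible workaround for that last point (a rough sketch, assuming the ubuntu1804 repo exposes the usual cuda-drivers-<branch> and cuda-toolkit-<major>-<minor> metapackages; the toolkit version below is only an example), pin the working driver branch and install just the toolkit metapackage, which does not pull in a driver dependency:

sudo apt-get install cuda-drivers-510      # stay on the driver branch that still sees the GPU
sudo apt-get install cuda-toolkit-11-6     # toolkit only, no driver dependency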


#:~$ /usr/local/cuda/bin/nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2019 NVIDIA Corporation

Built on Wed_Oct_23_19:24:38_PDT_2019

Cuda compilation tools, release 10.2, V10.2.89


#:~$ inxi -G

Graphics: Card-1: Cirrus Logic GD 5446

Card-2: NVIDIA GV100GL [Tesla V100 SXM2 16GB]

Display Server: X.org 1.20.8 driver: nvidia tty size: 158x43 Advanced Data: N/A out of X


#:~$ cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX x86_64 Kernel Module 515.43.04 Tue Apr 26 15:52:32 UTC 2022

GCC version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)


#:~$ lsb_release -a

No LSB modules are available.

Distributor ID: Ubuntu

Description: Ubuntu 18.04.6 LTS

Release: 18.04

Codename: bionic


I’m seeing the same problem with the 515 drivers from the CentOS 7 repos as well. I can’t find any documentation about the 515 branch. It looks like 510 is supposed to be the latest production branch, but somehow 515 got added to the repos on May 4.
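
One quick way to see which driver versions a repo actually ships (and when 515 showed up) is to list all available versions of the driver packages; the package names below are what the rhel7 CUDA repo uses for the proprietary DKMS driver, adjust if yours differ:

yum --showduplicates list available kmod-nvidia-latest-dkms nvidia-driver-latest-dkms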

I just updated the drivers on one of my on-site GPU nodes and the 515 drivers do work there. My initial test was also in AWS, so it looks like there may be a problem with AWS or the specific cards that they’re using. This is the card in my on-site node that works:

1b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)

This is the card in the p3 instance type in AWS:

00:1e.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
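
For comparison, the exact PCI vendor/device IDs for each card can be pulled with lspci in numeric mode (10de is Nvidia’s vendor ID):

lspci -nn -d 10de: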

I’ll also note that I’m seeing this error in dmesg when I try to run nvidia-smi on the AWS instance with the 515 driver.

[Wed May 25 19:19:20 2022] NVRM: GPU 0000:00:1e.0: RmInitAdapter failed! (0x25:0x17:1417)
[Wed May 25 19:19:20 2022] NVRM: GPU 0000:00:1e.0: rm_init_adapter failed, device minor number 0
[Wed May 25 19:19:21 2022] NVRM: GPU 0000:00:1e.0: RmInitAdapter failed! (0x25:0x17:1417)
[Wed May 25 19:19:21 2022] NVRM: GPU 0000:00:1e.0: rm_init_adapter failed, device minor number 0

Downgrading to the 510.73.08 driver in the same instance works fine.

I think that I finally figured out what is wrong here. When Nvidia released the R515 drivers they created two different packages for the kernel modules. One package contains the new open-source kernel modules, which only support Turing, Ampere, and later architectures; the other contains the proprietary modules they have been shipping for years, which cover all the architectures supported in previous releases, including Volta. The problem is that if you install one of the packages that depends on the kernel modules, the open-source flavor gets pulled in by default. The way around it is to explicitly install the kmod-nvidia-latest-dkms package if you need support for Volta or earlier GPUs. The package with the open-source modules that does not work with these cards is kmod-nvidia-open-dkms.
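
On CentOS/RHEL with the CUDA repo that works out to roughly the following (a sketch, not an official procedure):

sudo yum remove kmod-nvidia-open-dkms      # drop the open-source kernel module package if it was pulled in
sudo yum install kmod-nvidia-latest-dkms   # proprietary kernel modules, which still cover Volta
sudo reboot                                # or unload/reload the nvidia modules
modinfo nvidia | grep -i license           # proprietary module reports "NVIDIA", the open one "Dual MIT/GPL"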

It would be great if Nvidia could document this fact when referencing their package repositories.


I am getting the same problem on RHEL 7.9 with a V100, using the kmod-nvidia-latest-dkms package.
lspci knows I have a V100, but the driver does not recognize the PCI ID it lists as a V100.
modprobe -vv nvidia … fails with card not supported…
This is also a VM, so that may be a problem too.
The nvidia-kmod-common* package is missing, which could be a problem.
This now looks like a firmware problem.
Maybe it is a GRID 9.1 issue; we need to update that anyway.
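
A few checks that can help narrow down whether the kernel module is built and which package flavor is installed (generic triage, nothing specific to this setup):

dkms status                        # did the nvidia module actually build for the running kernel?
rpm -qa | grep -Ei 'nvidia|kmod'   # confirm kmod-nvidia-latest-dkms (not the -open flavor) is installed
sudo modprobe nvidia && lsmod | grep nvidia
sudo dmesg | grep -i nvrm          # look for RmInitAdapter or "not supported" messages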