Azure VM - nvidia-smi says: No devices were found

Hi,

A few weeks ago I was successfully running an Ubuntu 18.04 installation in an Azure environment, but when I try it now I cannot get it to work.

Some more details:
I instantiate an image (which worked perfectly earlier) as an NC24 instance. Running lspci shows the 4 GPUs (K80s):

oystein@LinuxGPU4:~$ lspci
0000:00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled) (rev 03)
0000:00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 01)
0000:00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
0000:00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA
68ad84e5:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
68adab42:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
68adcc90:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
68adeea2:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

However, when I run any GPU code, such as TensorFlow or PyTorch, it fails with a device-not-found message. So does nvidia-smi:

oystein@LinuxGPU4:~$ nvidia-smi
No devices were found
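
The mismatch described here (GPUs present on the PCI bus, none exposed by the driver) can be checked by comparing two counts. This is only a minimal sketch: the function names count_pci_gpus and count_driver_gpus are made up for illustration, and it assumes lspci and nvidia-smi are on the PATH.

```shell
#!/usr/bin/env bash

# Count NVIDIA GPU controllers in lspci-style text read from stdin.
# Matches lines such as "3D controller: NVIDIA Corporation ...".
count_pci_gpus() {
    grep -Eic '(3D|VGA compatible) controller: nvidia' || true
}

# Count GPUs the driver can actually talk to.
# `nvidia-smi -L` prints one "GPU n: ..." line per usable device.
count_driver_gpus() {
    nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true
}
```

Usage would be along the lines of `lspci | count_pci_gpus` followed by `count_driver_gpus`. On a machine in the state shown above, the first should print 4 and the second 0, which points at driver initialization rather than missing hardware.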

The kernel says:

oystein@LinuxGPU4:~$ dmesg | grep NVRM
[ 30.718715] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 390.77 Tue Jul 10 18:28:52 PDT 2018 (using threaded interrupts)
[ 31.364173] NVRM: RmInitAdapter failed! (0x23:0x51:470)
[ 31.364977] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 31.372098] NVRM: RmInitAdapter failed! (0x23:0x51:470)
[ 31.373187] NVRM: rm_init_adapter failed for device bearing minor number 1
[ 31.379099] NVRM: RmInitAdapter failed! (0x23:0x51:470)
[ 31.379756] NVRM: rm_init_adapter failed for device bearing minor number 2
[ 31.391490] NVRM: RmInitAdapter failed! (0x23:0x51:470)
[ 31.392267] NVRM: rm_init_adapter failed for device bearing minor number 3
[ 84.736962] NVRM: RmInitAdapter failed! (0x23:0x51:470)
[ 84.737636] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 84.745368] NVRM: RmInitAdapter failed! (0x23:0x51:470)
[ 84.746035] NVRM: rm_init_adapter failed for device bearing minor number 1
[ 84.751980] NVRM: RmInitAdapter failed! (0x23:0x51:470)
[ 84.752681] NVRM: rm_init_adapter failed for device bearing minor number 2
[ 84.758466] NVRM: RmInitAdapter failed! (0x23:0x51:470)
[ 84.759117] NVRM: rm_init_adapter failed for device bearing minor number 3
[ 694.989213] NVRM: RmInitAdapter failed! (0x23:0x51:470)
[ 694.989911] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 694.996108] NVRM: RmInitAdapter failed! (0x23:0x51:470)
[ 694.996785] NVRM: rm_init_adapter failed for device bearing minor number 1
[ 695.002668] NVRM: RmInitAdapter failed! (0x23:0x51:470)
[ 695.003335] NVRM: rm_init_adapter failed for device bearing minor number 2
[ 695.009176] NVRM: RmInitAdapter failed! (0x23:0x51:470)
[ 695.009838] NVRM: rm_init_adapter failed for device bearing minor number 3
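
Since the log repeats the same two messages many times, it may help to condense it before comparing with other reports. The following is a rough sketch, not an official NVIDIA tool; the function name summarize_nvrm_failures is hypothetical, and it only relies on the two message formats visible in the output above.

```shell
#!/usr/bin/env bash

# Summarize NVRM init failures from dmesg-style text on stdin.
# Prints each distinct RmInitAdapter error code with its count,
# then how many distinct device minor numbers failed.
summarize_nvrm_failures() {
    awk '
        /NVRM: RmInitAdapter failed!/ {
            codes[$NF]++            # last field, e.g. (0x23:0x51:470)
        }
        /NVRM: rm_init_adapter failed for device bearing minor number/ {
            minors[$NF] = 1         # last field is the minor number
        }
        END {
            for (c in codes) printf "code %s seen %d time(s)\n", c, codes[c]
            n = 0
            for (m in minors) n++
            printf "%d device(s) failed to initialize\n", n
        }
    '
}
```

It would be called as `dmesg | summarize_nvrm_failures` (possibly with sudo, depending on the system). For the log above, that amounts to one error code, (0x23:0x51:470), seen 12 times across 4 devices, so all four GPUs fail the same way on every attempt.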

I see that other posters here have had similar problems, but I cannot find any solution. I’ve tried reinstalling all nvidia packages (apt remove, then apt install), and I’ve rebooted the machine several times.

Thanks!

Same here!

I stopped my Azure Ubuntu VM yesterday, and when I restarted it this morning, nvidia-smi returned “No devices were found”.

I used the machine last Friday and it was working fine.

I also tried reinstalling the NVIDIA drivers, but nothing changed.

Have you managed to solve this issue?

Cheers,

Afonso

I’m sorry, Afonso. I’ve not solved this. But I wonder whether the problem actually lies with Azure rather than NVIDIA. Or maybe Ubuntu?

I had the same error. It looks like uninstalling all nvidia packages and then installing nvidia-driver-396 (version 396.54-0ubuntu0~gpu18.04.1) did the trick for me. I just reloaded the nvidia module and CUDA works again.
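
The steps in the reply above could look roughly like the sketch below. Several things here are assumptions rather than anything the poster stated: the exact apt commands, the purge pattern, the list of kernel modules to unload, and that a repository providing this package version is already configured. The helper names run and reinstall_nvidia_driver are made up. The script only prints the commands unless APPLY=1 is set.

```shell
#!/usr/bin/env bash

# Dry-run helper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

reinstall_nvidia_driver() {
    # Package name and version are taken from the post above.
    pkg="nvidia-driver-396"
    ver="396.54-0ubuntu0~gpu18.04.1"

    # Assumed way of "uninstalling all nvidia packages".
    run sudo apt-get remove --purge -y '^nvidia-.*'
    run sudo apt-get install -y "${pkg}=${ver}"

    # Reload the kernel module; dependent modules are unloaded first.
    run sudo modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nvidia
    run sudo modprobe nvidia

    # Check whether the GPUs are visible again.
    run nvidia-smi
}
```

Calling `reinstall_nvidia_driver` prints the five commands for review; `APPLY=1 reinstall_nvidia_driver` would execute them. If unloading the modules fails because something still holds them, a reboot after the install is the simpler alternative.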