nvidia-smi sees only one GPU on an Azure NC24 instance

Hi!

I’m running an NC24 instance in Azure. It runs my favorite Linux distro, Arch Linux, but I can only get one GPU working, even though the NC24 is specified with 4 Tesla K80 GPUs. Here are some details:

My nvidia-smi output:

[oystein@archlinux ~]$ nvidia-smi
Thu Aug 30 08:19:49 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.54                 Driver Version: 396.54                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00003130:00:00.0 Off |                    0 |
| N/A   62C    P0   139W / 149W |  10970MiB / 11441MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     75010      C   /usr/bin/python3                           10957MiB |
+-----------------------------------------------------------------------------+
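To get a plain count of how many GPUs the driver actually enumerated, the `nvidia-smi -L` listing (one `GPU N: ...` line per device) is easier to parse than the full table. A minimal sketch, assuming bash; the function name `count_smi_gpus` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count GPUs in `nvidia-smi -L` output read from stdin.
# Each enumerated device appears as a line like "GPU 0: Tesla K80 (UUID: GPU-...)".
count_smi_gpus() {
  # grep -c prints the count even when it is 0; it then exits 1, so mask that.
  grep -c '^GPU [0-9]' || true
}

# Usage on the VM:
#   nvidia-smi -L | count_smi_gpus
```

On the machine above this should report 1, where a healthy NC24 would be expected to report 4.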

My system kernel:

[oystein@archlinux ~]$ uname -a
Linux archlinux 4.18.5-arch1-1-ARCH #1 SMP PREEMPT Fri Aug 24 12:48:58 UTC 2018 x86_64 GNU/Linux

My kernel messages:

[oystein@archlinux ~]$ dmesg | grep NVRM
[   21.055720] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  396.54  Tue Aug 14 19:02:34 PDT 2018 (using threaded interrupts)

Thanks for any help!

Maybe you’re actually on an NC6 instance instead of an NC24?

Interesting comment. It was a plausible theory, but unfortunately wrong: the last processor index is 23, i.e. 24 vCPUs, which matches an NC24 (an NC6 has only 6):

[oystein@archlinux ~]$ cat /proc/cpuinfo | grep processor |tail -3
processor       : 21
processor       : 22
processor       : 23
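The size check above (24 processors means NC24, not NC6) can be made explicit. A sketch, assuming the original K80-based Azure NC-series sizes (NC6: 6 vCPUs/1 GPU, NC12: 12 vCPUs/2 GPUs, NC24 and NC24r: 24 vCPUs/4 GPUs); the function name is hypothetical, and newer NC-series generations use different ratios, so treat the table as an assumption:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a vCPU count to the GPU count of the original
# (K80-based) Azure NC-series size with that many vCPUs.
# Assumed sizes: NC6 = 6 vCPUs/1 GPU, NC12 = 12/2, NC24(r) = 24/4.
expected_gpus_for_vcpus() {
  case "$1" in
    6)  echo 1 ;;
    12) echo 2 ;;
    24) echo 4 ;;
    *)  echo "unknown"; return 1 ;;
  esac
}

# Usage: count the "processor : N" lines instead of reading the tail.
#   expected_gpus_for_vcpus "$(grep -c '^processor' /proc/cpuinfo)"
```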

Do I need to install CUDA and the driver in a particular order?

-Øystein

What is the output of the following command?

lspci | grep -i 3D

[oystein@archlinux ~]$ lspci | grep -i 3d
3130:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

So that is why.

If you were expecting 4 of those, there should have been 4 lines in the lspci output, one for each of the 4 K80 GPU devices.
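The lspci check can be turned into a count as well. A minimal sketch, assuming bash; `count_nvidia_3d` is a hypothetical name. It counts NVIDIA `3D controller` lines in lspci output read from stdin (running `lspci -d 10de:` would additionally restrict the listing to NVIDIA's PCI vendor ID, 10de):

```shell
#!/usr/bin/env bash
# Hypothetical helper: count NVIDIA "3D controller" entries in lspci output
# read from stdin. Each GPU passed through to the VM shows up as its own line.
count_nvidia_3d() {
  # Case-insensitive match; mask grep's exit status 1 when the count is 0.
  grep -ci '3D controller: NVIDIA' || true
}

# Usage on the VM (an NC24 would be expected to print 4):
#   lspci | count_nvidia_3d
```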

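If that is an NC24 instance, then it’s a broken instance, not an NVIDIA issue: the missing GPUs don’t even show up on the VM’s PCI bus, so there is nothing for the driver to load on.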
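The distinction drawn here (a GPU missing from lspci is a platform problem, while a GPU present in lspci but missing from nvidia-smi points at the driver) can be written as a small decision helper. A sketch under that assumption; the function name and messages are hypothetical, and the three counts would come from the instance size, the lspci output and the nvidia-smi output discussed in this thread:

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a "missing GPU" symptom from three counts.
#   $1 = GPUs the VM size should have (e.g. 4 for an NC24)
#   $2 = NVIDIA devices visible in lspci
#   $3 = GPUs listed by nvidia-smi
diagnose_gpu_count() {
  local expected=$1 pci=$2 smi=$3
  if (( pci < expected )); then
    # The devices never reached the guest's PCI bus: no driver can fix this.
    echo "platform: only $pci of $expected GPUs exposed to the VM"
  elif (( smi < pci )); then
    # The devices are on the bus, but not all were initialized by the driver.
    echo "driver: $smi of $pci PCI GPUs visible to nvidia-smi"
  else
    echo "ok: $smi of $expected expected GPUs visible"
  fi
}

# The situation in this thread (NC24, one lspci line, one nvidia-smi GPU):
#   diagnose_gpu_count 4 1 1
```

For this thread it prints `platform: only 1 of 4 GPUs exposed to the VM`, which matches the conclusion that the instance, not the driver, is at fault.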

OK! Thanks for your help, txbob. I’ll see what I can do on the Azure side, then. Worst case, I’ll create a new instance.

Thanks,