GPU card is not detected in Azure RHEL 7.3 VM

I have created an Azure VM with 2 GPU cards, but only one gets detected.
In /var/log/messages I can see this entry: nvidia 3130:00:00.0: can't derive routing for PCI INT A

I have installed the CUDA 9.2 toolkit.

Any help in fixing this issue?

What is the output of the below commands?

lspci | grep NVIDIA
nvidia-smi
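As a quick cross-check, the number of NVIDIA devices on the PCI bus can be compared with the number the driver enumerates (a sketch, assuming both lspci and nvidia-smi are installed and on PATH):

```shell
# Compare how many NVIDIA devices the PCI bus exposes with how many
# the NVIDIA driver enumerates; on a healthy 2-GPU VM both should be 2.
pci=$(lspci | grep -ic nvidia)
drv=$(nvidia-smi -L | grep -c 'GPU')
echo "PCI bus: $pci  driver: $drv"
```

If the PCI count is already 1, the missing card is a platform/hypervisor issue rather than a CUDA one.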

Here is the output of the above commands:
[root@hazr022 sanfern]# lspci | grep -i nvidia
3130:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
[root@hazr022 sanfern]#

lshw -C display command output:

*-display
description: 3D controller
product: GV100GL [Tesla V100 PCIe 16GB]
vendor: NVIDIA Corporation
physical id: 1
bus info: pci@3130:00:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress bus_master cap_list
configuration: driver=nvidia latency=0
resources: iomemory:100-ff iomemory:140-13f irq:0 memory:21000000-21ffffff memory:1000000000-13ffffffff memory:1400000000-1401ffffff

[root@hazr022 sanfern]# nvidia-smi
Fri Jul 13 17:55:35 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00003130:00:00.0 Off |                    0 |
| N/A   30C    P0    36W / 250W |      0MiB / 16152MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

It appears to me that you are using an NCv3 instance.
Can you confirm that you are using the Standard_NC12s_v3 instance? If you are using Standard_NC6s_v3, it has only one GPU.
Could you provide the output of the below:

nvidia-smi -L

What is the result of:

echo $CUDA_VISIBLE_DEVICES
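For context, CUDA_VISIBLE_DEVICES masks which GPUs CUDA applications may enumerate; when it is unset, all GPUs are visible, so a stray value could hide the second card. A minimal sketch of its effect:

```shell
# CUDA_VISIBLE_DEVICES restricts which GPUs CUDA programs can see.
unset CUDA_VISIBLE_DEVICES          # default: all GPUs visible
export CUDA_VISIBLE_DEVICES=0       # only GPU 0 visible (would hide GPU 1)
export CUDA_VISIBLE_DEVICES=0,1     # GPUs 0 and 1 visible
echo "$CUDA_VISIBLE_DEVICES"
```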

What's the output of

ls -l /dev/nvidia*
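For reference, on a working two-GPU machine you would expect per-GPU device nodes /dev/nvidia0 and /dev/nvidia1 alongside control nodes like /dev/nvidiactl. A sketch for counting just the per-GPU nodes:

```shell
# Count the per-GPU device nodes (/dev/nvidia0, /dev/nvidia1, ...);
# control nodes such as /dev/nvidiactl are deliberately excluded.
ls /dev/nvidia[0-9]* 2>/dev/null | wc -l
```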

Could you locate deviceQuery, execute it, and provide its output? On RHEL you can find it with mlocate:

sudo yum install mlocate
sudo updatedb
locate deviceQuery

If it is found, change into its directory (e.g. in my case it is /usr/local/cuda-9.2/samples/bin/x86_64/linux/release/deviceQuery):

cd /usr/local/cuda-9.2/samples/bin/x86_64/linux/release/

and then execute:

./deviceQuery
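The key line in deviceQuery's output is the detected-device count; a sketch that pulls it out of captured output (the sample string below is illustrative, not real output from this VM):

```shell
# Extract the device count from a captured deviceQuery line.
sample='Detected 1 CUDA Capable device(s)'   # illustrative sample
count=$(printf '%s\n' "$sample" | sed -n 's/^Detected \([0-9][0-9]*\) CUDA Capable.*/\1/p')
echo "$count"    # prints 1 for this sample; an NC12s_v3 should report 2
```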

You said you have installed the CUDA 9.2 kit. Did you use the runfile or the network method? If you used the runfile method, did you install the driver, the toolkit, or both? (The runfile has separate sections for the toolkit and the video driver, though it appears that you have installed the driver.)
Normally, when I had issues with Azure GPU VMs, I tried running other instance types to narrow down the cause, or engaged with their support until the issue was clarified.

Some references:
https://unix.stackexchange.com/questions/57562/second-gpu-does-not-show-up-in-lspci

If

echo $CUDA_VISIBLE_DEVICES

returns 1, try

export CUDA_VISIBLE_DEVICES="0,1"


I am using Standard_NC12s_v3. The issue was resolved after reinstalling the Linux Integration Services (LIS) for Hyper-V and Azure.
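For anyone hitting the same problem, a reinstall of LIS typically looks roughly like the following. The download alias and archive layout here are assumptions based on Microsoft's published LIS packages, so verify against the current Azure documentation before running it:

```shell
# Sketch of an LIS reinstall (URL and archive layout are assumptions;
# check the current Azure/LIS documentation before running).
wget https://aka.ms/lis -O lis.tar.gz
tar xzf lis.tar.gz
cd LISISO
sudo ./install.sh     # or ./upgrade.sh over an existing LIS install
sudo reboot
```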

Thank you for the update.
Glad to know you have managed to resolve the issue.
If you have more GPU work in the future, do not hesitate to let me know.
I am looking to do a PhD and am gathering practical GPU experience that I hope to apply toward a thesis in the future (if I find the time and a scientific adviser).

Thanks Andrey, and good luck with your thesis. As per the documentation, NVIDIA driver 384.111 is supported on RHEL 7.3 but not on RHEL 7.5. The driver installs without error, but nvidia-smi reports "No devices were found". Are there any plans to support 384.111 on RHEL 7.5?

Reference found: https://devtalk.nvidia.com/default/topic/1031404/linux/cannot-compile-nvidia-driver-on-rhel-7-5-workstation/post/5247629/#5247629

Trying to build nvidia-settings from driver 384.111 on RHEL 7.5, I am getting the following error:

[root@hazr022 nvidia-settings]# find / -name vdpau.h
[root@hazr022 nvidia-settings]# make
make[1]: Entering directory `/root/sanfern/nvidia-settings/src'
make[2]: Entering directory `/root/sanfern/nvidia-settings/src/libXNVCtrl'
make[2]: Nothing to be done for `default'.
make[2]: Leaving directory `/root/sanfern/nvidia-settings/src/libXNVCtrl'
  CC       gtk+-2.x/ctkwindow.c
In file included from gtk+-2.x/ctkwindow.c:64:0:
gtk+-2.x/ctkvdpau.h:26:25: fatal error: vdpau/vdpau.h: No such file or directory
 #include "vdpau/vdpau.h"
                         ^
compilation terminated.
make[1]: *** [_out/Linux_x86_64/gtk2/ctkwindow.o] Error 1
make[1]: Leaving directory `/root/sanfern/nvidia-settings/src'
make: *** [all] Error 2

[root@hazr022 nvidia-settings]# yum list installed vdpau
Loaded plugins: product-id, search-disabled-repos, subscription-manager, versionlock
Installed Packages
libva-vdpau-driver.x86_64 0.7.4-19.el7
libvdpau.x86_64 1.1.1-3.el7
libvdpau-va-gl.x86_64 0.4.2-6.el7
vdpauinfo.x86_64 0.9-0.1.el7
[root@hazr022d73cf31 nvidia-settings]#

Any clue how to fix this?
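One likely direction: the yum list above shows only runtime VDPAU packages, while vdpau/vdpau.h is a build-time header, which on RHEL normally ships in the libvdpau-devel package (an assumption on my part; availability depends on your enabled repos). A sketch that checks for the header first:

```shell
# Check for the VDPAU development header; the runtime libvdpau package
# does not provide it, the -devel package normally does.
if [ -e /usr/include/vdpau/vdpau.h ]; then
  echo "vdpau.h present"
else
  echo "vdpau.h missing: try 'sudo yum install libvdpau-devel'"
fi
```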