GPU card is not detected in Azure RHEL 7.3 VM

I have created an Azure VM with 2 GPU cards, but only one gets detected.
In /var/log/messages I can see this entry: nvidia 3130:00:00.0: can't derive routing for PCI INT A

I have installed the CUDA 9.2 toolkit.

Any help in fixing this issue?

What is the output of the below commands?

lspci | grep NVIDIA
nvidia-smi
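As a quick cross-check, the number of NVIDIA devices on the PCI bus can be compared with the number the driver enumerates (a sketch, assuming both lspci and nvidia-smi are installed and on PATH):

```shell
# Compare how many NVIDIA devices the PCI bus exposes with how many
# the NVIDIA driver enumerates; on a healthy 2-GPU VM both should be 2.
pci=$(lspci | grep -ic nvidia)
drv=$(nvidia-smi -L | grep -c 'GPU')
echo "PCI bus: $pci  driver: $drv"
```

If the PCI count is already 1, the missing card is a platform/hypervisor issue rather than a CUDA one.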

Here is the output of the above commands:
[root@hazr022 sanfern]# lspci | grep -i nvidia
3130:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
[root@hazr022 sanfern]#

lshw -C display command output:

*-display
description: 3D controller
product: GV100GL [Tesla V100 PCIe 16GB]
vendor: NVIDIA Corporation
physical id: 1
bus info: pci@3130:00:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress bus_master cap_list
configuration: driver=nvidia latency=0
resources: iomemory:100-ff iomemory:140-13f irq:0 memory:21000000-21ffffff memory:1000000000-13ffffffff memory:1400000000-1401ffffff

[root@hazr022 sanfern]# nvidia-smi
Fri Jul 13 17:55:35 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00003130:00:00.0 Off |                    0 |
| N/A   30C    P0    36W / 250W |      0MiB / 16152MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

It appears to me that you are using an NCv3 instance.
Can you confirm that you are using the Standard_NC12s_v3 instance? If you are using Standard_NC6s_v3, it has only one GPU.
Could you provide the output of the below:

nvidia-smi -L

What is the result of:

echo $CUDA_VISIBLE_DEVICES
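For context, CUDA_VISIBLE_DEVICES masks which GPUs CUDA applications may enumerate; when it is unset, all GPUs are visible, so a stray value could hide the second card. A minimal sketch of its effect:

```shell
# CUDA_VISIBLE_DEVICES restricts which GPUs CUDA programs can see.
unset CUDA_VISIBLE_DEVICES          # default: all GPUs visible
export CUDA_VISIBLE_DEVICES=0       # only GPU 0 visible (would hide GPU 1)
export CUDA_VISIBLE_DEVICES=0,1     # GPUs 0 and 1 visible
echo "$CUDA_VISIBLE_DEVICES"
```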

What's the output of

ls -l /dev/nvidia*
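For reference, on a working two-GPU machine you would expect per-GPU device nodes /dev/nvidia0 and /dev/nvidia1 alongside control nodes like /dev/nvidiactl. A sketch for counting just the per-GPU nodes:

```shell
# Count the per-GPU device nodes (/dev/nvidia0, /dev/nvidia1, ...);
# control nodes such as /dev/nvidiactl are deliberately excluded.
ls /dev/nvidia[0-9]* 2>/dev/null | wc -l
```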

Could you locate deviceQuery, execute it, and provide its output? On RHEL you can find it with mlocate:

sudo yum install mlocate
sudo updatedb
locate deviceQuery

If it is found, change into its directory (e.g. in my case it is /usr/local/cuda-9.2/samples/bin/x86_64/linux/release/deviceQuery):

cd /usr/local/cuda-9.2/samples/bin/x86_64/linux/release/

and then execute:

./deviceQuery
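The key line in deviceQuery's output is the detected-device count; a sketch that pulls it out of captured output (the sample string below is illustrative, not real output from this VM):

```shell
# Extract the device count from a captured deviceQuery line.
sample='Detected 1 CUDA Capable device(s)'   # illustrative sample
count=$(printf '%s\n' "$sample" | sed -n 's/^Detected \([0-9][0-9]*\) CUDA Capable.*/\1/p')
echo "$count"    # prints 1 for this sample; an NC12s_v3 should report 2
```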

You said you have installed the CUDA 9.2 kit. Did you use the runfile or the network method? If you used the runfile method, did you install the driver, the toolkit, or both? (The runfile has separate sections for the toolkit and the video driver, though it appears that you have installed the driver.)
Normally, when I had issues with Azure GPU VMs, I tried running other instance types to narrow down the cause, or engaged with their support until the issue was clarified.

Some references:
https://unix.stackexchange.com/questions/57562/second-gpu-does-not-show-up-in-lspci

If

echo $CUDA_VISIBLE_DEVICES

returns 1, try

export CUDA_VISIBLE_DEVICES="0,1"


I am using Standard_NC12s_v3. The issue was resolved after reinstalling the Linux Integration Services (LIS) for Hyper-V and Azure.
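For anyone hitting the same problem, a reinstall of LIS typically looks roughly like the following. The download alias and archive layout here are assumptions based on Microsoft's published LIS packages, so verify against the current Azure documentation before running it:

```shell
# Sketch of an LIS reinstall (URL and archive layout are assumptions;
# check the current Azure/LIS documentation before running).
wget https://aka.ms/lis -O lis.tar.gz
tar xzf lis.tar.gz
cd LISISO
sudo ./install.sh     # or ./upgrade.sh over an existing LIS install
sudo reboot
```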

Thank you for the update.
Glad to know you have managed to resolve the issue.
If you have more GPU work in the future, do not hesitate to let me know.
I am looking to do a PhD and am gathering practical GPU experience that I hope to apply toward a thesis in the future (if I find the time and a scientific adviser).

Thanks Andrey, and good luck with your thesis. As per the documentation, NVIDIA driver 384.111 is supported on RHEL 7.3 but not on RHEL 7.5. The driver installs without error, but nvidia-smi reports "No devices were found". Are there any plans to support 384.111 on RHEL 7.5?

Reference found: https://devtalk.nvidia.com/default/topic/1031404/linux/cannot-compile-nvidia-driver-on-rhel-7-5-workstation/post/5247629/#5247629

Trying to build nvidia-settings from driver 384.111 on RHEL 7.5, I am getting the following error:

[root@hazr022 nvidia-settings]# find / -name vdpau.h
[root@hazr022 nvidia-settings]# make
make[1]: Entering directory `/root/sanfern/nvidia-settings/src'
make[2]: Entering directory `/root/sanfern/nvidia-settings/src/libXNVCtrl'
make[2]: Nothing to be done for `default'.
make[2]: Leaving directory `/root/sanfern/nvidia-settings/src/libXNVCtrl'
  CC       gtk+-2.x/ctkwindow.c
In file included from gtk+-2.x/ctkwindow.c:64:0:
gtk+-2.x/ctkvdpau.h:26:25: fatal error: vdpau/vdpau.h: No such file or directory
 #include "vdpau/vdpau.h"
                         ^
compilation terminated.
make[1]: *** [_out/Linux_x86_64/gtk2/ctkwindow.o] Error 1
make[1]: Leaving directory `/root/sanfern/nvidia-settings/src'
make: *** [all] Error 2

[root@hazr022 nvidia-settings]# yum list installed vdpau
Loaded plugins: product-id, search-disabled-repos, subscription-manager, versionlock
Installed Packages
libva-vdpau-driver.x86_64 0.7.4-19.el7
libvdpau.x86_64 1.1.1-3.el7
libvdpau-va-gl.x86_64 0.4.2-6.el7
vdpauinfo.x86_64 0.9-0.1.el7
[root@hazr022d73cf31 nvidia-settings]#

Any clue how to fix this?
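One likely direction: the yum list above shows only runtime VDPAU packages, while vdpau/vdpau.h is a build-time header, which on RHEL normally ships in the libvdpau-devel package (an assumption on my part; availability depends on your enabled repos). A sketch that checks for the header first:

```shell
# Check for the VDPAU development header; the runtime libvdpau package
# does not provide it, the -devel package normally does.
if [ -e /usr/include/vdpau/vdpau.h ]; then
  echo "vdpau.h present"
else
  echo "vdpau.h missing: try 'sudo yum install libvdpau-devel'"
fi
```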