After installing CUDA 9.0 on POWER9 (RHEL7), nvidia-smi shows Unknown Error in Memory_Usage column.

ankitpurohit · April 17, 2018, 2:09am

I have installed CUDA 9.0 on a RHEL-based POWER9 system, and after installation nvidia-smi shows the following error.
What is this error, and how do I resolve it?

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Robert_Crovella · April 17, 2018, 2:15am

I suspect you didn’t follow the mandatory additional setup steps which are unique to Power9 CUDA 9/9.1 setup:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#power9-setup

ankitpurohit · April 17, 2018, 2:34am

Thank you so much txbob!!
Actually, I had followed these steps when installing CUDA, but I forgot to comment out one rule, and that is why I got this error.
I checked my configuration file again and fixed it. Now nvidia-smi is working fine :)

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

marswhc · April 21, 2018, 4:41pm

I encountered the same problem on Ubuntu 16.04. I also followed the Power9 additional setup steps to edit '/lib/udev/rules.d/40-vm-hotadd.rules', but it did not work. The Memory_Usage column still shows 'Unknown Error'.

Robert_Crovella · April 21, 2018, 4:53pm

There are 2 changes that need to be made. That is one of them. There is another (read the linked section).

Then you need to reboot.

marswhc · April 21, 2018, 5:02pm

Thanks for the prompt reply. The nvidia-persistenced service has been enabled and is running:

#systemctl status nvidia-persistenced
● nvidia-persistenced.service - NVIDIA Persistence Daemon
Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; static; vendor preset: enabled)
Active: active (running) since Sun 2018-04-22 00:58:37 CST; 1min 35s ago
Process: 1878 ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-m
Main PID: 1890 (nvidia-persiste)
CGroup: /system.slice/nvidia-persistenced.service
└─1890 /usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --ve

Apr 22 00:58:37 Ubuntu systemd[1]: Starting NVIDIA Persistence Daemon...
Apr 22 00:58:37 Ubuntu nvidia-persistenced[1890]: Verbose syslog connection opened
Apr 22 00:58:37 Ubuntu nvidia-persistenced[1890]: Now running with user ID 110 and group ID 118
Apr 22 00:58:37 Ubuntu nvidia-persistenced[1890]: Started (1890)
Apr 22 00:58:37 Ubuntu systemd[1]: Started NVIDIA Persistence Daemon.
Apr 22 00:58:37 Ubuntu nvidia-persistenced[1890]: device 0004:04:00.0 - registered
Apr 22 00:58:37 Ubuntu nvidia-persistenced[1890]: device 0004:05:00.0 - registered
Apr 22 00:58:37 Ubuntu nvidia-persistenced[1890]: device 0035:03:00.0 - registered
Apr 22 00:58:37 Ubuntu nvidia-persistenced[1890]: device 0035:04:00.0 - registered
Apr 22 00:58:37 Ubuntu nvidia-persistenced[1890]: Local RPC service initialized

Robert_Crovella · April 21, 2018, 5:12pm

What are the contents of your /lib/udev/rules.d/40-vm-hotadd.rules file?

marswhc · April 21, 2018, 5:17pm

cat /lib/udev/rules.d/40-vm-hotadd.rules

# On Hyper-V and Xen Virtual Machines we want to add memory and cpus as soon as they appear
ATTR{[dmi/id]sys_vendor}=="Microsoft Corporation", ATTR{[dmi/id]product_name}=="Virtual Machine", GOTO="vm_hotadd_apply"
ATTR{[dmi/id]sys_vendor}=="Xen", GOTO="vm_hotadd_apply"
GOTO="vm_hotadd_end"

LABEL="vm_hotadd_apply"

# Memory hotadd request
#SUBSYSTEM=="memory", ACTION=="add", DEVPATH=="/devices/system/memory/memory[0-9]*", TEST=="state", ATTR{state}="online"

# CPU hotadd request
SUBSYSTEM=="cpu", ACTION=="add", DEVPATH=="/devices/system/cpu/cpu[0-9]*", TEST=="online", ATTR{online}="1"

LABEL="vm_hotadd_end"

uname -a

Linux Ubuntu 4.13.0-38-generic #43~16.04.1-Ubuntu SMP Wed Mar 14 17:46:55 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

Robert_Crovella · April 21, 2018, 5:31pm

What is the output of:

grep 'SUBSYSTEM=="memory"' /lib/udev/rules.d/*

Also, what is the output of:

dmesg |grep NVRM
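
Note that the grep above also matches lines that are already commented out, so its output has to be read carefully. A minimal sketch of a stricter check follows. It assumes the rule in question has the `SUBSYSTEM=="memory" ... ATTR{state}="online"` shape quoted in this thread; the function name `active_memory_hotadd_rules` is made up for illustration and is not part of any NVIDIA or udev tooling.

```shell
# Print udev rule lines that still auto-online hot-added memory.
# Unlike a plain grep for SUBSYSTEM=="memory", this skips lines that
# are commented out with a leading '#'.
# Arguments: rule files or directories to search.
active_memory_hotadd_rules() {
    grep -rEn '^[[:space:]]*SUBSYSTEM=="memory".*ATTR\{state\}="online"' "$@" 2>/dev/null
}
```

Called as `active_memory_hotadd_rules /lib/udev/rules.d /etc/udev/rules.d`, it should print nothing once every matching rule has been commented out, and otherwise print each remaining rule with its file name and line number.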

marswhc · April 21, 2018, 5:42pm

grep 'SUBSYSTEM=="memory"' /lib/udev/rules.d/*

/lib/udev/rules.d/40-vm-hotadd.rules:#SUBSYSTEM=="memory", ACTION=="add", DEVPATH=="/devices/system/memory/memory[0-9]*", TEST=="state", ATTR{state}="online"

dmesg |grep NVRM

[ 2.853579] NVRM: loading NVIDIA UNIX ppc64le Kernel Module 390.31 Fri Feb 2 00:22:17 PST 2018 (using threaded interrupts)

Robert_Crovella · April 21, 2018, 5:49pm

Have you rebooted?

Is there any difference if you run the nvidia-smi command with sudo?

Anyway, I'm pretty much out of ideas.

marswhc · April 22, 2018, 12:56am

Yes, I have rebooted.

Andrey1984 · April 22, 2018, 2:26pm

@marswhc: what is the output of

cat /usr/lib/systemd/system/nvidia-persistenced.service

Did you export the CUDA PATH (step 7.1.1)?

marswhc · April 22, 2018, 2:52pm

Hi Andrey1984,

cat /usr/lib/systemd/system/nvidia-persistenced.service

cat: /usr/lib/systemd/system/nvidia-persistenced.service: No such file or directory

echo $PATH

/usr/local/cuda-9.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

Andrey1984 · April 22, 2018, 3:02pm

I have not dealt with this issue myself, but the instructions say:

Create and enable a systemd service file or init script that runs the NVIDIA Persistence Daemon as the first NVIDIA software during or at the end of the boot process. The following service file example is sufficient for most installations:

[Unit]
Description=NVIDIA Persistence Daemon
Wants=syslog.target

[Service]
Type=forking
PIDFile=/var/run/nvidia-persistenced/nvidia-persistenced.pid
Restart=always
ExecStart=/usr/bin/nvidia-persistenced --verbose
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced

[Install]
WantedBy=multi-user.target

Copy the above text into the following file:
/usr/lib/systemd/system/nvidia-persistenced.service
And run the following command:
$ sudo systemctl enable nvidia-persistenced

However, since you have the service running from "/lib/systemd/system/nvidia-persistenced.service", that shouldn't be the issue.
Could you

cat /lib/systemd/system/nvidia-persistenced.service

?
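
The documented path under /usr/lib did not exist on this Ubuntu system, while the status output earlier in the thread shows the unit loaded from /lib. Rather than guessing the location, systemd can be asked which file it loaded. The sketch below assumes a systemd-based system; the helper name `loaded_unit_path` is hypothetical.

```shell
# Print the path of the unit file systemd actually loaded for a service.
# Argument: unit name (default: nvidia-persistenced).
loaded_unit_path() {
    systemctl show -p FragmentPath "${1:-nvidia-persistenced}" | cut -d= -f2-
}
```

The printed path can then be passed to `cat`. `systemctl cat nvidia-persistenced` is a shorter way to print the loaded unit file directly.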

marswhc · April 22, 2018, 3:12pm

cat /lib/systemd/system/nvidia-persistenced.service

[Unit]
Description=NVIDIA Persistence Daemon
Wants=syslog.target

[Service]
Type=forking
ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced
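
As pasted, this installed unit differs from the documented example quoted above: its ExecStart line adds `--user nvidia-persistenced` and `--no-persistence-mode`, and it has no PIDFile, Restart, or [Install] entries. The thread does not establish whether any of these differences relate to the error. A small sketch for comparing the flags of two unit files, where `exec_start_flags` is a name made up for illustration:

```shell
# Print the command-line flags from the ExecStart= line of a unit file,
# one per line, so that two unit files can be compared with diff.
# Argument: path to a unit file.
exec_start_flags() {
    sed -n 's/^ExecStart=//p' "$1" | tr ' ' '\n' | grep -e '^--'
}
```

For the file above this prints `--user`, `--no-persistence-mode`, and `--verbose`, each on its own line; for the documented example it prints only `--verbose`.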

marswhc · April 23, 2018, 2:28pm

The 'Unknown Error' message is gone now that I have changed the Ubuntu kernel to 4.10. Thanks for your help, guys!
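
For anyone comparing their own system against this report: the earlier `uname -a` output showed kernel 4.13.0-38-generic, and the error went away on 4.10. A minimal sketch for printing just the major.minor part of the running kernel release follows; the helper name `kernel_major_minor` is hypothetical.

```shell
# Reduce a kernel release string, such as the output of `uname -r`,
# to its "major.minor" part.
# Argument: release string (default: the running kernel).
kernel_major_minor() {
    printf '%s\n' "${1:-$(uname -r)}" | cut -d. -f1,2
}
```

For example, `kernel_major_minor 4.13.0-38-generic` prints `4.13`.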

eriverac · May 2, 2018, 8:40pm

I had the same problem on Red Hat. I modified the file /etc/udev/rules.d/40-redhat.rules.
It is necessary to comment out this line:

SUBSYSTEM=="memory", ACTION=="add", DEVPATH=="/devices/system/memory/memory[0-9]*", TEST=="state", ATTR{state}="online"

Then reboot and test.
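
The edit described above can also be scripted. This is a sketch only: it assumes the rule has the shape quoted in this post, keeps a backup of the original file with a `.bak` suffix, and uses a made-up function name, `comment_out_memory_hotadd`. Review the file afterwards rather than trusting the pattern blindly.

```shell
# Comment out active memory hot-add rules in a udev rules file.
# A backup of the original is kept next to it with a .bak suffix.
# Lines that are already commented out are left unchanged.
# Argument: path to the rules file (writing usually requires root).
comment_out_memory_hotadd() {
    sed -i.bak -E 's/^([[:space:]]*)(SUBSYSTEM=="memory".*ATTR\{state\}="online".*)$/\1#\2/' "$1"
}
```

On the system from this post, the function would be defined in a root shell and called as `comment_out_memory_hotadd /etc/udev/rules.d/40-redhat.rules`, followed by a reboot as noted earlier in the thread.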

kennric · June 8, 2018, 10:31pm

I am having this issue on Ubuntu 16.04 with kernel 4.13.0-36-generic #40~16.04.1-Ubuntu SMP.

marswhc - did you downgrade from kernel 4.13?
