UVM GPU1 BH process causing 100% CPU after standby

max.wittal · February 11, 2018, 10:04pm

Hi,

I’ve noticed a consistent problem: after the system resumes from standby, this GPU process (which I guess is related to unified memory?) keeps burning CPU at 100%:

Tasks: 290 total,   2 running, 288 sleeping,   0 stopped,   0 zombie
%Cpu(s): 13.6 us,  8.8 sy,  0.0 ni, 77.4 id,  0.1 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 32648460 total,  1204044 free,  6101424 used, 25342992 buff/cache
KiB Swap: 33259004 total, 33259004 free,        0 used. 25873308 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND         
31363 root      20   0       0      0      0 R 100.0  0.0  35:10.91 UVM GPU1 BH     
    1 root      20   0  185340   5464   3480 S   0.0  0.0   0:02.06 systemd         
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.03 kthreadd

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.34                 Driver Version: 387.34                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P4000        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   44C    P8     5W /  N/A |   1180MiB /  8114MiB |     22%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1235      G   /usr/lib/xorg/Xorg                           778MiB |
|    0      2250      G   compiz                                       129MiB |
|    0     17896      G   /proc/self/exe                               259MiB |
+-----------------------------------------------------------------------------+

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

When trying to run my application after a standby I also get the error:

cudaMallocManaged(): all CUDA-capable devices are busy or unavailable

Only a restart helps, which is very annoying.

How do I attach the “nvidia-bug-report.log.gz”?

Thank you.

Max

Robert_Crovella · February 11, 2018, 10:28pm

If you’re trying to report a driver issue, the best place is at http://developer.nvidia.com in the bug reporting area. You will need to be signed up as a registered developer.

For general discussion around Linux driver issues, try the linux driver forum:

https://devtalk.nvidia.com/default/board/98/linux/

Also, you should check whether the behavior still happens with the latest driver, which is currently a 390.xx driver.

Linux x64 (AMD64/EM64T) Display Driver | 390.25 | Linux 64-bit | NVIDIA

max.wittal · February 11, 2018, 11:05pm

I used to have a 390.xx driver installed but CUDA 9.1 removed it and reinstalled the 387.xx driver.

Anyway, could you move this thread to the Linux driver forum then?

Thanks

Max

Robert_Crovella · February 11, 2018, 11:18pm

You can certainly have a 390.xx driver installed with CUDA 9.1, but currently some install methods (e.g. deb local) may install a 387.xx driver.

njuffa · February 11, 2018, 11:20pm

Each CUDA version comes packaged with the minimum driver version required to run it. If you already have a newer driver installed, you can skip the driver installation portion of the CUDA installation process.
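
With the deb packages, skipping the driver portion amounts to installing a toolkit-only meta-package rather than `cuda`, which also pulls in the bundled driver. A minimal sketch, assuming the CUDA 9.1 repository provides a `cuda-toolkit-9-1` meta-package; the helper name `cuda_toolkit_pkg` is made up for illustration, and the simulated install is worth running first:

```shell
#!/bin/sh
# Hypothetical helper: map a CUDA release ("9.1") to the name of the
# toolkit-only meta-package ("cuda-toolkit-9-1"). Assumes that naming scheme.
cuda_toolkit_pkg() {
    printf 'cuda-toolkit-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

# Simulate first; -s makes apt-get print what it would do without doing it:
#   sudo apt-get install -s "$(cuda_toolkit_pkg 9.1)"
# Then, if the plan leaves the existing driver alone:
#   sudo apt-get install "$(cuda_toolkit_pkg 9.1)"
cuda_toolkit_pkg 9.1
```

If the simulated run still wants to remove or replace the installed driver, the package set in that repository differs from what is assumed here and the plan should not be applied as is.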

max.wittal · February 12, 2018, 12:49am

Yeah the problem looks like this:

$ sudo apt-get update 
Get:1 file:/var/cuda-repo-9-1-local  InRelease
Ign:1 file:/var/cuda-repo-9-1-local  InRelease
Get:2 file:/var/cuda-repo-9-1-local  Release [574 B]
Get:2 file:/var/cuda-repo-9-1-local  Release [574 B]

$ sudo apt-get install cuda nvidia-390
Reading package lists... Done
Building dependency tree       
Reading state information... Done
cuda is already the newest version (9.1.85-1).
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 cuda : Depends: cuda-9-1 (>= 9.1.85) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

chuang.jon · January 11, 2020, 6:37pm

I am experiencing the same issue after several segmentation faults and kernel launch failures. I have to restart to kill the UVM runtime.

scott.hawley · October 31, 2024, 12:20am

I realize this is a much-later reply, but this is the only thread on the internet that I’ve been able to find dealing with the same issue I’m having, namely that “UVM GPU1 BH” is hogging 100% of the GPU and nothing works anymore.
Is there any recommended fix for this?

I got into this state by pressing ^C while a Python process that was using the GPU was running. I had no idea that my otherwise-fine system would hang.

So far I’ve tried:

Killing the ‘UVM GPU1 BH’ process with kill -9, even as root, but it won’t die.
Running sudo rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia, but the modules are all in use.
Running sudo systemctl restart nvidia-persistenced, but the service is masked. Unmasking it works, but stopping and restarting it has no effect.
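
For context on the first two attempts: the zero VIRT and RES columns in the top output at the start of the thread suggest that "UVM GPU1 BH" is a kernel thread rather than an ordinary process, and kernel threads do not act on SIGKILL, which would explain why kill -9 has no effect. A minimal sketch, not a fix, for confirming that and for seeing what keeps the modules loaded. The function names `busy_kernel_threads` and `nvidia_module_refs` are hypothetical helpers, and the parent-PID test assumes kthreadd has PID 2, as it does in that top output:

```shell
#!/bin/sh
# Hypothetical helpers; both only read text on stdin and change nothing.

# Print "PID %CPU NAME" for kernel threads (parent PID 2, i.e. kthreadd)
# at or above a CPU threshold. Input: output of `ps -eo pid,ppid,pcpu,comm`.
busy_kernel_threads() {
    awk -v limit="${1:-50}" '
        NR > 1 && $2 == 2 && ($3 + 0) >= (limit + 0) {
            # The name may contain spaces ("UVM GPU1 BH"): rejoin fields 4..NF.
            name = $4
            for (i = 5; i <= NF; i++) name = name " " $i
            printf "%s %s %s\n", $1, $3, name
        }'
}

# Print "MODULE USECOUNT HOLDERS" for nvidia* modules. Input: output of `lsmod`.
nvidia_module_refs() {
    awk 'NR > 1 && $1 ~ /^nvidia/ {
        printf "%s %s %s\n", $1, $3, (NF >= 4 ? $4 : "-")
    }'
}

# Live usage:
#   ps -eo pid,ppid,pcpu,comm | busy_kernel_threads 90
#   lsmod | nvidia_module_refs
```

A nonzero use count with no holder module listed usually means user-space processes still have the device files open; `sudo lsof /dev/nvidia*` lists them. Note that ps reports %CPU averaged over the thread's lifetime, so it can read lower than the figure in top.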

I’d rather not simply reboot the machine because it’s far away and I don’t have on-site access. If it doesn’t come back up, I’ll be even more stuck.

Any other recommended fixes for this?
