UVM GPU1 BH process causing 100% CPU after standby

Hi,

I’ve noticed a consistent problem: after the system resumes from standby, this GPU process (which I guess is related to Unified Memory?) keeps burning my CPU:

Tasks: 290 total,   2 running, 288 sleeping,   0 stopped,   0 zombie
%Cpu(s): 13.6 us,  8.8 sy,  0.0 ni, 77.4 id,  0.1 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 32648460 total,  1204044 free,  6101424 used, 25342992 buff/cache
KiB Swap: 33259004 total, 33259004 free,        0 used. 25873308 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND         
31363 root      20   0       0      0      0 R 100.0  0.0  35:10.91 UVM GPU1 BH     
    1 root      20   0  185340   5464   3480 S   0.0  0.0   0:02.06 systemd         
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.03 kthreadd
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.34                 Driver Version: 387.34                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P4000        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   44C    P8     5W /  N/A |   1180MiB /  8114MiB |     22%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1235      G   /usr/lib/xorg/Xorg                           778MiB |
|    0      2250      G   compiz                                       129MiB |
|    0     17896      G   /proc/self/exe                               259MiB |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

When I try to run my application after standby, I also get this error:

cudaMallocManaged(): all CUDA-capable devices are busy or unavailable

Only a restart helps, which is very annoying.

How do I attach the “nvidia-bug-report.log.gz”?

Thank you.

Max

If you’re trying to report a driver issue, the best place is the bug reporting area at http://developer.nvidia.com. You will need to be signed up as a registered developer.

For general discussion of Linux driver issues, try the Linux driver forum:

https://devtalk.nvidia.com/default/board/98/linux/

Also, check whether the behavior still occurs with the latest driver, which is currently a 390.xx driver.

http://www.nvidia.com/download/driverResults.aspx/130646/en-us

I used to have a 390.xx driver installed, but installing CUDA 9.1 removed it and installed the 387.xx driver instead.

Anyway, could you move this thread to https://devtalk.nvidia.com/default/board/98/linux/?

Thanks

Max

You can certainly have a 390.xx driver installed with CUDA 9.1, but currently some install methods (e.g., the local deb) may install a 387.xx driver.

Each CUDA version comes packaged with the minimum driver version required to run it. If you already have a newer driver installed, you can skip the driver installation portion of the CUDA installation process.

Yeah, the problem looks like this:

$ sudo apt-get update 
Get:1 file:/var/cuda-repo-9-1-local  InRelease
Ign:1 file:/var/cuda-repo-9-1-local  InRelease
Get:2 file:/var/cuda-repo-9-1-local  Release [574 B]
Get:2 file:/var/cuda-repo-9-1-local  Release [574 B]

$ sudo apt-get install cuda nvidia-390
Reading package lists... Done
Building dependency tree       
Reading state information... Done
cuda is already the newest version (9.1.85-1).
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 cuda : Depends: cuda-9-1 (>= 9.1.85) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

I’m experiencing the same issue after several segmentation faults and kernel launch failures. I have to restart the machine to kill the UVM runtime.