nvidia-smi shows the last K80 GPU (out of 8) as always busy

Dear CUDA experts,

We have 14 GPU nodes, each with 8 GPUs (K80). I’ve noticed that on empty GPU nodes “nvidia-smi” always shows non-zero GPU utilization for the last GPU device (i.e. number 7, counting from 0 to 7). The number is usually >50% and keeps changing; however, the same output also says “No running processes found”. What does that mean, is it normal, and what is that GPU doing? Any ideas? Thank you in advance!

$ nvidia-smi
Fri Dec 15 16:30:22 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.69                 Driver Version: 384.69                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:04:00.0 Off |                    0 |
| N/A   38C    P0    57W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:05:00.0 Off |                    0 |
| N/A   31C    P0    73W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:84:00.0 Off |                    0 |
| N/A   39C    P0    61W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:85:00.0 Off |                    0 |
| N/A   31C    P0    72W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   29C    P0    59W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:8B:00.0 Off |                    0 |
| N/A   38C    P0    76W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:8E:00.0 Off |                    0 |
| N/A   30C    P0    59W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:8F:00.0 Off |                    0 |
| N/A   39C    P0    73W / 149W |      0MiB / 11439MiB |     64%      Default |  <--
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |  <--
+-----------------------------------------------------------------------------+

And this is the steady state with the machine idling? I am asking because all GPUs are in power state P0 = “full power” (with elevated power draw because of that), as if this snapshot had been taken at the end of a CUDA-accelerated app, after GPU activity ceased but before the app terminated (and the power state dropped to a power-saving state like P8).

BTW, it is interesting to see that the odd-numbered GPU in each pair has higher power consumption and temperature. Weird.
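For reference, a quick way to see the power state, power draw, temperature, and utilization of every GPU on one line each is nvidia-smi’s CSV query mode (the field names below should be available in a 384.xx driver, but nvidia-smi --help-query-gpu lists exactly what your version supports):

$ nvidia-smi --query-gpu=index,pstate,power.draw,temperature.gpu,utilization.gpu --format=csv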

With respect to the nvidia-smi reporting utilization percentage on one of the GPUs, this is normal behavior. The act of running nvidia-smi generates momentary utilization on one of the GPUs, typically.
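An easy way to watch this is nvidia-smi’s scrolling device monitor (assuming dmon is available in your driver version), which prints one line per GPU per second; the “sm” column is the utilization, and the momentary non-zero reading should hop between GPUs or disappear entirely between samples:

$ nvidia-smi dmon -s pu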

Hi, Njuffa and Txbob:

Yes, this is a steady state with the node idling, i.e. absolutely NO user applications running. As for “the odd-numbered GPU in each pair has higher power consumption and temperature”, this is probably related to the hardware configuration:

$ nvidia-smi topo -m
GPU      0     1     2     3     4     5     6     7     mlx4_0   CPU Affinity
0        X     PIX   SOC   SOC   SOC   SOC   SOC   SOC   SOC      0-5
1        PIX   X     SOC   SOC   SOC   SOC   SOC   SOC   SOC      0-5
2        SOC   SOC   X     PIX   PHB   PHB   PHB   PHB   PHB      6-11
3        SOC   SOC   PIX   X     PHB   PHB   PHB   PHB   PHB      6-11
4        SOC   SOC   PHB   PHB   X     PIX   PXB   PXB   PHB      6-11
5        SOC   SOC   PHB   PHB   PIX   X     PXB   PXB   PHB      6-11
6        SOC   SOC   PHB   PHB   PXB   PXB   X     PIX   PHB      6-11
7        SOC   SOC   PHB   PHB   PXB   PXB   PIX   X     PHB      6-11
mlx4_0   SOC   SOC   PHB   PHB   PHB   PHB   PHB   PHB   X

Legend:

X = Self
SOC = Connection traversing PCIe as well as the SMP link between CPU sockets (e.g., QPI)
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
PIX = Connection traversing a single PCIe switch
NV# = Connection traversing a bonded set of # NVLinks

Txbob,

Your answer reminded me of Heisenberg’s uncertainty principle: “every measurement necessarily has to disturb the quantum particle, which distorts the results of any further measurements” (:

That does make sense, though. However, we have also discovered that enabling “Persistence Mode” on all GPU devices of the node gets rid of the non-zero GPU utilization on the last GPU device; in other words, the last GPU device shows 0% utilization in that case. My question is: should we turn on “Persistence Mode” on all GPU nodes now, and would it make any difference with regard to GPU device functionality/performance (isn’t this mode deprecated, http://docs.nvidia.com/deploy/driver-persistence/)? The problem is that I DO need all GPUs to show 0% utilization for my script, which greps the utilization from nvidia-smi and then reserves the free GPU devices.
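In case it helps, the idea of the script is roughly the following (a simplified sketch, not the actual script; it uses nvidia-smi’s CSV query mode instead of grepping the human-readable table, and treats any GPU reporting 0% utilization as free):

$ nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits \
    | awk -F', ' '$2 == 0 {print $1}'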

Thank you so much for your answers, much appreciated!

That’s correct, persistence mode will eliminate this effect. Persistence mode keeps the GPU(s) in a fully active state whether they are being utilized or not. In this fully active state, the process of querying the GPU by nvidia-smi does not generate “Utilization”.
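For example, persistence mode can be turned on with nvidia-smi itself (requires root; -i restricts the setting to a single GPU, omit it to apply to all GPUs on the node):

$ sudo nvidia-smi -pm 1
$ sudo nvidia-smi -i 7 -pm 1

Regarding the deprecation question: per the driver-persistence document linked above, it is the legacy -pm setting that is being phased out in favor of the nvidia-persistenced daemon, which provides equivalent behavior.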

I personally would not build a GPU allocator that relies on this utilization percentage as an inferential measure of whether or not the GPU is in use. Rather, I would use a non-inferential method such as a job scheduler. If you require this particular methodology, however, then enabling persistence mode should help. There shouldn’t be any problems with enabling persistence mode, but average GPU power draw in the idle state will be higher.

Thank you for the quick response, Txbob. The problem with the Univa Grid Engine scheduler is that it does not discriminate between GPU devices, i.e., it won’t tell you which GPU devices are free or busy (which is required for running MD packages such as ACEMD and NAMD). Also, the scheduler can easily be confused when users declare 1 GPU but then use 8, for example, which happens quite often. Thank you so much for your feedback; I will let our technical lead know that we can turn on “persistence mode” for all GPU devices/nodes.

There are plenty of GPU-aware job schedulers/resource managers.

Univa claims to be one of these:

http://www.univa.com/resources/files/gpus.pdf

Perhaps you are not using it correctly.

Thank you for sending the link, Txbob. This is what we need to figure out!

That seems wrong. As best I know, other task scheduling facilities such as LSF control the available GPUs with CUDA_VISIBLE_DEVICES, so a scheduled task cannot grab more GPUs than it reserved or is entitled to under the queue restrictions.
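For illustration (the binary name and device indices here are hypothetical): if the scheduler exports only GPUs 2 and 3 to a job, the application sees exactly two devices, which it enumerates as 0 and 1, and it cannot touch the other six:

$ export CUDA_VISIBLE_DEVICES=2,3
$ ./my_cuda_app    # hypothetical app; cudaGetDeviceCount() reports 2 in this environment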