Every time I run nvidia-smi to check on the system status, I get different output:
-sh-4.1$ nvidia-smi
Failed to initialize NVML: Unknown Error
-sh-4.1$ nvidia-smi
Unable to determine the device handle for GPU 0000:04:00.0: The NVIDIA kernel
module detected an issue with GPU interrupts. Consult the "Common Problems"
Chapter of the NVIDIA Driver README for details and steps that can be taken
to resolve this issue.
-sh-4.1$ nvidia-smi
Tue Jan 12 16:03:49 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.63     Driver Version: 352.63         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20c          Off  | 0000:04:00.0     Off |                    0 |
| 30%   32C    P0    49W / 225W |     12MiB /  4799MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20c          Off  | 0000:84:00.0     Off |                    0 |
| 30%   35C    P0    53W / 225W |     12MiB /  4799MiB |     95%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
A couple of questions:
- How do I diagnose this inconsistent output?
- When nvidia-smi does detect my GPUs, why is the volatile GPU-Util 95% on the second GPU even though there are no running processes? This always happens on the second GPU.
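
For reference, this is roughly how I plan to keep polling the GPUs and logging the output while I wait for suggestions. It is only a sketch: the query fields are standard nvidia-smi options, but the log file paths are placeholders I made up.

# Poll the GPUs once a minute and keep a tail of the kernel log alongside,
# so a failed nvidia-smi run can be lined up with dmesg messages from the
# same time.
while true; do
    date >> /tmp/nvidia-smi.log
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used,temperature.gpu \
               --format=csv >> /tmp/nvidia-smi.log 2>&1
    dmesg | tail -n 20 >> /tmp/dmesg-tail.log
    sleep 60
done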