Nvidia-smi really slow to execute

The command nvidia-smi has recently become really slow to execute. In this example, it takes 1m45s to run:

# time nvidia-smi
Thu Jan  7 17:48:46 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:18:00.0 Off |                    0 |
| N/A   68C    P0   167W / 300W |  30578MiB / 32510MiB |     59%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   50C    P0   108W / 300W |  30576MiB / 32510MiB |     64%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   38C    P0    65W / 300W |      0MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   59C    P0    89W / 300W |  22868MiB / 32510MiB |     51%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     12808      C   python3                                    30565MiB |
|    1     12023      C   python3                                    30563MiB |
|    3     32012      C   python3                                    22855MiB |
+-----------------------------------------------------------------------------+

real    1m45.791s
user    0m0.000s
sys     1m45.736s

Since this environment is relatively new to us, I’d like to get a pointer on how to start debugging this issue.

I attached the output of nvidia-bug-report.sh.

nvidia-bug-report.log.gz (3.5 MB)

Thanks in advance for any help.

Emmanuel

You’ll need to have the persistence daemon (nvidia-persistenced) started on boot and kept running continuously. Otherwise, the driver gets unloaded and the Teslas deinitialize, so when nvidia-smi is run it needs a full reinitialization. Failing to keep nvidia-persistenced running may also lead to more serious issues like GPUs crashing, depending on workloads.
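On a systemd-based distribution, assuming your driver package ships the nvidia-persistenced unit, this should be enough:

sudo systemctl enable --now nvidia-persistenced

As a quick test you can also turn on persistence mode directly, though this does not survive a reboot:

sudo nvidia-smi -pm 1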

Addendum: if nvidia-smi is still slow despite nvidia-persistenced running, try the forum search; there was a bug with certain driver versions a while back, and there’s a thread about it.

I encountered a very similar issue, and my initial approach was to reinstall the NVIDIA compute utilities using the command:

sudo apt install --reinstall nvidia-compute-utils-xxx

where ‘xxx’ represents the version number of your NVIDIA driver. However, this method is unfortunately not a permanent solution, as the issue recurs after every reboot, necessitating repeated reinstallation.
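If you’re not sure which driver version is installed, either of these should show it (exact package names vary by distribution):

cat /proc/driver/nvidia/version

dpkg -l | grep nvidia-driver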


Your computer is slow, bro.

Don’t worry, I got you…

First, ask the kernel how fast your CPUs are running (no root needed for this one):

grep MHz /proc/cpuinfo

If your CPUs are not all maxed out and running at full clock, your computer is running slow, and you should look into enabling the performance governor for your system (see the example below).
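One way to do that, assuming your kernel exposes cpufreq through sysfs (the cpupower tool does the same job if your distribution ships it):

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

shows the current governor per core, and (as root)

for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > "$g"; done

or

sudo cpupower frequency-set -g performance

switches them all to performance.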

Also disable pretty desktop effects and compositing, turn your virtual desktops down to 1, and consider disabling text anti-aliasing for a bit of extra headroom.

Also check top for RAM consumption:

top -d 5

and turn off services you don’t need:

sudo systemctl disable --now some-service-you-dont-use
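To see what is actually enabled in the first place, this should list it on a systemd-based system:

systemctl list-unit-files --state=enabled --type=service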