I ran into a problem with calls to nvidia-smi.
To monitor the load on Tesla M60 GPUs I run nvidia-smi and read the required values from its output.
The GPU is used to transcode video using ffmpeg.
The nvidia-smi command is executed every second, and even at a fairly low GPU load (~15%) the response time of a single query grows to 4.236 s.
Because each call is issued without waiting for the previous one to finish, slow queries accumulate as many concurrent nvidia-smi processes, which in turn causes problems with the GPU.
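To keep stuck calls from piling up, one option is to wrap each invocation with a timeout so a slow query is killed instead of lingering. A minimal Python sketch, assuming the monitor runs the command once per tick (the `echo` placeholder stands in for the real nvidia-smi call, which needs a GPU to run):

```python
import subprocess

def run_with_timeout(cmd, timeout_s=2.0):
    """Run a command, killing it if it exceeds timeout_s seconds.

    Returns stdout on success, or None on timeout or nonzero exit,
    so slow calls cannot pile up as concurrent processes.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout

# Placeholder command for illustration; in the real setup cmd would be
# ["nvidia-smi", "-q", "-d", "UTILIZATION,MEMORY"].
print(run_with_timeout(["echo", "ok"]))  # → ok
```

With this wrapper, a tick that takes longer than the polling interval simply returns None and the next tick starts from a clean state.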
Example command:
nvidia-smi -q -d UTILIZATION,MEMORY
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25 Driver Version: 390.25 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 On | 00000000:06:00.0 Off | Off |
| 32% 49C P0 46W / 120W | 977MiB / 8129MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 On | 00000000:07:00.0 Off | Off |
| 32% 38C P0 46W / 120W | 814MiB / 8129MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M60 On | 00000000:86:00.0 Off | Off |
| 35% 52C P0 46W / 120W | 840MiB / 8129MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M60 On | 00000000:87:00.0 Off | Off |
| 32% 43C P0 51W / 120W | 1336MiB / 8129MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
How often can I request the load data, and how can I optimize the command to get faster responses?
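One alternative I am considering is nvidia-smi's scripted query mode (`nvidia-smi --query-gpu=... --format=csv,noheader,nounits`), which prints only the requested fields as CSV and is much easier to parse than the full `-q` report. A sketch of parsing that output, with a hard-coded sample line (matching the numbers above) instead of a live call so it can be shown without a GPU:

```python
import csv
import io

# Sample output of:
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits
# hard-coded here for illustration.
sample = """0, 1, 977
1, 1, 814
2, 1, 840
3, 1, 1336
"""

def parse_gpu_stats(text):
    """Parse CSV rows into (index, gpu_util_percent, memory_used_mib) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue
        index, util, mem = (f.strip() for f in fields)
        rows.append((int(index), int(util), int(mem)))
    return rows

print(parse_gpu_stats(sample))
# → [(0, 1, 977), (1, 1, 814), (2, 1, 840), (3, 1, 1336)]
```

nvidia-smi also has a built-in loop option (`-l <seconds>`), which repeats the query in a single long-lived process instead of spawning a new one every second.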