$ nvidia-smi --version
NVIDIA-SMI version : 550.90.07
NVML version : 550.90
DRIVER version : 550.90.07
CUDA Version : 12.4
Running nvidia-smi without arguments shows stale process entries. Below is the entry for GPU 5, whose process has already finished. Even after I rebooted the cluster, nvidia-smi still shows these entries,
and the listed PIDs do not exist:
| 5 N/A N/A 198823 C ...yu/miniconda3/envs/maker/bin/python 3772MiB |
+-----------------------------------------------------------------------------------------+
(py11) ~ @a800 (14:06:32)
$ sudo kill -9 198823
kill: (198823): No such process
However, nvidia-smi -i 5
shows the correct information (no processes listed):
$ nvidia-smi -i 5
Wed Dec 25 13:55:17 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 5 NVIDIA A800-SXM4-80GB On | 00000000:8F:00.0 Off | 0 |
| N/A 67C P0 397W / 400W | 2387MiB / 81920MiB | 100% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
gpustat
shows the correct information as well:
(py11) ~ @a800 (13:55:17)
$ gpustat
g0041 Wed Dec 25 13:55:35 2024 550.90.07
[0] NVIDIA A800-SXM4-80GB | 77°C, 100 % | 2926 / 81920 MB | root(536M)
[1] NVIDIA A800-SXM4-80GB | 67°C, 100 % | 2386 / 81920 MB |
[2] NVIDIA A800-SXM4-80GB | 72°C, 100 % | 2386 / 81920 MB |
[3] NVIDIA A800-SXM4-80GB | 70°C, 100 % | 2386 / 81920 MB |
[4] NVIDIA A800-SXM4-80GB | 69°C, 100 % | 2386 / 81920 MB |
[5] NVIDIA A800-SXM4-80GB | 68°C, 100 % | 2386 / 81920 MB |
[6] NVIDIA A800-SXM4-80GB | 62°C, 100 % | 37652 / 81920 MB | root(35262M)
[7] NVIDIA A800-SXM4-80GB | 75°C, 100 % | 2386 / 81920 MB |
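
For anyone trying to reproduce this, the manual check above (reading a PID from nvidia-smi and then testing it with kill) can be scripted for all GPUs at once. The following is only a minimal sketch, not a fix: it assumes nvidia-smi is on PATH and that the script runs in the same PID namespace as the GPU processes (inside a container the PIDs may not match). The helper names parse_compute_apps, pid_exists, find_stale and report_stale are made up for illustration.

```python
import os
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, used_memory' CSV rows into (pid, memory_mib) tuples."""
    entries = []
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # skip blank or malformed rows
        pid = int(fields[0])
        # used_memory can be reported as [N/A]; keep None in that case
        mem = int(fields[1]) if fields[1].isdigit() else None
        entries.append((pid, mem))
    return entries


def pid_exists(pid):
    """Return True if a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def find_stale(entries, exists=pid_exists):
    """Return the entries whose PID is not a live process."""
    return [(pid, mem) for pid, mem in entries if not exists(pid)]


def report_stale():
    """Query nvidia-smi and print every listed compute process that is gone."""
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid, mem in find_stale(parse_compute_apps(out)):
        print(f"stale entry: PID {pid} ({mem} MiB) is listed but does not exist")
```

Calling report_stale() on the affected node should list PID 198823 if the behaviour above is reproducible. The existence check is passed in as a parameter so the parsing and filtering can be tried on pasted output without a GPU.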