After running image inference repeatedly on Triton Inference Server, an out-of-memory error occurred and further image inference became impossible.
When the error occurred, we checked the GPU memory status and found that GPU memory had not been released.
We would like to know the cause of this and how to resolve it.
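For reference, the GPU-memory check we repeated can be scripted by parsing `nvidia-smi` query output. A minimal sketch (the query flags are standard `nvidia-smi` options; the helper names are ours):

```python
import subprocess


def parse_used_mib(csv_output: str) -> list[int]:
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    into a list of per-GPU used-memory values in MiB."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]


def query_used_mib() -> list[int]:
    """Run nvidia-smi and return the used memory per GPU, in MiB."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_used_mib(out)
```

Calling `query_used_mib()` before and after a batch of inference requests makes it easy to see whether memory is returned between runs.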
The state of GPU memory on the host server when the out-of-memory error occurs is as follows:
Tue Jan 31 11:52:23 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A4500    On   | 00000000:08:00.0 Off |                  Off |
| 30%   46C    P8    14W / 200W |  19967MiB / 20470MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    1698238      C   tritonserver                   17816MiB |
|    0   N/A  N/A    1698683      C   ...riton_python_backend_stub     724MiB |
|    0   N/A  N/A    1698793      C   ...riton_python_backend_stub     724MiB |
|    0   N/A  N/A    1698903      C   ...riton_python_backend_stub     700MiB |
+-----------------------------------------------------------------------------+
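Summing the per-process figures from the table confirms that the Triton processes account for essentially all of the used memory (19964 MiB of the 19967 MiB reported used, out of 20470 MiB total):

```python
# Per-process GPU memory usage in MiB, taken from the nvidia-smi output above.
process_usage_mib = {
    "tritonserver (PID 1698238)": 17816,
    "python_backend_stub (PID 1698683)": 724,
    "python_backend_stub (PID 1698793)": 724,
    "python_backend_stub (PID 1698903)": 700,
}

total_mib = sum(process_usage_mib.values())
print(total_mib)  # 19964
```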
Details of the processes holding GPU memory (checked with lsof) are as follows:
hironoriinui@a4001:~$ lsof /dev/nvidia*
lsof: WARNING: can't stat() nsfs file system /run/docker/netns/default
      Output information may be incomplete.
lsof: WARNING: can't stat() overlay file system /var/lib/docker/overlay2/d929121d8268af614347d0048b0b6379f8e49820d9ed87a1383b64437c3e1303/merged
      Output information may be incomplete.
lsof: WARNING: can't stat() overlay file system /var/lib/docker/overlay2/61484018784f80c010f914cabec06b0998b9dc6a9539bc0402907e64d2689871/merged
      Output information may be incomplete.
lsof: WARNING: can't stat() nsfs file system /run/docker/netns/46f7b259f212
      Output information may be incomplete.
The execution environment is as follows:
Triton Inference Server container version: 22.11
CUDA version: 11.8