GPU out-of-memory error when running triton-inference-server

After running image inference repeatedly on triton-inference-server, an out-of-memory error occurred and further image inference failed.
When we checked the GPU at the time of the error, we found that GPU memory had not been released.
We would like to know the cause and what we can do about it.

The GPU memory state on the host server when the out-of-memory error occurred is as follows:

Tue Jan 31 11:52:23 2023       
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA RTX A4500    On   | 00000000:08:00.0 Off |                  Off |
| 30%   46C    P8    14W / 200W |  19967MiB / 20470MiB |      0%      Default |
|                               |                      |                  N/A |                                                                               
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|    0   N/A  N/A   1698238      C   tritonserver                    17816MiB |
|    0   N/A  N/A   1698683      C   ...riton_python_backend_stub      724MiB |
|    0   N/A  N/A   1698793      C   ...riton_python_backend_stub      724MiB |
|    0   N/A  N/A   1698903      C   ...riton_python_backend_stub      700MiB |
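
In the snapshot above, the four listed processes add up to 17816 + 724 + 724 + 700 = 19964 MiB of the 19967 MiB in use, so almost all of the used memory belongs to tritonserver and its Python backend stubs. To see whether that usage grows with each round of inference, per-process memory could be logged over time. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH of the host; the helper names (`parse_compute_apps`, `total_used_mib`, `snapshot`, `watch`) are hypothetical and not part of Triton.

```python
import csv
import io
import subprocess
import time

# nvidia-smi query for per-process GPU memory, in MiB, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text):
    """Parse 'pid, process_name, used_memory' CSV rows into (pid, name, MiB) tuples."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(record) != 3:
            continue  # skip blank or unexpected lines
        pid, name, used = (field.strip() for field in record)
        if not (pid.isdigit() and used.isdigit()):
            continue  # skip rows where memory is reported as unavailable
        rows.append((int(pid), name, int(used)))
    return rows


def total_used_mib(rows):
    """Sum the GPU memory (MiB) used by all parsed processes."""
    return sum(used for _, _, used in rows)


def snapshot():
    """Run nvidia-smi once and return the parsed per-process rows."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)


def watch(interval_s=10, count=6):
    """Print a timestamped snapshot every interval_s seconds, count times."""
    for _ in range(count):
        rows = snapshot()
        print(time.strftime("%H:%M:%S"), total_used_mib(rows), "MiB", rows)
        time.sleep(interval_s)
```

Calling `watch()` while sending inference requests would show which process (tritonserver or one of the Python backend stubs) keeps growing, which narrows down where the memory is being held.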

The output of lsof on the NVIDIA device files is as follows:

hironoriinui@a4001:~$ lsof /dev/nvidia*a*
lsof: WARNING: can't stat() nsfs file system /run/docker/netns/default
      Output information may be incomplete.
lsof: WARNING: can't stat() overlay file system /var/lib/docker/overlay2/d929121d8268af614347d0048b0b6379f8e49820d9ed87a1383b64437c3e1303/merged
      Output information may be incomplete.
lsof: WARNING: can't stat() overlay file system /var/lib/docker/overlay2/61484018784f80c010f914cabec06b0998b9dc6a9539bc0402907e64d2689871/merged
      Output information may be incomplete.
lsof: WARNING: can't stat() nsfs file system /run/docker/netns/46f7b259f212
      Output information may be incomplete.

The execution environment is as follows:

Triton Inference Server container version: 22.11
CUDA Version: 11.8