Triton Server GPU memory leak with gRPC CUDA shared memory requests

Our service uses Triton Server, and its GPU memory usage grows over time. We run the tritonserver:23.11-py3 image.

Our client process sends requests to Triton Server over gRPC using CUDA shared memory.
cudaMalloc() is called once at initialization, and the same memory region is reused for sending requests and receiving responses.
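For context, our client follows roughly this pattern (a minimal sketch, not our actual code — the model name `my_model`, input name `INPUT0`, shape, and server address are placeholders). The region is registered once at startup and unregistered at shutdown, rather than re-registered per request:

```python
# Sketch of the register-once / unregister-at-shutdown pattern for Triton
# CUDA shared memory. Model name, input name, and shape are hypothetical.
import numpy as np

INPUT_SHAPE = [1, 3, 224, 224]  # placeholder input shape
INPUT_BYTE_SIZE = int(np.prod(INPUT_SHAPE)) * np.dtype(np.float32).itemsize

if __name__ == "__main__":
    import tritonclient.grpc as grpcclient
    import tritonclient.utils.cuda_shared_memory as cudashm

    client = grpcclient.InferenceServerClient("localhost:8001")

    # Register the CUDA region ONCE at startup; registering a new region
    # per request without unregistering the old one grows server-side
    # GPU memory.
    shm_handle = cudashm.create_shared_memory_region(
        "input_data", INPUT_BYTE_SIZE, device_id=0)
    client.register_cuda_shared_memory(
        "input_data", cudashm.get_raw_handle(shm_handle), 0, INPUT_BYTE_SIZE)

    try:
        # Per request: copy data into the already-registered region and
        # point the input at it -- no new allocation or registration.
        data = np.zeros(INPUT_SHAPE, dtype=np.float32)
        cudashm.set_shared_memory_region(shm_handle, [data])
        infer_input = grpcclient.InferInput("INPUT0", INPUT_SHAPE, "FP32")
        infer_input.set_shared_memory("input_data", INPUT_BYTE_SIZE)
        client.infer(model_name="my_model", inputs=[infer_input])
    finally:
        # Unregister and free at shutdown so the server releases the region.
        client.unregister_cuda_shared_memory(name="input_data")
        cudashm.destroy_shared_memory_region(shm_handle)
```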

GPU memory usage rises sharply during evening hours, when request traffic peaks.

I want to resolve this issue but don't know where to start.
The Prometheus metrics aren't helpful either:

```
# HELP nv_gpu_memory_used_bytes GPU used memory, in bytes
# TYPE nv_gpu_memory_used_bytes gauge
nv_gpu_memory_used_bytes{gpu_uuid="GPU-964c5806-b17b-4615-ba02-453bc6599627"} 21515730944
```

Any suggestions would be very helpful :)