How to release TRTIS memory?

I am trying to run TRTIS on a local PC with an RTX 2070, serving two TensorRT plans (YOLO).
While testing TRTIS throughput, host memory grows rapidly.

Even after I shut down the client container, DRAM stays at 10GB out of 16GB:

  • TRTIS off: 3.6GB
  • TRTIS on: 6.8GB
  • after 10 requests: 16GB + 2GB swap

If I keep sending requests and receiving results, the server's DRAM keeps growing and eats up all the swap.

But VRAM stays constant at 6GB out of 8GB.

1) Is there any setting I have to put in config.pbtxt to release memory or caches?
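For context, the config.pbtxt options that bound a model's resource footprint are max_batch_size and instance_group (I am not aware of a setting that explicitly frees host memory). A sketch for a hypothetical YOLO plan, with illustrative values:

```protobuf
name: "yolo"
platform: "tensorrt_plan"
max_batch_size: 8        # caps the largest batch the server allocates buffers for
instance_group [
  {
    count: 1             # one execution instance; more instances cost more memory
    kind: KIND_GPU
  }
]
```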

2) Another question: how can I measure TRTIS CPU usage and the per-model throughput (client-side timing includes network latency, so it is not reliable)?
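For per-model numbers, TRTIS exposes Prometheus-format metrics on port 8002 (e.g. `curl localhost:8002/metrics`), including per-model counters such as nv_inference_request_success. A minimal sketch of pulling the counters out of that text format; the sample payload and label values below are made up for illustration:

```python
# Parse a Prometheus-format metrics dump such as the one TRTIS serves on
# port 8002. The metric names follow the TRTIS metrics docs; the values
# and model names here are invented sample data.
sample = """\
nv_inference_request_success{gpu_uuid="GPU-x",model="yolo_a",version="1"} 120
nv_inference_request_success{gpu_uuid="GPU-x",model="yolo_b",version="1"} 80
nv_inference_request_duration_us{gpu_uuid="GPU-x",model="yolo_a",version="1"} 2400000
"""

def parse_metrics(text):
    """Return {metric_name: {label_string: float_value}}."""
    out = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):   # skip comments/blank lines
            continue
        name_labels, value = line.rsplit(" ", 1)
        name, _, labels = name_labels.partition("{")
        out.setdefault(name, {})[labels.rstrip("}")] = float(value)
    return out

metrics = parse_metrics(sample)
# Total successful inferences across all models in the sample:
total = sum(metrics["nv_inference_request_success"].values())
```

Polling this endpoint periodically and diffing the counters gives per-model throughput measured on the server side, independent of network latency.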

There is a known issue with the GRPC front-end. Are you using GRPC? Try the 19.07 (or later) release of TRTIS and run with --grpc-infer-thread-count=16 and --grpc-stream-infer-thread-count=16.
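For example, a launch command with those flags might look like the following (the container tag and model-repository path are illustrative, not taken from your setup):

```shell
docker run --rm --runtime=nvidia \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tensorrtserver:19.07-py3 \
  trtserver --model-repository=/models \
            --grpc-infer-thread-count=16 \
            --grpc-stream-infer-thread-count=16
```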
