I am trying to run TRTIS (TensorRT Inference Server) on a local PC with an RTX 2070, serving two TensorRT plans (YOLO).
While testing the throughput of TRTIS, host memory usage rises rapidly.
Even after I shut down the client container, DRAM usage stays stuck at 10 GB out of 16 GB. DRAM usage at each stage:
- TRTIS off: 3.6 GB
- TRTIS on: 6.8 GB
- after 10 requests: 16 GB + 2 GB swap
If I keep sending requests and receiving results, the server's DRAM usage keeps climbing until it eats up all of the swap, while VRAM usage stays constant at 6 GB out of 8 GB.
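For reference, the growth of the server process itself (rather than total system DRAM) could be tracked with something like the following. This is a minimal sketch, not the method used for the numbers above; it assumes a Linux host and that you know the server process's PID as seen from where the script runs.

```python
import sys
import time
from pathlib import Path


def parse_vmrss_kib(status_text: str) -> int:
    """Return VmRSS (resident set size, in KiB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:\t  123456 kB"
            return int(line.split()[1])
    raise ValueError("VmRSS not found in status text")


def sample_rss(pid: int, interval_s: float = 1.0, samples: int = 10) -> list:
    """Read the process's resident memory (in MiB) at a fixed interval."""
    readings = []
    for _ in range(samples):
        text = Path(f"/proc/{pid}/status").read_text()
        readings.append(parse_vmrss_kib(text) / 1024)
        time.sleep(interval_s)
    return readings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rss_watch.py <server_pid>
    for mib in sample_rss(int(sys.argv[1])):
        print(f"{mib:.1f} MiB")
```

Running this while the client sends requests would show whether the resident memory of the server process grows steadily, as opposed to the OS page cache.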
1) Is there any setting in config.pbtxt that releases memory or cache?
2) Also, how can I measure the TRTIS CPU usage and the throughput of each model on the server side (since sending data over the network adds latency)?
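One possible way to get per-model throughput without client-side network latency is to read server-side counters. The sketch below assumes the server publishes Prometheus text-format metrics (TRTIS serves these on port 8002 at `/metrics` by default) and that a per-model counter such as `nv_inference_count` with a `model` label exists; check the actual `/metrics` output of your version before relying on that name. CPU usage is not covered here.

```python
import re
import sys
import time
import urllib.request
from collections import defaultdict

# Matches labeled sample lines like: name{label="v",...} 123
_SAMPLE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def per_model_totals(text: str, metric: str) -> dict:
    """Sum one counter metric per `model` label from Prometheus text output."""
    totals = defaultdict(float)
    for line in text.splitlines():
        m = _SAMPLE.match(line)
        if not m or m.group(1) != metric:
            continue  # skip comments, other metrics, unlabeled samples
        labels = dict(_LABEL.findall(m.group(2)))
        totals[labels.get("model", "?")] += float(m.group(3))
    return dict(totals)


def throughput(before: dict, after: dict, seconds: float) -> dict:
    """Per-model rate (counter units per second) between two snapshots."""
    return {k: (after[k] - before.get(k, 0.0)) / seconds for k in after}


def scrape(url: str) -> str:
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python metrics_rate.py http://localhost:8002/metrics
    url, metric, window = sys.argv[1], "nv_inference_count", 10.0  # assumed metric name
    t0 = per_model_totals(scrape(url), metric)
    time.sleep(window)
    t1 = per_model_totals(scrape(url), metric)
    for model, rate in throughput(t0, t1, window).items():
        print(f"{model}: {rate:.1f} inferences/s")
```

Taking the difference of two scrapes over a fixed window gives an average rate per model; a longer window smooths out bursty client traffic.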