I am running TensorRT Inference Server v19.05 locally, using the NVIDIA Docker container from NGC, with the following command:
nvidia-docker run --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p9000:8001 -p8002:8002 -v /path/to/model_repository:/models nvcr.io/nvidia/tensorrtserver:19.05-py3 trtserver --model-store=/models
I am using the example model repository provided by NVIDIA. When I make continuous inference requests via gRPC to the resnet50_netdef model, the system RAM usage of the trtserver process climbs steadily until it reaches almost 100% and the OS kills the process.
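For reference, the request loop I use to reproduce this looks roughly like the sketch below. It is based on the 19.05-era Python client API (InferContext, ProtocolType); the tensor names gpu_0/data and gpu_0/softmax are the ones used for resnet50_netdef in the example repository, and localhost:9000 matches the host-side gRPC port mapping in the docker command above. Treat the exact API names as assumptions if your client version differs.

```python
import numpy as np

def make_input(batch_size=1):
    # resnet50_netdef in the example repository expects a
    # 3x224x224 FP32 tensor per image; random data is enough
    # to exercise the server.
    return [np.random.rand(3, 224, 224).astype(np.float32)
            for _ in range(batch_size)]

def main():
    # Import here so the helper above stays usable without the
    # client library installed. API names are from the 19.05
    # client and may differ in other releases (assumption).
    from tensorrtserver.api import InferContext, ProtocolType

    # Host port 9000 maps to the server's gRPC port 8001
    # per the docker run command above.
    ctx = InferContext("localhost:9000", ProtocolType.GRPC,
                       "resnet50_netdef", model_version=-1)
    while True:
        # Continuous requests; server-side RSS grows over time.
        ctx.run({"gpu_0/data": make_input()},
                {"gpu_0/softmax": InferContext.ResultFormat.RAW},
                1)

if __name__ == "__main__":
    main()
```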
How do I stop the RAM usage from continuously climbing?
My system specs follow:
Operating system: Ubuntu 18.04
RAM: 32GB
Docker version: 18.09.6, build 481bc77