I am running inference server 1.12.0 via the official container v20.02 on Ubuntu 16.04. The model it serves is a modified YOLOv3 object detection model.
My request rate is around 10/sec over gRPC. Several times now the server has quit randomly with no error message. Is there a way to get the error logged somewhere?
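For reference, this is roughly how I launch the server. I have not yet tried turning up the log verbosity; the `--log-verbose`/`--log-error` flags below are taken from the TRTIS documentation, and the model path and port mappings are placeholders for my setup:

```shell
# Launch TRTIS with verbose logging enabled and capture all output to a file,
# so a crash leaves a trace even if the container exits.
docker run --gpus=1 --rm \
  -p8000:8000 -p8001:8001 -p8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tensorrtserver:20.02-py3 \
  trtserver --model-repository=/models \
            --log-verbose=1 --log-error=true \
  2>&1 | tee trtis.log
```

Checking `trtis.log` (and `dmesg` on the host for OOM-killer messages) after the next crash might show whether it is memory related.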
TRTIS consistently occupies about 10 GB of system memory and 2.7 GB of GPU memory. My server has 32 GB of RAM and an RTX 2080. Could the problem be caused by a memory error?