Triton Inference Server


I am running a video analytics pipeline in which multiple cameras send requests to a YOLOX-s model. Inference currently runs through TensorRT's C++ API, and I want to migrate to Triton.

It is a PyTorch model that I want to deploy as a TensorRT model. So inside the NGC container I will first convert the ONNX model to a TensorRT engine file, then write a config.pbtxt file where I can specify how many instances of the model to launch.
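For context, a minimal config.pbtxt for a TensorRT engine with multiple instances might look like the sketch below. The tensor names, shapes, and instance count are assumptions (a 640x640 YOLOX-s export typically has an "images" input and an [8400, 85] output, but check your actual ONNX export):

```
name: "yolox_s"
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "images"          # assumed input tensor name; match your ONNX export
    data_type: TYPE_FP32
    dims: [ 3, 640, 640 ]
  }
]
output [
  {
    name: "output"          # assumed output tensor name
    data_type: TYPE_FP32
    dims: [ 8400, 85 ]
  }
]
instance_group [
  {
    count: 2                # number of model instances to launch
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```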

How are client requests routed to the model instances? Are they automatically dispatched to an instance, or do we need to target a specific instance when sending requests?

Also, please suggest a reference for sending client-side requests in C++.


Please refer to the following documentation.

Thank you.