Triton Inference Server


I am running a video analytics pipeline in which multiple cameras send requests to a YOLOX-s model. Inference currently runs through TensorRT's C++ API, and I want to migrate to Triton.

It is a PyTorch model that I want to deploy as a TensorRT model. So inside the NGC container I will first convert the ONNX model to a TensorRT engine file, then write a config.pbtxt file where I can specify how many instances of the model to launch.
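For context, a minimal config.pbtxt for a TensorRT engine with multiple instances might look like the sketch below. The tensor names, shapes, and instance count are assumptions (a 640x640 YOLOX-s export typically has an "images" input and an [8400, 85] output, but check your actual ONNX export):

```
name: "yolox_s"
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "images"          # assumed input tensor name; match your ONNX export
    data_type: TYPE_FP32
    dims: [ 3, 640, 640 ]
  }
]
output [
  {
    name: "output"          # assumed output tensor name
    data_type: TYPE_FP32
    dims: [ 8400, 85 ]
  }
]
instance_group [
  {
    count: 2                # number of model instances to launch
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```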

How are client requests routed to the model instances? Are they automatically dispatched to an instance, or do we need to target a specific instance when sending requests?

Also, please suggest a reference for sending client-side requests in C++.


Please refer to the following documentation.

Thank you.