Hi,
I am building a Dockerized Triton inference server for a TensorRT model.
Base image: nvcr.io/nvidia/tritonserver:24.02-py3 or nvcr.io/nvidia/tritonserver:23.02-py3
For the model, I have a .engine file exported with Ultralytics (YOLOv8).
I am running this on an AWS g4dn.xlarge instance (NVIDIA T4 GPU).
Issues I Am Facing:
The container runs, but the API doesn't respond: the Triton server never comes up, so my inference script fails as well.
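To separate "container is up" from "Triton is actually serving", I check the health endpoints with the tritonclient Python package (a minimal sketch; localhost:8000 assumes the default HTTP port is published with -p 8000:8000):

import tritonclient.http as httpclient

# Health checks against Triton's default HTTP endpoint.
client = httpclient.InferenceServerClient(url="localhost:8000")
print("server live: ", client.is_server_live())
print("server ready:", client.is_server_ready())
print("model ready: ", client.is_model_ready("yolov8_tensorrt"))

If the server died during startup, the connection is refused outright instead of returning False.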
Here is my config.pbtxt:
name: "yolov8_tensorrt"
platform: "tensorrt_plan"
default_model_filename: "bestEHS.engine"
max_batch_size: 4
input [
{
name: "images"
data_type: TYPE_FP32
dims: [3, 640, 640]
}
]
output [
{
name: "output0"
data_type: TYPE_FP32
dims: [84, 8400]
}
]
instance_group [
{
kind: KIND_GPU
}
]
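For reference, a minimal repro of the inference call (tensor names, dtype, and dims are taken from the config above; random data stands in for a real preprocessed image, and the batch dimension is included because max_batch_size > 0):

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Shape is [batch, 3, 640, 640]; the batch dim is implicit in config.pbtxt.
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)

inp = httpclient.InferInput("images", list(dummy.shape), "FP32")
inp.set_data_from_numpy(dummy)
out = httpclient.InferRequestedOutput("output0")

result = client.infer(model_name="yolov8_tensorrt", inputs=[inp], outputs=[out])
print(result.as_numpy("output0").shape)  # expect (1, 84, 8400)

In my case this never gets a response, because the server itself is not up.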
Folder structure:
yolov8-triton-tensorrt/
├── models/
│   └── yolov8_tensorrt/
│       ├── 1/
│       │   └── bestEHS.engine
│       └── config.pbtxt
├── bus.jpg
└── Dockerfile
The error I see points to a version mismatch during Triton server initialization. I have tried several other approaches as well, but most of them hit the same problem: the Triton server fails to initialize, or it never comes up and starts serving.
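Since a TensorRT .engine is only loadable by the same TensorRT version (and GPU architecture) it was built with, and the 24.02 and 23.02 Triton containers ship different TensorRT releases, my working assumption is a mismatch with the TensorRT version Ultralytics used for the export. A minimal sketch to test this inside the container (it assumes the tensorrt Python package is available there and that the model repository is mounted at /models):

import tensorrt as trt

print("TensorRT in this container:", trt.__version__)

# Try to deserialize the engine exactly as Triton would at load time.
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Path assumes the model repository is mounted at /models.
with open("/models/yolov8_tensorrt/1/bestEHS.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

if engine is None:
    print("Deserialization failed -- likely built with a different TensorRT version.")
else:
    print("Engine deserialized OK; Triton should be able to load it.")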