YOLOv11 Triton Inference Server Deployment Problem

Title: Engine Deserialization Error During Deployment on Triton Inference Server

Description:
I am encountering an issue while deploying a YOLOv11 model on Triton Inference Server. The model was successfully converted to a TensorRT engine and performed inference correctly using the YOLO command-line interface. However, when deploying the model on Triton Inference Server, I receive the following error:

ERROR: 1: [stdArchiveReader.cpp::StdArchiveReader::32] Error Code 1: Serialization (Serialization assertion magicTagRead == kMAGIC_TAG failed.Magic tag does not match)
ERROR: 4: [runtime.cpp::deserializeCudaEngine::66] Error Code 4: Internal Error (Engine deserialization failed.)

Environment Details:

  • Operating System: Ubuntu 20.04
  • GPUs: NVIDIA V100 and NVIDIA RTX 3080 (tested on both)
  • CUDA Version: 11.7
  • TensorRT Versions Tested: 8.4.3.1, 8.2.0.5
  • Triton Server Versions Tested: 22.06, 24.11
  • PyTorch Versions Tested: 1.10.1, 2.0.0
  • NVIDIA Driver Version Tested: 515.105.01

Steps to Reproduce:

  1. Convert the YOLOv11 model to a TensorRT engine using the following command:

    $ yolo export model=yolov11m.pt format=engine simplify
    

    The conversion completes without any issues!

  2. Test the generated engine locally with the following command:

    $ yolo predict model=yolov11m.engine source=test.jpg
    

    The engine performs inference successfully without any errors!

  3. Deploy the engine on Triton Inference Server. During deployment, the above error is encountered.
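
    For reference, a typical way to stage a TensorRT engine for Triton's TensorRT backend looks roughly like this (the repository layout and container tag are shown for illustration only):

    $ mkdir -p models/yolov11/1
    $ cp yolov11m.engine models/yolov11/1/model.plan
    $ docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
          -v $(pwd)/models:/models nvcr.io/nvidia/tritonserver:24.11-py3 \
          tritonserver --model-repository=/models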

Are there any specific compatibility requirements or configuration steps needed to resolve this issue?

Thank you for your assistance!

Hi @ahmetselimdemirel ,
TensorRT engines have a strict version requirement: the TensorRT version used to build the engine must match the TensorRT version used to load it, and the engine must be built for the GPU it runs on. Please make sure the engine was generated on the same GPU and with the same TensorRT version that your Triton server uses.
Can you please confirm that?
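
One quick way to check is to compare the TensorRT version in the export environment with the one shipped inside the Triton container. A minimal sketch, assuming the 24.11 Triton image (the tag is just an example; the exact TensorRT version for each release is also listed in the Triton container release notes):

    $ python -c "import tensorrt; print(tensorrt.__version__)"
    $ docker run --rm nvcr.io/nvidia/tritonserver:24.11-py3 \
          bash -c "dpkg -l | grep -Ei 'nvinfer|tensorrt'"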

Thanks

This appears to be a Triton issue, so I recommend reaching out to the Triton forum.

You can follow the Ultralytics Triton Inference Server guide.

You can’t use the TensorRT export like that because of the TensorRT version mismatch between the export environment and the Triton container. But you can use the ONNX-exported model with the TensorRT backend, as shown in the guide.
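
A rough sketch of that route (file and directory names are illustrative; see the guide for the accompanying config.pbtxt):

    $ yolo export model=yolov11m.pt format=onnx simplify
    $ mkdir -p model_repository/yolov11/1
    $ cp yolov11m.onnx model_repository/yolov11/1/model.onnx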

Otherwise, you need to pull the TensorRT Docker image that ships the same TensorRT version as your Triton container and run the export inside it.
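
For example, assuming the 24.11 Triton image, the NGC TensorRT container from the same monthly release should carry the same TensorRT version (confirm against the NGC framework support matrix; the tags and model name below are illustrative):

    $ docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt:24.11-py3
    # inside the container (pip will also pull in PyTorch):
    $ pip install ultralytics
    $ yolo export model=yolov11m.pt format=engine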