YOLOv11 Triton Inference Server Deployment Problem

Title: Engine Deserialization Error During Deployment on Triton Inference Server

Description:
I am encountering an issue while deploying a YOLOv11 model on Triton Inference Server. The model was successfully converted to a TensorRT engine and performs inference correctly through the YOLO command-line interface. However, when Triton Inference Server tries to load the same engine, it fails with the following error:

ERROR: 1: [stdArchiveReader.cpp::StdArchiveReader::32] Error Code 1: Serialization (Serialization assertion magicTagRead == kMAGIC_TAG failed.Magic tag does not match)
ERROR: 4: [runtime.cpp::deserializeCudaEngine::66] Error Code 4: Internal Error (Engine deserialization failed.)

Environment Details:

  • Operating System: Ubuntu 20.04
  • GPUs: NVIDIA V100 and NVIDIA RTX 3080 (tested on both)
  • CUDA Version: 11.7
  • TensorRT Versions Tested: 8.4.3.1, 8.2.0.5
  • Triton Server Versions Tested: 22.06, 24.11
  • PyTorch Versions Tested: 1.10.1, 2.0.0
  • NVIDIA Driver Version Tested: 515.105.01
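
For reference, the TensorRT version on the export side and the TensorRT libraries shipped inside the Triton container can be compared with commands along these lines (the container tag is illustrative; substitute the release actually used):

    $ python3 -c "import tensorrt; print(tensorrt.__version__)"
    $ docker run --rm nvcr.io/nvidia/tritonserver:24.11-py3 \
          sh -c 'ldconfig -p | grep nvinfer'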

Steps to Reproduce:

  1. Convert the YOLOv11 model to a TensorRT engine using the following command:

    $ yolo export model=yolov11.pt format=engine simplify
    

    The conversion completes without any issues!

  2. Test the generated engine locally with the following command:

    $ yolo predict model=yolov11.engine source=test.jpg
    

    The engine performs inference successfully without any errors!

  3. Deploy the engine on Triton Inference Server (the model-repository layout and launch command are sketched below). During this step, the error above is encountered.
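
For completeness, the deployment uses the standard Triton model-repository layout, roughly as sketched below. The tensor names and shapes in config.pbtxt are illustrative placeholders; the actual values should match the bindings of the exported engine.

    model_repository/
    └── yolov11/
        ├── 1/
        │   └── model.plan        # the exported .engine file, renamed
        └── config.pbtxt

    # config.pbtxt (illustrative values)
    name: "yolov11"
    platform: "tensorrt_plan"
    max_batch_size: 0
    input [
      { name: "images", data_type: TYPE_FP32, dims: [ 1, 3, 640, 640 ] }
    ]
    output [
      { name: "output0", data_type: TYPE_FP32, dims: [ 1, 84, 8400 ] }
    ]

The server is then launched with the model repository mounted into the container (the tag shown is one of the versions tested):

    $ docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
          -v $(pwd)/model_repository:/models \
          nvcr.io/nvidia/tritonserver:24.11-py3 \
          tritonserver --model-repository=/models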

Are there any specific compatibility requirements or configuration steps needed to resolve this issue?

Thank you for your assistance!