Performance degradation deploying Retinanet (resnet18) with DS-Triton

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) : T4
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) : Pre-trained TAO model, RetinaNet with ResNet18 backbone
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here):

  • TLT 3.0
  • Model downloaded from here
  • Docker image: deepstream:6.0-ea-21.06-triton

• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

Hi all, I am seeing performance degradation when I deploy the optimized model retinanet-resnet18 in INT8 precision mode with DeepStream-Triton, see below:

Reference baseline inference with trtexec:
throughput: 306.362 qps

DeepStream only:
**PERF: 320.73 (294.19)

DeepStream-Triton integration with strict_model_config: true
**PERF: 233.16 (200.13)

DeepStream-Triton integration with strict_model_config: false
**PERF: 319.06 (129.13)
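
To put these figures side by side, here is a minimal Python sketch that parses the **PERF lines quoted above and expresses each value relative to the trtexec baseline. It assumes that trtexec qps and DeepStream FPS are directly comparable (batch size 1, single stream); the helper names parse_perf and relative_to_baseline are only illustrative.

```python
import re

# Figures copied from the measurements above.
TRTEXEC_QPS = 306.362
PERF_LINES = {
    "DeepStream only": "**PERF: 320.73 (294.19)",
    "DS-Triton, strict_model_config: true": "**PERF: 233.16 (200.13)",
    "DS-Triton, strict_model_config: false": "**PERF: 319.06 (129.13)",
}

# Matches "**PERF: <outside> (<inside>)".
PERF_RE = re.compile(r"\*\*PERF:\s*([\d.]+)\s*\(([\d.]+)\)")


def parse_perf(line):
    """Return (value outside brackets, value inside brackets) as floats."""
    match = PERF_RE.search(line)
    if match is None:
        raise ValueError(f"not a PERF line: {line!r}")
    return float(match.group(1)), float(match.group(2))


def relative_to_baseline(fps, baseline=TRTEXEC_QPS):
    """Percentage difference of fps with respect to the baseline."""
    return 100.0 * (fps - baseline) / baseline


if __name__ == "__main__":
    for label, line in PERF_LINES.items():
        outside, inside = parse_perf(line)
        print(
            f"{label}: outside {outside:.2f} "
            f"({relative_to_baseline(outside):+.1f}%), "
            f"brackets {inside:.2f} "
            f"({relative_to_baseline(inside):+.1f}%)"
        )
```

Only the strict_model_config: true run falls clearly below the baseline on the value outside the brackets; the other two runs are within a few percent of it.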


1- With DeepStream-Triton integration and strict_model_config: false, there is a large gap between the number outside the brackets (the average over the most recent five seconds) and the number inside the brackets: **PERF: 319.06 (129.13). What could cause this gap, and which number should be taken as the measured performance, the one inside or the one outside the brackets?
2- What is the correct configuration for the config.pbtxt file below when using strict_model_config: true?
3- How can I retrieve the config.pbtxt file that is generated when running DeepStream-Triton integration with strict_model_config: false?
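
To illustrate the first question, here is a small Python sketch of how a cumulative average can lag far behind the most recent interval value. It assumes the bracketed number is a running average since the pipeline started, which I have not confirmed, and the interval readings are hypothetical, not measured values.

```python
def running_average(interval_fps):
    """Cumulative mean after each equal-length measurement interval."""
    averages = []
    total = 0.0
    for count, fps in enumerate(interval_fps, start=1):
        total += fps
        averages.append(total / count)
    return averages


if __name__ == "__main__":
    # Hypothetical readings: two slow start-up intervals, then steady state.
    samples = [10.0, 20.0, 320.0, 320.0, 320.0]
    for latest, average in zip(samples, running_average(samples)):
        print(f"**PERF: {latest:.2f} ({average:.2f})")
```

Under that assumption, a slow start (for example while the model is being loaded) would keep the bracketed value low for a long time even after the per-interval value has reached steady state.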

config.pbtxt with strict_model_config: true:

name: "retinanet_resnet18"
platform: "tensorrt_plan"
default_model_filename: "retinanet_resnet18.etlt_b1_gpu0_int8.engine"
max_batch_size: 1
input [
  {
    name: "Input"
    format: FORMAT_NCHW
    data_type: TYPE_FP32
    dims: [ 3, 544, 960 ]
  }
]
output [
  {
    name: "NMS"
    data_type: TYPE_FP32
    dims: [ 1, 250, 7 ]
  },
  {
    name: "NMS_1"
    data_type: TYPE_FP32
    dims: [ 1, 1, 1 ]
  }
]

instance_group {
  count: 1
  gpus: [ 0 ]
  kind: KIND_GPU
}