Model tensor shape configuration hints for dynamic batching but the underlying engine doesn't support batching

Hi all,

I have a model that I can deploy on TensorRT Inference Server and run inference on from a client.
I know there are two kinds of TRTIS/Triton images on NGC: the 20.xx releases and the pre-20.xx releases (e.g., 19.10).

My working setup uses the image nvcr.io/nvidia/tensorrtserver:19.10-py3.
However, that release ships TensorRT 6, so I was trying to build with TensorRT 7 instead (i.e., Triton 20.03 or the newer 20.08).

I mainly used the onnx2trt tool to convert the model from ONNX to a TensorRT engine, inside containers built from nvcr.io/nvidia/tensorrt:20.03.1-py3 and nvcr.io/nvidia/tensorrt:20.08-py3 (I also tried trtexec):

onnx2trt model.onnx -o model.trt -b 1
----------------------------------------------------------------
Input filename:   model.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    pytorch
Producer version: 1.5
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
Parsing model
Building TensorRT engine, FP16 available:1
    Max batch size:     1
    Max workspace size: 1024 MiB
Writing TensorRT engine to model.trt
All done

Or

trtexec --onnx=model.onnx --saveEngine=model.trt --allowGPUFallback --workspace=3072 --explicitBatch
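
Note that with --explicitBatch and no shape options, the engine keeps whatever batch dimension is baked into the ONNX graph (1 in my case), so it cannot batch at all. If I wanted an engine that genuinely supports batching, my understanding is that I would have to export the ONNX model with a dynamic batch axis (dynamic_axes in torch.onnx.export) and then build with an optimization profile, something like the following (the input name input.1 is taken from my config.pbtxt, and the max batch of 8 is just an example):

trtexec --onnx=model.onnx --saveEngine=model.trt --explicitBatch --workspace=3072 \
        --minShapes=input.1:1x3x512x512 \
        --optShapes=input.1:4x3x512x512 \
        --maxShapes=input.1:8x3x512x512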

I used the same config.pbtxt file as on TRTIS and tried to launch the Triton server on both image versions, nvcr.io/nvidia/tritonserver:20.03.1-py3 and nvcr.io/nvidia/tritonserver:20.08-py3.
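
For context, I launched the server roughly like this (the model repository path is just a placeholder; note the --strict-model-config=false flag, which turns out to matter, as described at the end):

docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v /path/to/model_repository:/models \
    nvcr.io/nvidia/tritonserver:20.08-py3 \
    tritonserver --model-repository=/models --strict-model-config=false
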
However, I always got the error messages below; this error never happened with the old image nvcr.io/nvidia/tensorrtserver:19.10-py3.

 The TRT engine doesn't specify appropriate dimensions to support dynamic batching
E0902 08:49:03.482851 1 model_repository_manager.cc:1633] unable to autofill for 'trt_model', model tensor  shape configuration hints for dynamic batching but the underlying engine doesn't support batching.

BTW, I can run this TensorRT engine directly in those images without the server. (To be precise, I built and ran the engine in containers based on the nvcr.io/nvidia/tensorrt:20.03.1-py3 and nvcr.io/nvidia/tensorrt:20.08-py3 images.)
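
For completeness, this is roughly how I check the engine standalone with the TensorRT Python API and PyCUDA (a minimal sketch; it assumes the engine was built with fixed shapes and that binding 0 is the input "input.1"):

import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine produced by onnx2trt / trtexec
with open("model.trt", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate host/device buffers for every binding (input.1, 520, 523, ...)
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = np.zeros(trt.volume(shape), dtype=dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

# Feed a random input, run inference, and copy the outputs back
host_bufs[0][:] = np.random.rand(host_bufs[0].size).astype(np.float32)
cuda.memcpy_htod(dev_bufs[0], host_bufs[0])
context.execute_v2(bindings)
for i in range(1, engine.num_bindings):
    cuda.memcpy_dtoh(host_bufs[i], dev_bufs[i])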

Here is the content of my config.pbtxt file:

name: "trt_model"
platform: "tensorrt_plan"
max_batch_size: 1

input [
  {
    name: "input.1"
    data_type: TYPE_FP32
    dims: [ 3, 512, 512 ] 
  }
]

output [
  {
    name: "520"
    data_type: TYPE_FP32
    dims: [ 2, 128, 128 ]
  },
  {
    name: "523"
    data_type: TYPE_FP32
    dims: [ 2, 128, 128 ]
  },
  {
    name: "526"
    data_type: TYPE_FP32
    dims: [ 10, 128, 128 ]
  },
  {
    name: "529"
    data_type: TYPE_FP32
    dims: [ 2, 128, 128 ]
  },
  {
    name: "532"
    data_type: TYPE_FP32
    dims: [ 5, 128, 128 ]
  },
  {
    name: "535"
    data_type: TYPE_FP32
    dims: [ 2, 128, 128 ]
  }
]

instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
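
For what it's worth, since the engine was built with an explicit batch dimension fixed at 1, one possible workaround (I have not fully verified this, but it follows the Triton model configuration docs) would be to declare the model as non-batching with max_batch_size: 0 and put the batch dimension into the dims explicitly, e.g.:

name: "trt_model"
platform: "tensorrt_plan"
max_batch_size: 0

input [
  {
    name: "input.1"
    data_type: TYPE_FP32
    dims: [ 1, 3, 512, 512 ]
  }
]

with each output listed analogously, e.g. dims: [ 1, 2, 128, 128 ] for "520".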

Does anyone have any idea about this?

Thank you.

Env info

The successful case was run on a GTX 1060.
The failing case was run on a Titan V.

All experiments were run in Docker.

If you need further info, please let me know.

Thank you so much

Best regards,
Chieh

Solved.

Solution:
Don't use --strict-model-config=false. With the flag removed, Triton no longer tries to autofill the model configuration from the TensorRT engine; it uses the config.pbtxt above as-is, and the error goes away.
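
In other words, launch the server without that flag so the explicit config.pbtxt is used (the model repository path is a placeholder):

# Fails with the autofill error above:
tritonserver --model-repository=/models --strict-model-config=false

# Works, using the explicit config.pbtxt above:
tritonserver --model-repository=/models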