TensorRT Inference Server rejecting valid trt.engine file generated by TLT

Summary

TensorRT Inference Server does not accept a valid trt.engine file generated by TLT. The TensorRT support team confirmed this in the thread "Tensorrt engine file generated by TLT is not acceptable to inference server".

I was advised to describe the issue in this forum.

Description

Note: I have not changed any code, and none of the commands were altered. All commands are as described in the online guides or Jupyter notebooks.

1: Trained the YOLOv3 model (packaged with TLT) using TLT (nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3).
2: Converted yolo_resnet18_epoch_080.etlt to a trt.engine file (using the commands in the Jupyter notebook).
3: Renamed the trt.engine file to model.plan, moved it into the TensorRT Inference Server model_repository, and started the server (command and output below):

nvidia-docker run --gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/home/infer_models/test_model_repository:/models -e LD_PRELOAD="/preloadlibs/libnvinfer_plugin.so.7.0.0.1 /preloadlibs/libnvds_infercustomparser_yolov3_tlt.so" nvcr.io/nvidia/tritonserver:20.03-py3 trtserver --model-repository=/models --strict-model-config=false

=============================

== Triton Inference Server ==

NVIDIA Release 20.03 (build 11042949)

Copyright © 2018-2019, NVIDIA CORPORATION. All rights reserved.

Various files include modifications © NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

2020-08-15 15:03:24.627236: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
I0815 15:03:24.657388 1 metrics.cc:164] found 1 GPUs supporting NVML metrics
I0815 15:03:24.663151 1 metrics.cc:173] GPU 0: GeForce RTX 2080 Ti
I0815 15:03:24.663566 1 server.cc:120] Initializing Triton Inference Server
E0815 15:03:26.924412 1 model_repository_manager.cc:1519] model output must specify 'dims' for yolov3_resnet18
error: creating server: INTERNAL - failed to load all models
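
For reference, step 3 (renaming the engine and placing it in the model repository) can be sketched as the shell function below. This is a minimal sketch, not the exact commands that were run: it assumes the usual repository layout of <repository>/<model name>/<version>/model.plan with version directory 1, and the function name install_engine and its arguments are illustrative, not part of any NVIDIA tool.

```shell
# Sketch: copy a TensorRT engine into a model repository as model.plan.
# Assumed layout: <repo>/<model>/<version>/model.plan (version defaults to 1).
set -eu

install_engine() {
  engine="$1"        # path to the engine file, e.g. trt.engine
  repo="$2"          # model repository root (mounted as /models in the container)
  model="$3"         # model name, e.g. yolov3_resnet18
  version="${4:-1}"  # version directory; 1 is an assumption

  dest="$repo/$model/$version"
  mkdir -p "$dest"
  cp "$engine" "$dest/model.plan"

  # Print the final location so the caller can verify it.
  echo "$dest/model.plan"
}
```

For example, install_engine trt.engine /home/infer_models/test_model_repository yolov3_resnet18 would create /home/infer_models/test_model_repository/yolov3_resnet18/1/model.plan.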

As per link:
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/model_repository.html#section-tensorrt-models

According to that page, no model configuration file is required for TensorRT engine files; the inference server should be able to read all necessary information from the file itself. The TensorRT support team has confirmed that this is an issue with the inference layer (see the thread "Tensorrt engine file generated by TLT is not acceptable to inference server").

Could you please advise what I need to do to resolve this issue?

Environment

TensorRT Version : 7.0 (same under TLT container and on host Ubuntu 18.04 system)
GPU Type : RTX 2080Ti
Nvidia Driver Version : 450.57
CUDA Version : 10.2 (same under TLT container and on host Ubuntu 18.04 system)
CUDNN Version : 7.6.5 (TLT container environment), 7.6.3 (host Ubuntu 18.04 system)
Operating System + Version : Ubuntu 18.04
Python Version (if applicable) : 3.6.9
TensorFlow Version (if applicable) :
PyTorch Version (if applicable) :
Baremetal or Container (if container which image + tag) :

Relevant Files

The etlt, trt.engine, libnvinfer_plugin, and libnvds_infercustomparser files are in the following folder:

https://drive.google.com/drive/folders/1DkZjYIu1TmUZAuqIZZuiBrtS9a6SDWyw?usp=sharing

Steps To Reproduce

- No code changes were made inside TLT (the default commands were run to train and convert).
- The command to run the inference server is in the Description section, along with the errors.
