TensorRT engine file generated by TLT is not accepted by the inference server

Description

1: Trained YOLOv3 (packaged with TLT) using TLT (nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3)
2: Converted yolo_resnet18_epoch_080.etlt to a trt.engine file (all using the commands in the notebook)
3: Renamed trt.engine to model.plan and moved it into the TensorRT Inference Server model repository, laid out as sketched just below (the serving command and its output follow):
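
The repository was laid out in the standard Triton form (one directory per model, one numeric version directory, model.plan inside); a sketch, assuming the model directory name yolov3_resnet18 that appears in the error below:

mkdir -p /home/infer_models/test_model_repository/yolov3_resnet18/1
cp trt.engine /home/infer_models/test_model_repository/yolov3_resnet18/1/model.plan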

nvidia-docker run --gpus=1 --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/home/infer_models/test_model_repository:/models -e LD_PRELOAD="/preloadlibs/libnvinfer_plugin.so.7.0.0.1 /preloadlibs/libnvds_infercustomparser_yolov3_tlt.so" nvcr.io/nvidia/tritonserver:20.03-py3 trtserver --model-repository=/models --strict-model-config=false

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 20.03 (build 11042949)

Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

2020-08-15 15:03:24.627236: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
I0815 15:03:24.657388 1 metrics.cc:164] found 1 GPUs supporting NVML metrics
I0815 15:03:24.663151 1 metrics.cc:173] GPU 0: GeForce RTX 2080 Ti
I0815 15:03:24.663566 1 server.cc:120] Initializing Triton Inference Server
E0815 15:03:26.924412 1 model_repository_manager.cc:1519] model output must specify 'dims' for yolov3_resnet18
error: creating server: INTERNAL - failed to load all models

As per the linked documentation:

No model configuration file should be required for TensorRT engine files when --strict-model-config=false is set; the inference server should be able to read all the necessary information from the plan file itself. However, this error suggests that either TensorRT is not embedding the required information or the inference server is not reading it.
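
If the auto-generated configuration keeps failing, one possible workaround is to place an explicit config.pbtxt inside the yolov3_resnet18 model directory with dims spelled out. The sketch below is an assumption based on the default TLT 2.0 YOLOv3 export (an Input binding plus the four BatchedNMS plugin outputs); the tensor names, data types, and dimensions would need to be verified against the actual engine bindings:

name: "yolov3_resnet18"
platform: "tensorrt_plan"
max_batch_size: 1
input [
  {
    name: "Input"              # binding name assumed from the default TLT export
    data_type: TYPE_FP32
    dims: [ 3, 384, 1248 ]     # C, H, W used at export time (assumed)
  }
]
output [
  { name: "BatchedNMS",   data_type: TYPE_INT32, dims: [ 1 ] },      # kept-detection count
  { name: "BatchedNMS_1", data_type: TYPE_FP32,  dims: [ 200, 4 ] }, # boxes (keepTopK of 200 assumed)
  { name: "BatchedNMS_2", data_type: TYPE_FP32,  dims: [ 200 ] },    # scores
  { name: "BatchedNMS_3", data_type: TYPE_FP32,  dims: [ 200 ] }     # class ids
]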

Environment

TensorRT Version: 7.0 (same under TLT container and on host Ubuntu 18.04 system)
GPU Type: RTX 2080Ti
Nvidia Driver Version: 450.57
CUDA Version: 10.2 (same under TLT container and on host Ubuntu 18.04 system)
CUDNN Version: 7.6.5 (TLT container environment), 7.6.3 (host Ubuntu 18.04 system)
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

The etlt, trt.engine, libnvinfer_plugin, and libnvds_infercustomparser files are in the following folder:

https://drive.google.com/drive/folders/1DkZjYIu1TmUZAuqIZZuiBrtS9a6SDWyw?usp=sharing

Steps To Reproduce

- No code changes were made inside TLT (the default notebook commands were run to train and convert; a hedged sketch of the conversion step is below)
- The command used to run the inference server is in the Description section above, along with the error output
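
For completeness, the conversion step followed the notebook defaults; a rough sketch of the command (the encryption key, input dims, and output node name below are placeholders for the notebook values, not copied from my run):

tlt-converter -k $KEY -d 3,384,1248 -o BatchedNMS -e trt.engine yolo_resnet18_epoch_080.etlt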

Hi @ghazni,
Please allow me some time to check on this.
Thanks!

Hi @ghazni,
Your issue doesn't look like a TRT issue, but more of an inference-server issue, hence I would request you to raise it on the respective forum to get better assistance.

Thanks!

Many thanks for confirming. Sure, I will create a new topic in the Inference Server forum.