Using TensorRT Inference Server with TLT models

Hello,

I am posting this here because I am not sure if this is a TLT question or inference server question…

To test out TensorRT Inference Server, I trained a quick Resnet50 Classification model with TLT.

Everything worked out great, so I exported it and then converted it into a trt model on my x86 machine with the exporter from the docker container.

The converter creates a .trt model
TensorRT Inference Server expects a .plan

How can I get the server to load a .trt or get TLT to create a .plan?

Thank you!

For more information:

TLT Docker = tlt-streamanalytics:v1.0_py2

TRTIS Docker = tensorrtserver:19.10-py3

For TRT models, the easiest way to get the correct model configuration for TRTIS is to not provide a config.pbtxt and instead use --strict-model-config=false. See https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#generated-model-configuration
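With that flag, the model repository only needs the serialized engine in the standard layout. The .trt file produced by tlt-converter is already a serialized TensorRT engine, so it can simply be renamed to model.plan, the default filename TRTIS looks for:

model_repo_test/
  resnet50_steel/
    1/
      model.plan

If you prefer to write the configuration by hand, a minimal config.pbtxt for a TensorRT plan model would look roughly like the sketch below. The tensor names, input dimensions and class count are assumptions based on a typical TLT ResNet50 classification export, so check them against your own model (the generated configuration described at the link above shows the exact values):

name: "resnet50_steel"
platform: "tensorrt_plan"
max_batch_size: 16
input [
  {
    name: "input_1"            # assumed input tensor name, check your export
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]      # C, H, W without the batch dimension
  }
]
output [
  {
    name: "predictions/Softmax"  # assumed output node of the TLT classifier
    data_type: TYPE_FP32
    dims: [ 2 ]                  # number of classes; must match the engine's output shape
  }
]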

If I run the following command:

sudo docker run --gpus all --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -v/tensorrt-inference-server/docs/examples/model_repo_test:/tmp/models nvcr.io/nvidia/tensorrtserver:19.10-py3 /opt/tensorrtserver/bin/trtserver --model-store=/tmp/models --strict-model-config=false

I get the error:

~/ai/tensorrt-inference-server$ sudo docker run --gpus all --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -v/home/tensorrt-inference-server/docs/examples/model_repo_test:/tmp/models nvcr.io/nvidia/tensorrtserver:19.10-py3 /opt/tensorrtserver/bin/trtserver --model-store=/tmp/models --strict-model-config=false 

===============================
== TensorRT Inference Server ==
===============================

NVIDIA Release 19.10 (build 8266503)

Copyright (c) 2018-2019, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

I0313 17:56:15.270401 1 metrics.cc:160] found 1 GPUs supporting NVML metrics
I0313 17:56:15.276123 1 metrics.cc:169]   GPU 0: GeForce RTX 2080
I0313 17:56:15.276324 1 server.cc:110] Initializing TensorRT Inference Server
E0313 17:56:17.103776 1 logging.cc:43] ../rtSafe/coreReadArchive.cpp (31) - Serialization Error in verifyHeader: 0 (Magic tag does not match)
E0313 17:56:17.103841 1 logging.cc:43] INVALID_STATE: std::exception
E0313 17:56:17.103847 1 logging.cc:43] INVALID_CONFIG: Deserialize the cuda engine failed.
E0313 17:56:17.110454 1 model_repository_manager.cc:1453] must specify platform for model 'resnet50_steel'
E0313 17:56:17.110505 1 main.cc:1099] error: creating server: INTERNAL - failed to load all models

Do you know why I can't generate this config file properly?

Hi Martin,
The tlt-export will generate an .etlt model.
The tlt-converter will generate a TRT engine. That is exactly the TRT plan file.
From your comment, “exported it and then converted it into a trt model on my x86 machine”, did you convert it via tlt-converter?

For more info, please see TRT engine deployment.
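For illustration, a tlt-converter call of roughly this shape writes the serialized engine straight to a file that TRTIS can load; the key, output node name, input dimensions, precision and file names below are placeholders that depend on your own classification export:

tlt-converter -k $YOUR_NGC_KEY \
    -o predictions/Softmax \
    -d 3,224,224 \
    -m 16 \
    -t fp16 \
    -e model.plan \
    resnet50_steel.etlt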

Hi Morganh,

Yes, I used the tlt-converter on my desktop computer to create the TensorRT .trt model.

It still does not work with TRTIS.

Hi Martin,

You have to make sure that TLT and TRTIS use the same TRT version.
In your setting:
TLT Docker = tlt-streamanalytics:v1.0_py2: TRT 5.1.5
TRTIS Docker = tensorrtserver:19.10-py3: TRT 6.0.1
In that case, TRTIS will not be able to recognize the TRT engine.

tensorrtserver:19.08-py3 should be able to work with your TLT TRT engine.
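For example, the same command as before with the matching release tag (the host paths are the ones from your run and may need adjusting):

sudo docker run --gpus all --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -v/home/tensorrt-inference-server/docs/examples/model_repo_test:/tmp/models nvcr.io/nvidia/tensorrtserver:19.08-py3 /opt/tensorrtserver/bin/trtserver --model-store=/tmp/models --strict-model-config=false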

Good luck!

@andrliu
Great! Thank you for the update. Could you please provide a config file that would work for loading my TLT-trained model with TensorRT Inference Server?

That way I know I am doing it correctly.

Many thanks,
Martin