Generation of Triton Inference Server configuration for TensorRT exported model of TAO classification (resnet)

Hello,

I’m following the instructions from the image classification tutorial with ResNet from the NVIDIA TAO CV samples, on a custom dataset of six classes. The notebook and the site documentation describe how to export and deploy to the DeepStream SDK, but not how to export and deploy to Triton Inference Server.

The Triton documentation specifies that TensorRT models for Triton Inference Server must have this format:

<model-repository-path>/
    <model-name>/
        config.pbtxt
        1/
            model.plan

But after the procedure completes, the export output consists of the following files:

!ls $TAO_EXPERIMENTS_DIR/classification/export
calibration.tensor  
final_model_int8_cache.bin
final_model.etlt    
final_model.trt

Questions:

  • How is config.pbtxt generated?
  • Is there an automatic tool, or a TAO tool, to inspect an optimized model’s architecture and obtain the last layer name, format, etc.? Trying the option ‘--strict-model-config=false’ gives an error (details below).
  • Is it correct to copy final_model.trt into the Triton model repository and rename it to model.plan?
  • Does Triton Server support final_model.etlt?
  • Any reference to documentation or tutorial is appreciated.

Details:

• Hardware: NVIDIA RTX 3070 Max-Q
• Network Type: Classification
• TLT Version: 3.22.02
• Triton Server Version: 22.05
• Training spec file:

model_config {
  arch: "resnet",
  n_layers: 18
  # Setting these parameters to true to match the template downloaded from NGC.
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,224,224"
}
train_config {
  train_dataset_path: "/workspace/tao-experiments/data/split/train"
  val_dataset_path: "/workspace/tao-experiments/data/split/val"
  pretrained_model_path: "/workspace/tao-experiments/classification/pretrained_resnet18/pretrained_classification_vresnet18/resnet_18.hdf5"
  optimizer {
    sgd {
        lr: 0.01
        decay: 0.0
        momentum: 0.9
        nesterov: False
      }
  }
  batch_size_per_gpu: 64
  n_epochs: 80
  n_workers: 16
  preprocess_mode: "caffe"
  enable_random_crop: True
  enable_center_crop: True
  label_smoothing: 0.0
  mixup_alpha: 0.1
  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }

  # learning_rate
  lr_config {
    step {
      learning_rate: 0.006
      step_size: 10
      gamma: 0.1
    }
  }
}
eval_config {
  eval_dataset_path: "/workspace/tao-experiments/data/split/test"
  model_path: "/workspace/tao-experiments/classification/output/weights/resnet_080.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
  enable_center_crop: True
}

Details for the error when running with ‘--strict-model-config=false’:

Command:

export TRITON_SERVER_IMAGE="nvcr.io/nvidia/tritonserver:22.05-py3"
docker run --gpus 1 --rm \
           --shm-size=1g --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
           -p 8000:8000 -p 8001:8001 -p 8002:8002 \
           -v"$PWD/model_repository":/models \
           $TRITON_SERVER_IMAGE /bin/bash -c "tritonserver --model-repository=/models --strict-model-config=false --grpc-infer-allocation-pool-size=16 --log-verbose=1"

Result:

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 22.05 (build 38317651)
Triton Server Version 2.22.0

Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: CUDA Minor Version Compatibility mode ENABLED.
  Using driver version 510.73.05 which has support for CUDA 11.6.  This container
  was built with CUDA 11.7 and will be run in Minor Version Compatibility mode.
  CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
  with this container but was unavailable:
  [[Forward compatibility was attempted on non supported HW (CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE) cuInit()=804]]
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

I0606 00:43:20.225971 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fc61c000000' with size 268435456
I0606 00:43:20.226203 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0606 00:43:20.226517 1 model_config_utils.cc:645] Server side auto-completed config: name: "clasificador1"
platform: "tensorrt_plan"
default_model_filename: "model.plan"
backend: "tensorrt"

I0606 00:43:20.227006 1 model_repository_manager.cc:1191] loading: clasificador1:1
I0606 00:43:20.327700 1 backend_model.cc:292] Adding default backend config setting: default-max-batch-size,4
I0606 00:43:20.327769 1 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
I0606 00:43:20.452357 1 tensorrt.cc:5294] TRITONBACKEND_Initialize: tensorrt
I0606 00:43:20.452377 1 tensorrt.cc:5304] Triton TRITONBACKEND API version: 1.9
I0606 00:43:20.452382 1 tensorrt.cc:5310] 'tensorrt' TRITONBACKEND API version: 1.9
I0606 00:43:20.452384 1 tensorrt.cc:5333] Registering TensorRT Plugins
I0606 00:43:20.452392 1 logging.cc:52] Registered plugin creator - ::BatchTilePlugin_TRT version 1
I0606 00:43:20.452395 1 logging.cc:52] Registered plugin creator - ::BatchedNMS_TRT version 1
I0606 00:43:20.452398 1 logging.cc:52] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
I0606 00:43:20.452401 1 logging.cc:52] Registered plugin creator - ::CoordConvAC version 1
I0606 00:43:20.452404 1 logging.cc:52] Registered plugin creator - ::CropAndResize version 1
I0606 00:43:20.452407 1 logging.cc:52] Registered plugin creator - ::CropAndResizeDynamic version 1
I0606 00:43:20.452410 1 logging.cc:52] Registered plugin creator - ::DecodeBbox3DPlugin version 1
I0606 00:43:20.452413 1 logging.cc:52] Registered plugin creator - ::DetectionLayer_TRT version 1
I0606 00:43:20.452416 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_TRT version 1
I0606 00:43:20.452420 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
I0606 00:43:20.452423 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_Explicit_TF_TRT version 1
I0606 00:43:20.452426 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_Implicit_TF_TRT version 1
I0606 00:43:20.452429 1 logging.cc:52] Registered plugin creator - ::FlattenConcat_TRT version 1
I0606 00:43:20.452432 1 logging.cc:52] Registered plugin creator - ::GenerateDetection_TRT version 1
I0606 00:43:20.452436 1 logging.cc:52] Registered plugin creator - ::GridAnchor_TRT version 1
I0606 00:43:20.452438 1 logging.cc:52] Registered plugin creator - ::GridAnchorRect_TRT version 1
I0606 00:43:20.452574 1 logging.cc:52] Registered plugin creator - ::InstanceNormalization_TRT version 1
I0606 00:43:20.452580 1 logging.cc:52] Registered plugin creator - ::LReLU_TRT version 1
I0606 00:43:20.452583 1 logging.cc:52] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
I0606 00:43:20.452588 1 logging.cc:52] Registered plugin creator - ::MultilevelProposeROI_TRT version 1
I0606 00:43:20.452591 1 logging.cc:52] Registered plugin creator - ::DMHA version 1
I0606 00:43:20.452594 1 logging.cc:52] Registered plugin creator - ::NMS_TRT version 1
I0606 00:43:20.452596 1 logging.cc:52] Registered plugin creator - ::NMSDynamic_TRT version 1
I0606 00:43:20.452600 1 logging.cc:52] Registered plugin creator - ::Normalize_TRT version 1
I0606 00:43:20.452603 1 logging.cc:52] Registered plugin creator - ::PillarScatterPlugin version 1
I0606 00:43:20.452607 1 logging.cc:52] Registered plugin creator - ::PriorBox_TRT version 1
I0606 00:43:20.452610 1 logging.cc:52] Registered plugin creator - ::ProposalLayer_TRT version 1
I0606 00:43:20.452621 1 logging.cc:52] Registered plugin creator - ::Proposal version 1
I0606 00:43:20.452624 1 logging.cc:52] Registered plugin creator - ::ProposalDynamic version 1
I0606 00:43:20.452628 1 logging.cc:52] Registered plugin creator - ::PyramidROIAlign_TRT version 1
I0606 00:43:20.452631 1 logging.cc:52] Registered plugin creator - ::Region_TRT version 1
I0606 00:43:20.452635 1 logging.cc:52] Registered plugin creator - ::Reorg_TRT version 1
I0606 00:43:20.452638 1 logging.cc:52] Registered plugin creator - ::ResizeNearest_TRT version 1
I0606 00:43:20.452643 1 logging.cc:52] Registered plugin creator - ::RPROI_TRT version 1
I0606 00:43:20.452646 1 logging.cc:52] Registered plugin creator - ::ScatterND version 1
I0606 00:43:20.452653 1 logging.cc:52] Registered plugin creator - ::SpecialSlice_TRT version 1
I0606 00:43:20.452656 1 logging.cc:52] Registered plugin creator - ::Split version 1
I0606 00:43:20.452660 1 logging.cc:52] Registered plugin creator - ::VoxelGeneratorPlugin version 1
I0606 00:43:20.452664 1 tensorrt.cc:5353] backend configuration:
{"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I0606 00:43:20.452713 1 tensorrt.cc:5405] TRITONBACKEND_ModelInitialize: clasificador1 (version 1)
I0606 00:43:20.453274 1 model_config_utils.cc:1597] ModelConfig 64-bit fields:
I0606 00:43:20.453280 1 model_config_utils.cc:1599] 	ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I0606 00:43:20.453281 1 model_config_utils.cc:1599] 	ModelConfig::dynamic_batching::max_queue_delay_microseconds
I0606 00:43:20.453283 1 model_config_utils.cc:1599] 	ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I0606 00:43:20.453284 1 model_config_utils.cc:1599] 	ModelConfig::ensemble_scheduling::step::model_version
I0606 00:43:20.453285 1 model_config_utils.cc:1599] 	ModelConfig::input::dims
I0606 00:43:20.453287 1 model_config_utils.cc:1599] 	ModelConfig::input::reshape::shape
I0606 00:43:20.453288 1 model_config_utils.cc:1599] 	ModelConfig::instance_group::secondary_devices::device_id
I0606 00:43:20.453290 1 model_config_utils.cc:1599] 	ModelConfig::model_warmup::inputs::value::dims
I0606 00:43:20.453291 1 model_config_utils.cc:1599] 	ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I0606 00:43:20.453294 1 model_config_utils.cc:1599] 	ModelConfig::optimization::cuda::graph_spec::input::value::dim
I0606 00:43:20.453295 1 model_config_utils.cc:1599] 	ModelConfig::output::dims
I0606 00:43:20.453296 1 model_config_utils.cc:1599] 	ModelConfig::output::reshape::shape
I0606 00:43:20.453298 1 model_config_utils.cc:1599] 	ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I0606 00:43:20.453299 1 model_config_utils.cc:1599] 	ModelConfig::sequence_batching::max_sequence_idle_microseconds
I0606 00:43:20.453300 1 model_config_utils.cc:1599] 	ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I0606 00:43:20.453302 1 model_config_utils.cc:1599] 	ModelConfig::sequence_batching::state::dims
I0606 00:43:20.453303 1 model_config_utils.cc:1599] 	ModelConfig::sequence_batching::state::initial_state::dims
I0606 00:43:20.453304 1 model_config_utils.cc:1599] 	ModelConfig::version_policy::specific::versions
I0606 00:43:20.909592 1 logging.cc:49] [MemUsageChange] Init CUDA: CPU +464, GPU +0, now: CPU 485, GPU 2821 (MiB)
I0606 00:43:20.917427 1 logging.cc:49] Loaded engine size: 10 MiB
E0606 00:43:20.919569 1 logging.cc:43] 1: [stdArchiveReader.cpp::StdArchiveReader::35] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 0, Serialized Engine Version: 43)
E0606 00:43:20.919583 1 logging.cc:43] 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)

Thanks in advance.
Kind regards,

Nicolás

For classification inference in Triton, please refer to the classification section in GitHub - NVIDIA-AI-IOT/tao-toolkit-triton-apps: Sample app code for deploying TAO Toolkit trained models to Triton:
https://github.com/NVIDIA-AI-IOT/tao-toolkit-triton-apps/blob/main/docs/configuring_the_client.md#classification
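A minimal sketch of what such a config.pbtxt might look like for this export. The tensor names input_1 and predictions/Softmax, the 3x224x224 input, and the six-class output dims are assumptions based on the spec file and converter command in this thread; verify them against your own engine before using this.

# Sketch only: tensor names, shapes and max_batch_size are assumptions,
# check them against your actual engine (e.g. with polygraphy).
name: "clasificador1"
platform: "tensorrt_plan"
max_batch_size: 64
input [
  {
    name: "input_1"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "predictions/Softmax"
    data_type: TYPE_FP32
    dims: [ 6, 1, 1 ]
  }
]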

Refer to TAO unet input and output tensor shapes and order - #3 by Morganh

Yes, you can rename.
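For example, a minimal sketch of that copy/rename, assuming the model name clasificador1 used in the log above and the export path from the notebook:

# Sketch: place the exported engine into the Triton model repository layout.
# "clasificador1" and the paths are assumptions; adjust to your setup.
mkdir -p model_repository/clasificador1/1
cp $TAO_EXPERIMENTS_DIR/classification/export/final_model.trt \
   model_repository/clasificador1/1/model.plan
# config.pbtxt then goes next to the version directory:
#   model_repository/clasificador1/config.pbtxt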

It is needed to generate the TensorRT engine (i.e., model.plan).
See https://github.com/NVIDIA-AI-IOT/tao-toolkit-triton-apps/blob/main/scripts/download_and_convert.sh#L30

Thanks for the fast and clear answer. Tagged as solution.
Best wishes,

Nicolás

Update. Some additional details in case someone finds them useful.

Regarding the error:

E0606 00:43:20.919569 1 logging.cc:43] 1: [stdArchiveReader.cpp::StdArchiveReader::35] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 0, Serialized Engine Version: 43)
E0606 00:43:20.919583 1 logging.cc:43] 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)

It seems that the TensorRT version used for exporting needs to be the same as the one used for inference; it is not enough that both are 8.x.

In my case, I was using tao-toolkit 3.22.02, which, when invoking the converter with this command:

tao converter $USER_EXPERIMENT_DIR/export/final_model.etlt \
              -k $KEY \
              -c $USER_EXPERIMENT_DIR/export/final_model_int8_cache.bin \
              -o predictions/Softmax \
              -d 3,224,224 \
              -i nchw \
              -m 64 -t int8 \
              -e $USER_EXPERIMENT_DIR/export/final_model.trt \
              -b 64

reports that the nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3 container is being used:

2022-06-05 17:56:21,846 [INFO] root: Registry: ['nvcr.io']
2022-06-05 17:56:21,887 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3

The following command can be used to obtain the TensorRT version in this container.

docker run --rm -it nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3 /bin/bash -c "pip list | grep tensorrt"
tensorrt                      8.0.1.6

So it is required to find a Triton Server version compatible with TensorRT 8.0.1.6.
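For example, one way to check which TensorRT version a given Triton container ships (a sketch; it assumes the container installs TensorRT as Debian packages, as the NGC tritonserver images do):

# List the TensorRT packages bundled in a Triton container.
docker run --rm nvcr.io/nvidia/tritonserver:22.05-py3 \
    /bin/bash -c "dpkg -l | grep -i tensorrt"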

The same applies when inspecting the model with Polygraphy:

pip install nvidia-tensorrt==8.0.1.6 polygraphy
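For example, a sketch of inspecting the exported engine to obtain its input/output binding names and shapes (this only works if the installed TensorRT version matches the one used to build the engine, 8.0.1.6 here):

# Print the engine's input/output bindings (names, dtypes, shapes).
polygraphy inspect model final_model.trt --model-type engine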

You can copy your .etlt model and replace resnet18_vehicletypenet_pruned.etlt.
Then let the Triton server generate its TensorRT engine.


I tried dropping in the .etlt instead of the model.plan, but at least nvcr.io/nvidia/tritonserver:21.08-py3 reports an error about being unable to load the model after trying all backends: ONNX, TensorFlow, TensorRT, etc. Is the .etlt backend added in a later version, or is it installed in an additional step?

The Triton app will generate model.plan (the TensorRT engine) based on the .etlt model. See tao-toolkit-triton-apps/download_and_convert.sh at main · NVIDIA-AI-IOT/tao-toolkit-triton-apps · GitHub

