Hello,
I’m following the instructions from the image classification tutorial (ResNet) in the NVIDIA TAO CV samples, using a custom dataset with six classes. The notebook and the documentation describe how to export and deploy to the DeepStream SDK, but not how to export and deploy to Triton Inference Server.
The Triton documentation specifies that TensorRT models for Triton Inference Server must have this format:
<model-repository-path>/
  <model-name>/
    config.pbtxt
    1/
      model.plan
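For reference, the layout above can be checked with a short script before starting the server. This is a minimal sketch that assumes Python 3, and check_model_layout is a hypothetical helper name. It treats config.pbtxt as required, although with --strict-model-config=false Triton can auto-complete the configuration for TensorRT plans (the log below shows "Server side auto-completed config").

```python
from pathlib import Path
from typing import List


def check_model_layout(repo: Path, model_name: str, version: int = 1) -> List[str]:
    """Return a list of layout problems for one model (empty list if none).

    Checks only <repo>/<model_name>/config.pbtxt and
    <repo>/<model_name>/<version>/model.plan.
    """
    model_dir = repo / model_name
    if not model_dir.is_dir():
        return [f"missing model directory: {model_dir}"]

    problems = []
    if not (model_dir / "config.pbtxt").is_file():
        problems.append(f"missing config file: {model_dir / 'config.pbtxt'}")
    plan = model_dir / str(version) / "model.plan"
    if not plan.is_file():
        problems.append(f"missing engine file: {plan}")
    return problems


if __name__ == "__main__":
    # "clasificador1" is the model name that appears in the Triton log below.
    for problem in check_model_layout(Path("model_repository"), "clasificador1"):
        print(problem)
```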
However, after the whole procedure has run, the export directory contains the following files:
!ls $TAO_EXPERIMENTS_DIR/classification/export
calibration.tensor
final_model_int8_cache.bin
final_model.etlt
final_model.trt
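The copy-and-rename step asked about in the questions could be scripted as follows. This is a minimal sketch: stage_engine is a hypothetical helper, and copying only makes sense if final_model.trt was built with the same TensorRT version (and for the same GPU model) as the one inside the Triton container, because serialized TensorRT engines are not portable across TensorRT versions.

```python
import shutil
from pathlib import Path


def stage_engine(engine: Path, repo: Path, model_name: str, version: int = 1) -> Path:
    """Copy a serialized TensorRT engine to <repo>/<model_name>/<version>/model.plan."""
    target_dir = repo / model_name / str(version)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the version folder if needed
    target = target_dir / "model.plan"
    shutil.copyfile(engine, target)  # copy the bytes; the engine itself is not modified
    return target
```

For example, stage_engine(Path("export/final_model.trt"), Path("model_repository"), "clasificador1") would produce model_repository/clasificador1/1/model.plan (the export path here is illustrative).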
Questions:
- How is config.pbtxt generated? Is there an automatic tool, or a TAO tool, to inspect an optimized model's architecture and obtain the last layer's name, format, etc.? Trying the option ‘--strict-model-config=false’ gives an error (details below).
- Is it correct to copy final_model.trt to the Triton model repository and rename it to model.plan?
- Does Triton Inference Server support final_model.etlt?
- Any reference to documentation or a tutorial is appreciated.
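One way to write a config.pbtxt by hand is to template it from what is known about the model. The sketch below is hypothetical: make_config_pbtxt is an illustrative name, and the tensor names input_1 and predictions/Softmax and the [num_classes, 1, 1] output shape are assumptions about TAO classification engines, not values read from final_model.trt. They must be checked against the names used when the engine was built. Because max_batch_size is greater than 0, dims exclude the batch dimension, and max_batch_size should not exceed the engine's own maximum batch size.

```python
from typing import Tuple


def make_config_pbtxt(
    model_name: str,
    num_classes: int,
    input_name: str = "input_1",               # ASSUMED input tensor name
    output_name: str = "predictions/Softmax",  # ASSUMED output tensor name
    input_dims: Tuple[int, ...] = (3, 224, 224),  # from input_image_size in the spec
    max_batch_size: int = 1,
) -> str:
    """Render a minimal Triton config.pbtxt for a TensorRT classification plan."""
    in_dims = ", ".join(str(d) for d in input_dims)
    return (
        f'name: "{model_name}"\n'
        'platform: "tensorrt_plan"\n'
        f"max_batch_size: {max_batch_size}\n"
        "input [\n"
        "  {\n"
        f'    name: "{input_name}"\n'
        "    data_type: TYPE_FP32\n"
        "    format: FORMAT_NCHW\n"
        f"    dims: [ {in_dims} ]\n"
        "  }\n"
        "]\n"
        "output [\n"
        "  {\n"
        f'    name: "{output_name}"\n'
        "    data_type: TYPE_FP32\n"
        f"    dims: [ {num_classes}, 1, 1 ]\n"  # ASSUMED output shape
        "  }\n"
        "]\n"
    )
```

For the six-class model, make_config_pbtxt("clasificador1", 6) returns the text to save as model_repository/clasificador1/config.pbtxt.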
Details:
• Hardware: NVIDIA RTX 3070 Max-Q
• Network Type: Classification
• TLT Version: 3.22.02
• Triton Server Version: 22.05
• Training spec file:
model_config {
  arch: "resnet",
  n_layers: 18
  # Setting these parameters to true to match the template downloaded from NGC.
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,224,224"
}
train_config {
  train_dataset_path: "/workspace/tao-experiments/data/split/train"
  val_dataset_path: "/workspace/tao-experiments/data/split/val"
  pretrained_model_path: "/workspace/tao-experiments/classification/pretrained_resnet18/pretrained_classification_vresnet18/resnet_18.hdf5"
  optimizer {
    sgd {
      lr: 0.01
      decay: 0.0
      momentum: 0.9
      nesterov: False
    }
  }
  batch_size_per_gpu: 64
  n_epochs: 80
  n_workers: 16
  preprocess_mode: "caffe"
  enable_random_crop: True
  enable_center_crop: True
  label_smoothing: 0.0
  mixup_alpha: 0.1
  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }
  # learning_rate
  lr_config {
    step {
      learning_rate: 0.006
      step_size: 10
      gamma: 0.1
    }
  }
}
eval_config {
  eval_dataset_path: "/workspace/tao-experiments/data/split/test"
  model_path: "/workspace/tao-experiments/classification/output/weights/resnet_080.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
  enable_center_crop: True
}
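Since preprocess_mode is "caffe", a Triton client has to reproduce that preprocessing itself before sending a request. This is a minimal NumPy sketch, assuming TAO's "caffe" mode matches Keras' imagenet_utils "caffe" mode (RGB to BGR, subtract the ImageNet per-channel means, no scaling) and that the image has already been resized or cropped to 224x224 as in input_image_size. caffe_preprocess is a hypothetical name.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by Keras' "caffe" mode (ASSUMED to match TAO).
CAFFE_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_preprocess(image_rgb: np.ndarray) -> np.ndarray:
    """Turn an HxWx3 RGB uint8 image into a 1x3xHxW float32 batch."""
    if image_rgb.ndim != 3 or image_rgb.shape[2] != 3:
        raise ValueError("expected an HxWx3 RGB image")
    bgr = image_rgb[..., ::-1].astype(np.float32)  # RGB -> BGR (astype makes a copy)
    bgr -= CAFFE_MEAN_BGR                           # zero-center each channel, no scaling
    chw = np.transpose(bgr, (2, 0, 1))              # HWC -> CHW to match NCHW input
    return np.ascontiguousarray(chw[np.newaxis])    # add the batch dimension
```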
Details of the error when running with ‘--strict-model-config=false’:
Command:
export TRITON_SERVER_IMAGE="nvcr.io/nvidia/tritonserver:22.05-py3"
docker run --gpus 1 --rm \
--shm-size=1g --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v"$PWD/model_repository":/models \
$TRITON_SERVER_IMAGE /bin/bash -c "tritonserver --model-repository=/models --strict-model-config=false --grpc-infer-allocation-pool-size=16 --log-verbose=1"
Result:
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 22.05 (build 38317651)
Triton Server Version 2.22.0
Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: CUDA Minor Version Compatibility mode ENABLED.
Using driver version 510.73.05 which has support for CUDA 11.6. This container
was built with CUDA 11.7 and will be run in Minor Version Compatibility mode.
CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
with this container but was unavailable:
[[Forward compatibility was attempted on non supported HW (CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE) cuInit()=804]]
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
I0606 00:43:20.225971 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fc61c000000' with size 268435456
I0606 00:43:20.226203 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0606 00:43:20.226517 1 model_config_utils.cc:645] Server side auto-completed config: name: "clasificador1"
platform: "tensorrt_plan"
default_model_filename: "model.plan"
backend: "tensorrt"
I0606 00:43:20.227006 1 model_repository_manager.cc:1191] loading: clasificador1:1
I0606 00:43:20.327700 1 backend_model.cc:292] Adding default backend config setting: default-max-batch-size,4
I0606 00:43:20.327769 1 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
I0606 00:43:20.452357 1 tensorrt.cc:5294] TRITONBACKEND_Initialize: tensorrt
I0606 00:43:20.452377 1 tensorrt.cc:5304] Triton TRITONBACKEND API version: 1.9
I0606 00:43:20.452382 1 tensorrt.cc:5310] 'tensorrt' TRITONBACKEND API version: 1.9
I0606 00:43:20.452384 1 tensorrt.cc:5333] Registering TensorRT Plugins
I0606 00:43:20.452392 1 logging.cc:52] Registered plugin creator - ::BatchTilePlugin_TRT version 1
I0606 00:43:20.452395 1 logging.cc:52] Registered plugin creator - ::BatchedNMS_TRT version 1
I0606 00:43:20.452398 1 logging.cc:52] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
I0606 00:43:20.452401 1 logging.cc:52] Registered plugin creator - ::CoordConvAC version 1
I0606 00:43:20.452404 1 logging.cc:52] Registered plugin creator - ::CropAndResize version 1
I0606 00:43:20.452407 1 logging.cc:52] Registered plugin creator - ::CropAndResizeDynamic version 1
I0606 00:43:20.452410 1 logging.cc:52] Registered plugin creator - ::DecodeBbox3DPlugin version 1
I0606 00:43:20.452413 1 logging.cc:52] Registered plugin creator - ::DetectionLayer_TRT version 1
I0606 00:43:20.452416 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_TRT version 1
I0606 00:43:20.452420 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
I0606 00:43:20.452423 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_Explicit_TF_TRT version 1
I0606 00:43:20.452426 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_Implicit_TF_TRT version 1
I0606 00:43:20.452429 1 logging.cc:52] Registered plugin creator - ::FlattenConcat_TRT version 1
I0606 00:43:20.452432 1 logging.cc:52] Registered plugin creator - ::GenerateDetection_TRT version 1
I0606 00:43:20.452436 1 logging.cc:52] Registered plugin creator - ::GridAnchor_TRT version 1
I0606 00:43:20.452438 1 logging.cc:52] Registered plugin creator - ::GridAnchorRect_TRT version 1
I0606 00:43:20.452574 1 logging.cc:52] Registered plugin creator - ::InstanceNormalization_TRT version 1
I0606 00:43:20.452580 1 logging.cc:52] Registered plugin creator - ::LReLU_TRT version 1
I0606 00:43:20.452583 1 logging.cc:52] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
I0606 00:43:20.452588 1 logging.cc:52] Registered plugin creator - ::MultilevelProposeROI_TRT version 1
I0606 00:43:20.452591 1 logging.cc:52] Registered plugin creator - ::DMHA version 1
I0606 00:43:20.452594 1 logging.cc:52] Registered plugin creator - ::NMS_TRT version 1
I0606 00:43:20.452596 1 logging.cc:52] Registered plugin creator - ::NMSDynamic_TRT version 1
I0606 00:43:20.452600 1 logging.cc:52] Registered plugin creator - ::Normalize_TRT version 1
I0606 00:43:20.452603 1 logging.cc:52] Registered plugin creator - ::PillarScatterPlugin version 1
I0606 00:43:20.452607 1 logging.cc:52] Registered plugin creator - ::PriorBox_TRT version 1
I0606 00:43:20.452610 1 logging.cc:52] Registered plugin creator - ::ProposalLayer_TRT version 1
I0606 00:43:20.452621 1 logging.cc:52] Registered plugin creator - ::Proposal version 1
I0606 00:43:20.452624 1 logging.cc:52] Registered plugin creator - ::ProposalDynamic version 1
I0606 00:43:20.452628 1 logging.cc:52] Registered plugin creator - ::PyramidROIAlign_TRT version 1
I0606 00:43:20.452631 1 logging.cc:52] Registered plugin creator - ::Region_TRT version 1
I0606 00:43:20.452635 1 logging.cc:52] Registered plugin creator - ::Reorg_TRT version 1
I0606 00:43:20.452638 1 logging.cc:52] Registered plugin creator - ::ResizeNearest_TRT version 1
I0606 00:43:20.452643 1 logging.cc:52] Registered plugin creator - ::RPROI_TRT version 1
I0606 00:43:20.452646 1 logging.cc:52] Registered plugin creator - ::ScatterND version 1
I0606 00:43:20.452653 1 logging.cc:52] Registered plugin creator - ::SpecialSlice_TRT version 1
I0606 00:43:20.452656 1 logging.cc:52] Registered plugin creator - ::Split version 1
I0606 00:43:20.452660 1 logging.cc:52] Registered plugin creator - ::VoxelGeneratorPlugin version 1
I0606 00:43:20.452664 1 tensorrt.cc:5353] backend configuration:
{"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I0606 00:43:20.452713 1 tensorrt.cc:5405] TRITONBACKEND_ModelInitialize: clasificador1 (version 1)
I0606 00:43:20.453274 1 model_config_utils.cc:1597] ModelConfig 64-bit fields:
I0606 00:43:20.453280 1 model_config_utils.cc:1599] ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I0606 00:43:20.453281 1 model_config_utils.cc:1599] ModelConfig::dynamic_batching::max_queue_delay_microseconds
I0606 00:43:20.453283 1 model_config_utils.cc:1599] ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I0606 00:43:20.453284 1 model_config_utils.cc:1599] ModelConfig::ensemble_scheduling::step::model_version
I0606 00:43:20.453285 1 model_config_utils.cc:1599] ModelConfig::input::dims
I0606 00:43:20.453287 1 model_config_utils.cc:1599] ModelConfig::input::reshape::shape
I0606 00:43:20.453288 1 model_config_utils.cc:1599] ModelConfig::instance_group::secondary_devices::device_id
I0606 00:43:20.453290 1 model_config_utils.cc:1599] ModelConfig::model_warmup::inputs::value::dims
I0606 00:43:20.453291 1 model_config_utils.cc:1599] ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I0606 00:43:20.453294 1 model_config_utils.cc:1599] ModelConfig::optimization::cuda::graph_spec::input::value::dim
I0606 00:43:20.453295 1 model_config_utils.cc:1599] ModelConfig::output::dims
I0606 00:43:20.453296 1 model_config_utils.cc:1599] ModelConfig::output::reshape::shape
I0606 00:43:20.453298 1 model_config_utils.cc:1599] ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I0606 00:43:20.453299 1 model_config_utils.cc:1599] ModelConfig::sequence_batching::max_sequence_idle_microseconds
I0606 00:43:20.453300 1 model_config_utils.cc:1599] ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I0606 00:43:20.453302 1 model_config_utils.cc:1599] ModelConfig::sequence_batching::state::dims
I0606 00:43:20.453303 1 model_config_utils.cc:1599] ModelConfig::sequence_batching::state::initial_state::dims
I0606 00:43:20.453304 1 model_config_utils.cc:1599] ModelConfig::version_policy::specific::versions
I0606 00:43:20.909592 1 logging.cc:49] [MemUsageChange] Init CUDA: CPU +464, GPU +0, now: CPU 485, GPU 2821 (MiB)
I0606 00:43:20.917427 1 logging.cc:49] Loaded engine size: 10 MiB
E0606 00:43:20.919569 1 logging.cc:43] 1: [stdArchiveReader.cpp::StdArchiveReader::35] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 0, Serialized Engine Version: 43)
E0606 00:43:20.919583 1 logging.cc:43] 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
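The last two log lines carry the actual failure: TensorRT refuses to deserialize the engine because its serialization version tag (43) does not match the one expected by the runtime inside the container, which generally indicates that the engine was built with a different TensorRT version. A small, hypothetical helper for spotting this mismatch in a Triton log could look like this (the regular expression is written against the message text shown above and may not match other TensorRT versions' wording):

```python
import re
from typing import Optional, Tuple

# Matches the note printed by TensorRT's StdArchiveReader in the log above.
_VERSION_RE = re.compile(r"Current Version: (\d+), Serialized Engine Version: (\d+)")


def engine_version_mismatch(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (runtime_tag, engine_tag) from a deserialization error, or None if absent."""
    match = _VERSION_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```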
Thanks in advance.
Kind regards,
Nicolás