RetinaNet trained with TAO Toolkit cannot be run on the Triton server after converting with TensorRT 10.04

Recently, I updated the JetPack on my Orin Nano from JetPack 5 to JetPack 6. After the update, I noticed that a model I previously had running (a RetinaNet model trained with TAO Toolkit 5.3) no longer loads on the Triton server. The Triton server successfully converts the model from an ONNX file to model.plan, but when it tries to load the engine from model.plan, it crashes without any additional information.

I tested this model again on the previous JetPack version with TensorRT 8.6, and everything works as expected. Are there any known changes in the new TensorRT version that might cause compatibility issues with TAO Toolkit’s RetinaNet model?

• Hardware: Orin Nano with JetPack 6
• Network Type: RetinaNet trained with TAO Toolkit
• How to reproduce the issue: load the model into the Triton server with TensorRT 10.04

Logs:
I1028 07:34:25.875463 1 shared_library.cc:112] “OpenLibraryHandle: /opt/tritonserver/repoagents/trtconverter/libtritonrepoagent_trtconverter.so”
I1028 07:34:25.876578 1 model_config_utils.cc:716] “Server side auto-completed config: ”
name: “ball_tracking_v2-internal”
platform: “tensorrt_plan”
input {
name: “Input”
data_type: TYPE_FP32
dims: 3
dims: 160
dims: 320
}
output {
name: “NMS”
data_type: TYPE_FP32
dims: 1
dims: 200
dims: 7
}
output {
name: “NMS_1”
data_type: TYPE_FP32
dims: 1
dims: 1
dims: 1
}
default_model_filename: “model.plan”
model_warmup {
name: “regular sample”
batch_size: 1
inputs {
key: “Input”
value {
data_type: TYPE_FP32
dims: 3
dims: 160
dims: 320
random_data: true
}
}
}
backend: “tensorrt”
model_repository_agents {
agents {
name: “trtconverter”
parameters {
key: “./1/retinanet_balltracking_us_data.onnx”
value: “ --fp16”
}
}
}

I1028 07:34:25.876814 1 model_lifecycle.cc:441] “AsyncLoad() ‘ball_tracking_v2-internal’”
I1028 07:34:25.876972 1 model_lifecycle.cc:472] “loading: ball_tracking_v2-internal:1”
I1028 07:34:25.877072 1 model_lifecycle.cc:441] “AsyncLoad() ‘ball_tracking_v2’”
I1028 07:34:25.877215 1 model_lifecycle.cc:472] “loading: ball_tracking_v2:1”
I1028 07:34:25.877338 1 model_lifecycle.cc:551] “CreateModel() ‘ball_tracking_v2-internal’ version 1”
I1028 07:34:25.877658 1 backend_model.cc:503] “Adding default backend config setting: default-max-batch-size,4”
I1028 07:34:25.877772 1 shared_library.cc:112] “OpenLibraryHandle: /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so”
I1028 07:34:25.881588 1 model_lifecycle.cc:551] “CreateModel() ‘ball_tracking_v2’ version 1”
I1028 07:34:25.881814 1 backend_model.cc:503] “Adding default backend config setting: default-max-batch-size,4”
I1028 07:34:25.906358 1 tensorrt.cc:65] “TRITONBACKEND_Initialize: tensorrt”
I1028 07:34:25.906433 1 tensorrt.cc:75] “Triton TRITONBACKEND API version: 1.19”
I1028 07:34:25.906443 1 tensorrt.cc:81] “‘tensorrt’ TRITONBACKEND API version: 1.19”
I1028 07:34:25.906452 1 tensorrt.cc:105] “backend configuration:\n{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"5.300000","default-max-batch-size":"4"}}”
I1028 07:34:25.906489 1 tensorrt.cc:187] “Registering TensorRT Plugins”
I1028 07:34:25.906536 1 logging.cc:49] “Registered plugin creator - ::BatchedNMSDynamic_TRT version 1”
I1028 07:34:25.906552 1 logging.cc:49] “Registered plugin creator - ::BatchedNMS_TRT version 1”
I1028 07:34:25.906566 1 logging.cc:49] “Registered plugin creator - ::BatchTilePlugin_TRT version 1”
I1028 07:34:25.906583 1 logging.cc:49] “Registered plugin creator - ::Clip_TRT version 1”
I1028 07:34:25.906613 1 logging.cc:49] “Registered plugin creator - ::CoordConvAC version 1”
I1028 07:34:25.906630 1 logging.cc:49] “Registered plugin creator - ::CropAndResizeDynamic version 1”
I1028 07:34:25.906644 1 logging.cc:49] “Registered plugin creator - ::CropAndResize version 1”
I1028 07:34:25.906658 1 logging.cc:49] “Registered plugin creator - ::DecodeBbox3DPlugin version 1”
I1028 07:34:25.906671 1 logging.cc:49] “Registered plugin creator - ::DetectionLayer_TRT version 1”
I1028 07:34:25.906684 1 logging.cc:49] “Registered plugin creator - ::EfficientNMS_Explicit_TF_TRT version 1”
I1028 07:34:25.906696 1 logging.cc:49] “Registered plugin creator - ::EfficientNMS_Implicit_TF_TRT version 1”
I1028 07:34:25.906710 1 logging.cc:49] “Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1”
I1028 07:34:25.906723 1 logging.cc:49] “Registered plugin creator - ::EfficientNMS_TRT version 1”
I1028 07:34:25.906744 1 logging.cc:49] “Registered plugin creator - ::FlattenConcat_TRT version 1”
I1028 07:34:25.906762 1 logging.cc:49] “Registered plugin creator - ::GenerateDetection_TRT version 1”
I1028 07:34:25.906777 1 logging.cc:49] “Registered plugin creator - ::GridAnchor_TRT version 1”
I1028 07:34:25.906789 1 logging.cc:49] “Registered plugin creator - ::GridAnchorRect_TRT version 1”
I1028 07:34:25.906802 1 logging.cc:49] “Registered plugin creator - ::InstanceNormalization_TRT version 1”
I1028 07:34:25.906816 1 logging.cc:49] “Registered plugin creator - ::InstanceNormalization_TRT version 2”
I1028 07:34:25.906831 1 logging.cc:49] “Registered plugin creator - ::InstanceNormalization_TRT version 3”
I1028 07:34:25.906846 1 logging.cc:49] “Registered plugin creator - ::LReLU_TRT version 1”
I1028 07:34:25.906860 1 logging.cc:49] “Registered plugin creator - ::ModulatedDeformConv2d version 1”
I1028 07:34:25.906873 1 logging.cc:49] “Registered plugin creator - ::MultilevelCropAndResize_TRT version 1”
I1028 07:34:25.906888 1 logging.cc:49] “Registered plugin creator - ::MultilevelProposeROI_TRT version 1”
I1028 07:34:25.906900 1 logging.cc:49] “Registered plugin creator - ::MultiscaleDeformableAttnPlugin_TRT version 1”
I1028 07:34:25.906923 1 logging.cc:49] “Registered plugin creator - ::NMSDynamic_TRT version 1”
I1028 07:34:25.906936 1 logging.cc:49] “Registered plugin creator - ::NMS_TRT version 1”
I1028 07:34:25.906948 1 logging.cc:49] “Registered plugin creator - ::Normalize_TRT version 1”
I1028 07:34:25.906960 1 logging.cc:49] “Registered plugin creator - ::PillarScatterPlugin version 1”
I1028 07:34:25.906972 1 logging.cc:49] “Registered plugin creator - ::PriorBox_TRT version 1”
I1028 07:34:25.906984 1 logging.cc:49] “Registered plugin creator - ::ProposalDynamic version 1”
I1028 07:34:25.906995 1 logging.cc:49] “Registered plugin creator - ::ProposalLayer_TRT version 1”
I1028 07:34:25.907010 1 logging.cc:49] “Registered plugin creator - ::Proposal version 1”
I1028 07:34:25.907023 1 logging.cc:49] “Registered plugin creator - ::PyramidROIAlign_TRT version 1”
I1028 07:34:25.907034 1 logging.cc:49] “Registered plugin creator - ::Region_TRT version 1”
I1028 07:34:25.907044 1 logging.cc:49] “Registered plugin creator - ::Reorg_TRT version 2”
I1028 07:34:25.907054 1 logging.cc:49] “Registered plugin creator - ::Reorg_TRT version 1”
I1028 07:34:25.907065 1 logging.cc:49] “Registered plugin creator - ::ResizeNearest_TRT version 1”
I1028 07:34:25.907075 1 logging.cc:49] “Registered plugin creator - ::ROIAlign_TRT version 1”
I1028 07:34:25.907089 1 logging.cc:49] “Registered plugin creator - ::ROIAlign_TRT version 2”
I1028 07:34:25.907103 1 logging.cc:49] “Registered plugin creator - ::RPROI_TRT version 1”
I1028 07:34:25.907116 1 logging.cc:49] “Registered plugin creator - ::ScatterElements version 2”
I1028 07:34:25.907128 1 logging.cc:49] “Registered plugin creator - ::ScatterElements version 1”
I1028 07:34:25.907145 1 logging.cc:49] “Registered plugin creator - ::ScatterND version 1”
I1028 07:34:25.907156 1 logging.cc:49] “Registered plugin creator - ::SpecialSlice_TRT version 1”
I1028 07:34:25.907165 1 logging.cc:49] “Registered plugin creator - ::Split version 1”
I1028 07:34:25.907176 1 logging.cc:49] “Registered plugin creator - ::VoxelGeneratorPlugin version 1”
I1028 07:34:25.907398 1 tensorrt.cc:231] “TRITONBACKEND_ModelInitialize: ball_tracking_v2-internal (version 1)”
I1028 07:34:25.907501 1 shared_library.cc:112] “OpenLibraryHandle: /opt/tritonserver/backends/python/libtriton_python.so”
I1028 07:34:25.908526 1 model_config_utils.cc:1941] “ModelConfig 64-bit fields:”
I1028 07:34:25.908559 1 model_config_utils.cc:1943] “\tModelConfig::dynamic_batching::default_priority_level”
I1028 07:34:25.908566 1 model_config_utils.cc:1943] “\tModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds”
I1028 07:34:25.908572 1 model_config_utils.cc:1943] “\tModelConfig::dynamic_batching::max_queue_delay_microseconds”
I1028 07:34:25.908578 1 model_config_utils.cc:1943] “\tModelConfig::dynamic_batching::priority_levels”
I1028 07:34:25.908583 1 model_config_utils.cc:1943] “\tModelConfig::dynamic_batching::priority_queue_policy::key”
I1028 07:34:25.908588 1 model_config_utils.cc:1943] “\tModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds”
I1028 07:34:25.908594 1 model_config_utils.cc:1943] “\tModelConfig::ensemble_scheduling::step::model_version”
I1028 07:34:25.908599 1 model_config_utils.cc:1943] “\tModelConfig::input::dims”
I1028 07:34:25.908604 1 model_config_utils.cc:1943] “\tModelConfig::input::reshape::shape”
I1028 07:34:25.908610 1 model_config_utils.cc:1943] “\tModelConfig::instance_group::secondary_devices::device_id”
I1028 07:34:25.908615 1 model_config_utils.cc:1943] “\tModelConfig::model_warmup::inputs::value::dims”
I1028 07:34:25.908620 1 model_config_utils.cc:1943] “\tModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim”
I1028 07:34:25.908626 1 model_config_utils.cc:1943] “\tModelConfig::optimization::cuda::graph_spec::input::value::dim”
I1028 07:34:25.908631 1 model_config_utils.cc:1943] “\tModelConfig::output::dims”
I1028 07:34:25.908636 1 model_config_utils.cc:1943] “\tModelConfig::output::reshape::shape”
I1028 07:34:25.908641 1 model_config_utils.cc:1943] “\tModelConfig::sequence_batching::direct::max_queue_delay_microseconds”
I1028 07:34:25.908647 1 model_config_utils.cc:1943] “\tModelConfig::sequence_batching::max_sequence_idle_microseconds”
I1028 07:34:25.908652 1 model_config_utils.cc:1943] “\tModelConfig::sequence_batching::oldest::max_queue_delay_microseconds”
I1028 07:34:25.908657 1 model_config_utils.cc:1943] “\tModelConfig::sequence_batching::state::dims”
I1028 07:34:25.908663 1 model_config_utils.cc:1943] “\tModelConfig::sequence_batching::state::initial_state::dims”
I1028 07:34:25.908668 1 model_config_utils.cc:1943] “\tModelConfig::version_policy::specific::versions”
I1028 07:34:25.908891 1 model_state.cc:317] “Setting the CUDA device to GPU0 to auto-complete config for ball_tracking_v2-internal”
I1028 07:34:25.908988 1 model_state.cc:363] “Using explicit serialized file ‘model.plan’ to auto-complete config for ball_tracking_v2-internal”
I1028 07:34:25.912464 1 python_be.cc:1618] “‘python’ TRITONBACKEND API version: 1.19”
I1028 07:34:25.912516 1 python_be.cc:1640] “backend configuration:\n{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"5.300000","default-max-batch-size":"4"}}”
I1028 07:34:25.912781 1 python_be.cc:1778] “Shared memory configuration is shm-default-byte-size=1048576,shm-growth-byte-size=1048576,stub-timeout-seconds=30”
I1028 07:34:25.913663 1 python_be.cc:2075] “TRITONBACKEND_GetBackendAttribute: setting attributes”
I1028 07:34:25.913910 1 python_be.cc:1879] “TRITONBACKEND_ModelInitialize: ball_tracking_v2 (version 1)”
I1028 07:34:25.916027 1 stub_launcher.cc:385] “Starting Python backend stub: exec /opt/tritonserver/backends/python/triton_python_backend_stub /models/ball_tracking_v2/1/model.py triton_python_backend_shm_region_ae800121-f883-4f3f-9123-55a4aa4cf3ee 1048576 1048576 1 /opt/tritonserver/backends/python 336 ball_tracking_v2 DEFAULT”
I1028 07:34:25.952927 1 logging.cc:46] “Loaded engine size: 33 MiB”
I1028 07:34:26.002538 1 logging.cc:49] “Local registry did not find NMSDynamic_TRT creator. Will try parent registry if enabled.”
I1028 07:34:26.002605 1 logging.cc:49] “Global registry found NMSDynamic_TRT creator.”

There are some differences between JetPack 5 and JetPack 6. For example, the Ubuntu version moves from 20.04 to 22.04, and the TensorRT version changes as well.
You can run $ dpkg -l | grep cuda on the Orin Nano to confirm.
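
Similarly, the TensorRT and JetPack versions can be checked (exact package names can vary between JetPack releases):

$ dpkg -l | grep -i tensorrt   # TensorRT packages and versions
$ cat /etc/nv_tegra_release    # JetPack / L4T release string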

So the model.plan (i.e., the TensorRT engine) needs to be regenerated under the new TensorRT version.
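
A minimal sketch of regenerating it on the device with trtexec (file names taken from the logs above; the trtexec path is the usual JetPack location and may differ on your setup):

$ /usr/src/tensorrt/bin/trtexec \
    --onnx=retinanet_balltracking_us_data.onnx \
    --saveEngine=model.plan \
    --fp16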

Okay, so here is some more background.

  1. First I train the RetinaNet model with TAO Toolkit
  2. I export the ONNX model using tao model retinanet export -m /results/run1/weights/retinanet_resnet18_epoch_100.hdf5 -e /specs/retinanet_train_resnet18.txt
  3. I generate the model.plan from my ONNX model on the host machine
  4. I load the model.plan into the Triton server (repository layout sketched below)
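
For reference, the Triton model repository looks roughly like this (directory and file names taken from the logs above; config.pbtxt is the standard Triton per-model configuration file):

models/
  ball_tracking_v2/              # Python frontend model
    1/
      model.py
  ball_tracking_v2-internal/     # TensorRT model from this thread
    config.pbtxt
    1/
      retinanet_balltracking_us_data.onnx
      model.plan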

These are the exact steps I followed with JetPack 5, and everything worked perfectly. However, after updating to JetPack 6, I can no longer load the model into the Triton server (see logs above). The model.plan file is now also generated with the updated TensorRT version.

Could there be significant differences between the TensorRT versions shipped with JetPack 5 and JetPack 6 that might be causing this issue?

If your host machine has the same TensorRT version as the Triton server, there should be no issue.
I suggest generating model.plan where you are going to run inference; in this case, that is the Triton server. You can log in to the Triton server and generate the TensorRT engine there.

I’m generating model.plan on the same host machine that runs the Triton server, so unfortunately this is not it.

I suggest you docker run into the TensorRT 10.04 container and generate the TensorRT engine inside it.
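
Something like the following (the image name is a placeholder for whichever TensorRT 10.04-based image you run, and the trtexec path is an assumption; adjust as needed):

$ docker run --rm -it --runtime nvidia -v /path/to/models:/models <tensorrt-10.04-image> \
    /usr/src/tensorrt/bin/trtexec \
      --onnx=/models/ball_tracking_v2-internal/1/retinanet_balltracking_us_data.onnx \
      --saveEngine=/models/ball_tracking_v2-internal/1/model.plan \
      --fp16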

Yes, I understand, and I’m already doing this. I tried automating it with repository agents and also tried it manually. The conversion does produce model.plan, but Triton cannot load it (I get the logs I posted in the topic, and that is all the information I have). I also want to add that this procedure was working on the older JetPack version with TensorRT 8.6. Additionally, I’ve just tested the same process with EfficientDet (a TensorFlow 2 model), and it works. Does TensorRT 10.04 support TensorFlow 1 models (such as RetinaNet)?
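
As a sanity check outside Triton, deserializing the engine directly should show whether the crash comes from the engine itself or from the server (trtexec path assumed from the JetPack TensorRT packages):

$ /usr/src/tensorrt/bin/trtexec --loadEngine=model.plan

If this also fails, the problem is in the engine or plugin layer rather than in Triton.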

For TAO TF1 models, the latest verified TensorRT version is 8.6.3; you can find this info in the 5.5 TAO Deploy docker. For TensorRT 10.04 the status is unknown, so I am not sure whether there is an issue.
Thus, you may check whether it is possible to run a Triton server that is based on TensorRT 8.6.

Okay, so looking at our discussion and the problems I have, I assume that old TensorFlow 1 models are not supported in the Triton server with the new TensorRT 10.04, because I cannot see any other difference from the process perspective.

The TAO Deploy docker will support TensorRT 10 in a future release, so the old models should then work with TensorRT 10.

Super, thank you for clarifying that. I will be waiting for the updates on this matter.

BTW, the official Triton server for TAO is shared at GitHub - NVIDIA-AI-IOT/tao-toolkit-triton-apps: Sample app code for deploying TAO Toolkit trained models to Triton. It can run inference against RetinaNet engines.
Currently it is based on nvcr.io/nvidia/tritonserver:23.02-py3, whose TensorRT version is 8.x. You can find the Dockerfile at tao-toolkit-triton-apps/docker/Dockerfile at main · NVIDIA-AI-IOT/tao-toolkit-triton-apps · GitHub.
If you have bandwidth, you can update the base docker to TensorRT 10 and also update the other packages for debugging.
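
A minimal sketch of that change in the Dockerfile (the target tag is an assumption; pick a release whose TensorRT version matches your JetPack 6 device):

# before: TensorRT 8.x base image
FROM nvcr.io/nvidia/tritonserver:23.02-py3
# after: a TensorRT 10-based release, e.g.
FROM nvcr.io/nvidia/tritonserver:<24.xx>-py3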

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.