Triton Server Error with TAO FasterRCNN model: Validation failed: libNamespace == nullptr

Please provide the following information when requesting support.

• Hardware: RTX 4090 (Ubuntu 22.04)
• Network Type: FasterRCNN TAO model
• TAO version: 5.5.0
• Training spec file (shared below)
• How to reproduce the issue? (Command line and detailed logs are shared below.)

Training spec:

# Copyright (c) 2017-2020, NVIDIA CORPORATION.  All rights reserved.
random_seed: 42
verbose: True
model_config {
  input_image_config {
    image_type: RGB
    image_channel_order: 'bgr'
    size_height_width {
      height: 640
      width: 640
    }
    image_channel_mean {
      key: 'b'
      value: 103.939
    }
    image_channel_mean {
      key: 'g'
      value: 116.779
    }
    image_channel_mean {
      key: 'r'
      value: 123.68
    }
    image_scaling_factor: 1.0
    max_objects_num_per_image: 100
  }
  arch: "resnet:18"
  anchor_box_config {
    scale: 64.0
    scale: 128.0
    scale: 256.0
    ratio: 1.0
    ratio: 0.5
    ratio: 2.0
  }
  freeze_bn: True
  freeze_blocks: 0
  freeze_blocks: 1
  roi_mini_batch: 256
  rpn_stride: 16
  use_bias: False
  roi_pooling_config {
    pool_size: 7
    pool_size_2x: False
  }
  all_projections: True
  use_pooling: False
}
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tao-experiments/data/faster_rcnn/tfrecords/new_trainval/new_trainval*"
    image_directory_path: "/workspace/tao-experiments/data/training"
  }
  image_extension: 'png'
  target_class_mapping {
    key: 'item'
    value: 'item'
  }
  target_class_mapping {
    key: 'person'
    value: 'person'
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 640
    output_image_height: 640
    output_image_channel: 3
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    enable_auto_resize: True
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 0
    translate_max_y: 0
  }
  color_augmentation {
    hue_rotation_max: 0.0
    saturation_shift_max: 0.0
    contrast_scale_max: 0.0
    contrast_center: 0.5
  }
}
training_config {
  visualizer {
    enabled: False
    num_images: 3
  }
  enable_augmentation: True
  enable_qat: False
  batch_size_per_gpu: 8
  num_epochs: 12
  rpn_min_overlap: 0.3
  rpn_max_overlap: 0.7
  classifier_min_overlap: 0.0
  classifier_max_overlap: 0.5
  gt_as_roi: False
  std_scaling: 1.0
  classifier_regr_std {
    key: 'x'
    value: 10.0
  }
  classifier_regr_std {
    key: 'y'
    value: 10.0
  }
  classifier_regr_std {
    key: 'w'
    value: 5.0
  }
  classifier_regr_std {
    key: 'h'
    value: 5.0
  }
  rpn_mini_batch: 256
  rpn_pre_nms_top_N: 12000
  rpn_nms_max_boxes: 2000
  rpn_nms_overlap_threshold: 0.7
  regularizer {
    type: L2
    weight: 1e-4
  }
  optimizer {
    sgd {
      lr: 0.02
      momentum: 0.9
      decay: 0.0
      nesterov: False
    }
  }
  learning_rate {
    soft_start {
      base_lr: 0.02
      start_lr: 0.002
      soft_start: 0.1
      annealing_points: 0.8
      annealing_points: 0.9
      annealing_divider: 10.0
    }
  }
  lambda_rpn_regr: 1.0
  lambda_rpn_class: 1.0
  lambda_cls_regr: 1.0
  lambda_cls_class: 1.0
}
inference_config {
  images_dir: '/workspace/tao-experiments/data/test_samples'
  batch_size: 1
  detection_image_output_dir: '/workspace/tao-experiments/faster_rcnn/inference_results_imgs_retrain'
  labels_dump_dir: '/workspace/tao-experiments/faster_rcnn/inference_dump_labels_retrain'
  rpn_pre_nms_top_N: 6000
  rpn_nms_max_boxes: 300
  rpn_nms_overlap_threshold: 0.7
  object_confidence_thres: 0.0001
  bbox_visualize_threshold: 0.6
  classifier_nms_max_boxes: 100
  classifier_nms_overlap_threshold: 0.3
  nms_score_bits: 8
}
evaluation_config {
  batch_size: 1
  validation_period_during_training: 1
  rpn_pre_nms_top_N: 6000
  rpn_nms_max_boxes: 300
  rpn_nms_overlap_threshold: 0.7
  classifier_nms_max_boxes: 100
  classifier_nms_overlap_threshold: 0.3
  object_confidence_thres: 0.0001
  use_voc07_11point_metric: False
  gt_matching_iou_threshold: 0.5
}

Hello, I am having issues using transfer learning with the TAO FasterRCNN model, or more specifically with the Triton Inference Server after exporting the model as a TRT engine. I trained following the guidelines in this notebook:

Training was successful and the inference results looked normal. However, during inference I was receiving this error:

[02/10/2025-16:19:07] [TRT] [F] Validation failed: libNamespace == nullptr
/workspace/trt_oss_src/TensorRT/plugin/proposalPlugin/proposalPlugin.cpp:528
[02/10/2025-16:19:07] [TRT] [E] std::exception

Note: I also received this error without any custom data, using just the tutorial data, so to reproduce you can use the tutorial data, or I can send the tutorial model.

This error caused no issues with inference using the TAO CLI. But when I attempted to launch a Triton Server instance with this model to test inference times, the server crashed due to this error. Is there a way to make the server ignore this validation issue, or to fix the error in the model?

Do note this is a known limitation of TAO Toolkit 5.2.0, listed in the 5.3.0 release notes at the link below:
https://docs.nvidia.com/tao/archive/5.3.0/text/release_notes.html

Also, I used Triton Server version 24.04, as it is the last release with TensorRT 8 and, from what I can see, the TAO Toolkit does not yet support TRT 10. Here is the command used to launch the Triton server:

docker run --gpus=1 --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /home/ubuntu-testing/model_repository:/models nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models

Here is my server config file for the model. I am not certain the output shapes are correct (a quick way to check them is noted after the config), but from what I can see the server stops before it even reaches the config.

name: "FRCNN-resnet50"
platform: "tensorrt_plan"
max_batch_size : 0
input [
  {
    name: "input_image"
    data_type: TYPE_FP16
    dims: [ 3, 640, 640 ]
    reshape { shape: [ 1, 3, 640, 640 ] }
  }
]
output [
  {
    name: "nms_out"
    data_type: TYPE_FP32
    dims: [ 1, 1, 100, 7 ]
    reshape { shape: [ 1, 1, 100, 7 ] }
  },
  {
    name: "nms_out_1"
    data_type: TYPE_FP32
    dims: [ 1, 1 , 1, 1]
    reshape { shape: [ 1, 1, 1, 1 ] }
  }
]
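
As an aside, a quick way to verify these binding names and shapes against the engine itself is to load it with trtexec from the matching TensorRT version (the engine file name here is illustrative):

trtexec --loadEngine=model.plan

When trtexec creates the execution context, it prints each input and output binding with its dimensions.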

And here is the output from the Triton server when it does not launch:

NVIDIA Release 24.04 (build 90085237)
Triton Server Version 2.45.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

I0210 23:05:46.603013 1 pinned_memory_manager.cc:275] Pinned memory pool is created at '0x7cb6d6000000' with size 268435456
I0210 23:05:46.604848 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0210 23:05:46.608765 1 model_lifecycle.cc:469] loading: FRCNN-resnet50:1
I0210 23:05:46.634964 1 tensorrt.cc:65] TRITONBACKEND_Initialize: tensorrt
I0210 23:05:46.634975 1 tensorrt.cc:75] Triton TRITONBACKEND API version: 1.19
I0210 23:05:46.634977 1 tensorrt.cc:81] 'tensorrt' TRITONBACKEND API version: 1.19
I0210 23:05:46.634979 1 tensorrt.cc:105] backend configuration:
{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I0210 23:05:46.636848 1 tensorrt.cc:231] TRITONBACKEND_ModelInitialize: FRCNN-resnet50 (version 1)
I0210 23:05:46.691943 1 logging.cc:46] Loaded engine size: 84 MiB
E0210 23:05:46.707516 1 logging.cc:40] Validation failed: libNamespace == nullptr
plugin/proposalPlugin/proposalPlugin.cpp:528

Thanks for your help!

It is a bug in the TensorRT plugin code. Please sync to the latest TensorRT plugin code.

How would I do this? Did you mean on the Triton Server side, or in the TAO image before the TRT engine is generated? Also, by syncing the plugin code, do you mean updating the TRT version or something else?

It looks like I was able to solve this by creating the TRT engine with trtexec from TensorRT 10.8. The error no longer appears, but my server is still not starting; I have updated to Triton Server version 25.01. It looks like something else may also have been causing the failure. There are no failure messages though:

Command:

docker run --gpus=1 --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /home/ubuntu-testing/model_repository:/models nvcr.io/nvidia/tritonserver:25.01-py3 tritonserver --model-repository=/models --log-verbose 1

Output:

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 25.01 (build 136230209)
Triton Server Version 2.54.0

Copyright (c) 2018-2024, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

I0211 17:34:51.850449 1 cache_manager.cc:480] "Create CacheManager with cache_dir: '/opt/tritonserver/caches'"
I0211 17:34:51.986746 1 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x7a4f1a000000' with size 268435456"
I0211 17:34:51.988590 1 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I0211 17:34:51.992076 1 model_config_utils.cc:753] "Server side auto-completed config: "
name: "FRCNN-resnet50"
platform: "tensorrt_plan"
max_batch_size: 1
input {
  name: "input_image"
  data_type: TYPE_FP32
  dims: 1
  dims: 3
  dims: 640
  dims: 640
  reshape {
    shape: 1
    shape: 3
    shape: 640
    shape: 640
  }
}
output {
  name: "nms_out"
  data_type: TYPE_FP32
  dims: 1
  dims: 1
  dims: 100
  dims: 7
}
output {
  name: "nms_out_1"
  data_type: TYPE_FP32
  dims: 1
  dims: 1
  dims: 1
  dims: 1
}
default_model_filename: "model.plan"
backend: "tensorrt"

I0211 17:34:51.992110 1 model_lifecycle.cc:473] "loading: FRCNN-resnet50:1"
I0211 17:34:51.992182 1 backend_model.cc:503] "Adding default backend config setting: default-max-batch-size,4"
I0211 17:34:51.992201 1 shared_library.cc:113] "OpenLibraryHandle: /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so"
I0211 17:34:52.018443 1 tensorrt.cc:65] "TRITONBACKEND_Initialize: tensorrt"
I0211 17:34:52.018457 1 tensorrt.cc:75] "Triton TRITONBACKEND API version: 1.19"
I0211 17:34:52.018459 1 tensorrt.cc:81] "'tensorrt' TRITONBACKEND API version: 1.19"
I0211 17:34:52.018461 1 tensorrt.cc:105] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I0211 17:34:52.018470 1 tensorrt.cc:187] "Registering TensorRT Plugins"
I0211 17:34:52.018479 1 logging.cc:49] "Registered plugin creator - ::ROIAlign_TRT version 2"
I0211 17:34:52.018485 1 logging.cc:49] "Registered plugin creator - ::BatchedNMSDynamic_TRT version 1"
I0211 17:34:52.018487 1 logging.cc:49] "Registered plugin creator - ::BatchedNMS_TRT version 1"
I0211 17:34:52.018489 1 logging.cc:49] "Registered plugin creator - ::BatchTilePlugin_TRT version 1"
I0211 17:34:52.018492 1 logging.cc:49] "Registered plugin creator - ::Clip_TRT version 1"
I0211 17:34:52.018501 1 logging.cc:49] "Registered plugin creator - ::CoordConvAC version 1"
I0211 17:34:52.018504 1 logging.cc:49] "Registered plugin creator - ::CropAndResizeDynamic version 1"
I0211 17:34:52.018506 1 logging.cc:49] "Registered plugin creator - ::CropAndResize version 1"
I0211 17:34:52.018509 1 logging.cc:49] "Registered plugin creator - ::DecodeBbox3DPlugin version 1"
I0211 17:34:52.018511 1 logging.cc:49] "Registered plugin creator - ::DetectionLayer_TRT version 1"
I0211 17:34:52.018515 1 logging.cc:49] "Registered plugin creator - ::EfficientNMS_Explicit_TF_TRT version 1"
I0211 17:34:52.018518 1 logging.cc:49] "Registered plugin creator - ::EfficientNMS_Implicit_TF_TRT version 1"
I0211 17:34:52.018521 1 logging.cc:49] "Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1"
I0211 17:34:52.018524 1 logging.cc:49] "Registered plugin creator - ::EfficientNMS_TRT version 1"
I0211 17:34:52.018528 1 logging.cc:49] "Registered plugin creator - ::FlattenConcat_TRT version 1"
I0211 17:34:52.018531 1 logging.cc:49] "Registered plugin creator - ::GenerateDetection_TRT version 1"
I0211 17:34:52.018534 1 logging.cc:49] "Registered plugin creator - ::GridAnchor_TRT version 1"
I0211 17:34:52.018536 1 logging.cc:49] "Registered plugin creator - ::GridAnchorRect_TRT version 1"
I0211 17:34:52.018540 1 logging.cc:49] "Registered plugin creator - ::InstanceNormalization_TRT version 1"
I0211 17:34:52.018545 1 logging.cc:49] "Registered plugin creator - ::InstanceNormalization_TRT version 2"
I0211 17:34:52.018551 1 logging.cc:49] "Registered plugin creator - ::InstanceNormalization_TRT version 3"
I0211 17:34:52.018553 1 logging.cc:49] "Registered plugin creator - ::LReLU_TRT version 1"
I0211 17:34:52.018556 1 logging.cc:49] "Registered plugin creator - ::ModulatedDeformConv2d version 1"
I0211 17:34:52.018559 1 logging.cc:49] "Registered plugin creator - ::MultilevelCropAndResize_TRT version 1"
I0211 17:34:52.018564 1 logging.cc:49] "Registered plugin creator - ::MultilevelProposeROI_TRT version 1"
I0211 17:34:52.018567 1 logging.cc:49] "Registered plugin creator - ::MultiscaleDeformableAttnPlugin_TRT version 1"
I0211 17:34:52.018570 1 logging.cc:49] "Registered plugin creator - ::NMSDynamic_TRT version 1"
I0211 17:34:52.018573 1 logging.cc:49] "Registered plugin creator - ::NMS_TRT version 1"
I0211 17:34:52.018575 1 logging.cc:49] "Registered plugin creator - ::Normalize_TRT version 1"
I0211 17:34:52.018578 1 logging.cc:49] "Registered plugin creator - ::PillarScatterPlugin version 1"
I0211 17:34:52.018581 1 logging.cc:49] "Registered plugin creator - ::PriorBox_TRT version 1"
I0211 17:34:52.018584 1 logging.cc:49] "Registered plugin creator - ::ProposalDynamic version 1"
I0211 17:34:52.018586 1 logging.cc:49] "Registered plugin creator - ::ProposalLayer_TRT version 1"
I0211 17:34:52.018589 1 logging.cc:49] "Registered plugin creator - ::Proposal version 1"
I0211 17:34:52.018592 1 logging.cc:49] "Registered plugin creator - ::PyramidROIAlign_TRT version 1"
I0211 17:34:52.018594 1 logging.cc:49] "Registered plugin creator - ::Region_TRT version 1"
I0211 17:34:52.018598 1 logging.cc:49] "Registered plugin creator - ::Reorg_TRT version 2"
I0211 17:34:52.018600 1 logging.cc:49] "Registered plugin creator - ::Reorg_TRT version 1"
I0211 17:34:52.018602 1 logging.cc:49] "Registered plugin creator - ::ResizeNearest_TRT version 1"
I0211 17:34:52.018605 1 logging.cc:49] "Registered plugin creator - ::ROIAlign_TRT version 1"
I0211 17:34:52.018607 1 logging.cc:49] "Registered plugin creator - ::RPROI_TRT version 1"
I0211 17:34:52.018610 1 logging.cc:49] "Registered plugin creator - ::ScatterElements version 1"
I0211 17:34:52.018613 1 logging.cc:49] "Registered plugin creator - ::ScatterElements version 2"
I0211 17:34:52.018616 1 logging.cc:49] "Registered plugin creator - ::ScatterND version 1"
I0211 17:34:52.018621 1 logging.cc:49] "Registered plugin creator - ::SpecialSlice_TRT version 1"
I0211 17:34:52.018625 1 logging.cc:49] "Registered plugin creator - ::Split version 1"
I0211 17:34:52.018628 1 logging.cc:49] "Registered plugin creator - ::VoxelGeneratorPlugin version 1"
I0211 17:34:52.020409 1 tensorrt.cc:231] "TRITONBACKEND_ModelInitialize: FRCNN-resnet50 (version 1)"
I0211 17:34:52.020642 1 model_config_utils.cc:1986] "ModelConfig 64-bit fields:"
I0211 17:34:52.020645 1 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::default_priority_level"
I0211 17:34:52.020646 1 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds"
I0211 17:34:52.020648 1 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::max_queue_delay_microseconds"
I0211 17:34:52.020649 1 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::priority_levels"
I0211 17:34:52.020651 1 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::priority_queue_policy::key"
I0211 17:34:52.020652 1 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds"
I0211 17:34:52.020653 1 model_config_utils.cc:1988] "\tModelConfig::ensemble_scheduling::step::model_version"
I0211 17:34:52.020655 1 model_config_utils.cc:1988] "\tModelConfig::input::dims"
I0211 17:34:52.020656 1 model_config_utils.cc:1988] "\tModelConfig::input::reshape::shape"
I0211 17:34:52.020657 1 model_config_utils.cc:1988] "\tModelConfig::instance_group::secondary_devices::device_id"
I0211 17:34:52.020659 1 model_config_utils.cc:1988] "\tModelConfig::model_warmup::inputs::value::dims"
I0211 17:34:52.020660 1 model_config_utils.cc:1988] "\tModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim"
I0211 17:34:52.020661 1 model_config_utils.cc:1988] "\tModelConfig::optimization::cuda::graph_spec::input::value::dim"
I0211 17:34:52.020663 1 model_config_utils.cc:1988] "\tModelConfig::output::dims"
I0211 17:34:52.020664 1 model_config_utils.cc:1988] "\tModelConfig::output::reshape::shape"
I0211 17:34:52.020665 1 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::direct::max_queue_delay_microseconds"
I0211 17:34:52.020667 1 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::max_sequence_idle_microseconds"
I0211 17:34:52.020668 1 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::oldest::max_queue_delay_microseconds"
I0211 17:34:52.020669 1 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::state::dims"
I0211 17:34:52.020670 1 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::state::initial_state::dims"
I0211 17:34:52.020672 1 model_config_utils.cc:1988] "\tModelConfig::version_policy::specific::versions"
I0211 17:34:52.020732 1 model_state.cc:317] "Setting the CUDA device to GPU0 to auto-complete config for FRCNN-resnet50"
I0211 17:34:52.021669 1 model_state.cc:363] "Using explicit serialized file 'model.plan' to auto-complete config for FRCNN-resnet50"
I0211 17:34:52.078683 1 logging.cc:46] "Loaded engine size: 85 MiB"
I0211 17:34:52.099056 1 logging.cc:49] "Local registry did not find ProposalDynamic creator. Will try parent registry if enabled."
I0211 17:34:52.099067 1 logging.cc:49] "Global registry found ProposalDynamic creator."
I0211 17:34:52.099076 1 logging.cc:49] "Local registry did not find CropAndResizeDynamic creator. Will try parent registry if enabled."
I0211 17:34:52.099079 1 logging.cc:49] "Global registry found CropAndResizeDynamic creator."
I0211 17:34:52.099147 1 logging.cc:49] "Local registry did not find NMSDynamic_TRT creator. Will try parent registry if enabled."
I0211 17:34:52.099149 1 logging.cc:49] "Global registry found NMSDynamic_TRT creator."

Any insight? The server just quits after this output; am I missing something?

Do note I have also tried loading the engine file in a Python script, which results in a segmentation fault (tested both in my own environment and in the TensorRT container):

import time
import numpy as np
import tensorrt as trt
import pycuda.autoinit
import pycuda.driver as cuda

EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

batch = 1
host_inputs  = []
cuda_inputs  = []
host_outputs = []
cuda_outputs = []
bindings = []


def Inference(engine):
    # Random float32 input matching the engine's 1x3x640x640 binding; flatten
    # it so the copy into the flat page-locked buffer has matching shapes.
    image = np.random.rand(1, 3, 640, 640).astype(np.float32)

    np.copyto(host_inputs[0], image.ravel())
    stream = cuda.Stream()
    context = engine.create_execution_context()

    start_time = time.time()
    cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)
    context.execute_v2(bindings)  # synchronous execution
    cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)
    stream.synchronize()
    print("execute times " + str(time.time() - start_time))

    output = host_outputs[0]
    print(np.argmax(output))


def PrepareEngine():
    trt.init_libnvinfer_plugins(trt.Logger(), '')
    with open('test.trt', 'rb') as f:
        serialized_engine = f.read()

    runtime = trt.Runtime(TRT_LOGGER)
    engine = runtime.deserialize_cuda_engine(serialized_engine)

    # Create a flat page-locked host buffer and a device buffer for every
    # I/O tensor, using the name-based API (TensorRT 8.5+ and 10.x).
    for i in range(engine.num_io_tensors):
        binding = engine.get_tensor_name(i)
        size = trt.volume(engine.get_tensor_shape(binding)) * batch
        host_mem = cuda.pagelocked_empty(shape=[size], dtype=np.float32)
        cuda_mem = cuda.mem_alloc(host_mem.nbytes)

        bindings.append(int(cuda_mem))
        if engine.get_tensor_mode(binding) == trt.TensorIOMode.INPUT:
            host_inputs.append(host_mem)
            cuda_inputs.append(cuda_mem)
        else:
            host_outputs.append(host_mem)
            cuda_outputs.append(cuda_mem)

    return engine


if __name__ == "__main__":
    engine = PrepareEngine()
    Inference(engine)

    engine = []

Which resulted in the following output:

[02/11/2025-14:34:27] [TRT] [I] Loaded engine size: 85 MiB
[02/11/2025-14:34:27] [TRT] [V] Local registry did not find ProposalDynamic creator. Will try parent registry if enabled.
[02/11/2025-14:34:27] [TRT] [V] Global registry found ProposalDynamic creator.
[02/11/2025-14:34:27] [TRT] [V] Local registry did not find CropAndResizeDynamic creator. Will try parent registry if enabled.
[02/11/2025-14:34:27] [TRT] [V] Global registry found CropAndResizeDynamic creator.
[02/11/2025-14:34:27] [TRT] [V] Local registry did not find NMSDynamic_TRT creator. Will try parent registry if enabled.
[02/11/2025-14:34:27] [TRT] [V] Global registry found NMSDynamic_TRT creator.
Segmentation fault (core dumped)

EDIT: I was able to get the Python code working by correcting the plugin initialization, changing the line:

trt.init_libnvinfer_plugins(trt.Logger(), '')

to:

trt.init_libnvinfer_plugins(TRT_LOGGER, '')

I am assuming the TRT plugins in the server are likewise not being initialized correctly, since the server stopped at the same point this Python code originally did?
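
One way I can think of to check which plugin library the backend actually pulls in inside the container (the container name here is hypothetical):

docker exec triton ldd /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | grep nvinfer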

trtexec works correctly with the following command:

/home/ubuntu-testing/TensorRT-10.8.0.43.Linux.x86_64-gnu.cuda-12.8/TensorRT-10.8.0.43/bin/trtexec --loadEngine='/home/ubuntu-testing/TensorRT-10.8.0.43.Linux.x86_64-gnu.cuda-12.8/TensorRT-10.8.0.43/bin/TAO108.trt'

Output:

[02/11/2025-15:22:44] [I] TF32 is enabled by default. Add --noTF32 flag to further improve accuracy with some performance cost.
[02/11/2025-15:22:44] [I] === Model Options ===
[02/11/2025-15:22:44] [I] Format: *
[02/11/2025-15:22:44] [I] Model: 
[02/11/2025-15:22:44] [I] Output:
[02/11/2025-15:22:44] [I] 
[02/11/2025-15:22:44] [I] === System Options ===
[02/11/2025-15:22:44] [I] Device: 0
[02/11/2025-15:22:44] [I] DLACore: 
[02/11/2025-15:22:44] [I] Plugins:
[02/11/2025-15:22:44] [I] setPluginsToSerialize:
[02/11/2025-15:22:44] [I] dynamicPlugins:
[02/11/2025-15:22:44] [I] ignoreParsedPluginLibs: 0
[02/11/2025-15:22:44] [I] 
[02/11/2025-15:22:44] [I] === Inference Options ===
[02/11/2025-15:22:44] [I] Batch: Explicit
[02/11/2025-15:22:44] [I] Input inference shapes: model
[02/11/2025-15:22:44] [I] Iterations: 10
[02/11/2025-15:22:44] [I] Duration: 3s (+ 200ms warm up)
[02/11/2025-15:22:44] [I] Sleep time: 0ms
[02/11/2025-15:22:44] [I] Idle time: 0ms
[02/11/2025-15:22:44] [I] Inference Streams: 1
[02/11/2025-15:22:44] [I] ExposeDMA: Disabled
[02/11/2025-15:22:44] [I] Data transfers: Enabled
[02/11/2025-15:22:44] [I] Spin-wait: Disabled
[02/11/2025-15:22:44] [I] Multithreading: Disabled
[02/11/2025-15:22:44] [I] CUDA Graph: Disabled
[02/11/2025-15:22:44] [I] Separate profiling: Disabled
[02/11/2025-15:22:44] [I] Time Deserialize: Disabled
[02/11/2025-15:22:44] [I] Time Refit: Disabled
[02/11/2025-15:22:44] [I] NVTX verbosity: 0
[02/11/2025-15:22:44] [I] Persistent Cache Ratio: 0
[02/11/2025-15:22:44] [I] Optimization Profile Index: 0
[02/11/2025-15:22:44] [I] Weight Streaming Budget: 100.000000%
[02/11/2025-15:22:44] [I] Inputs:
[02/11/2025-15:22:44] [I] Debug Tensor Save Destinations:
[02/11/2025-15:22:44] [I] === Reporting Options ===
[02/11/2025-15:22:44] [I] Verbose: Disabled
[02/11/2025-15:22:44] [I] Averages: 10 inferences
[02/11/2025-15:22:44] [I] Percentiles: 90,95,99
[02/11/2025-15:22:44] [I] Dump refittable layers:Disabled
[02/11/2025-15:22:44] [I] Dump output: Disabled
[02/11/2025-15:22:44] [I] Profile: Disabled
[02/11/2025-15:22:44] [I] Export timing to JSON file: 
[02/11/2025-15:22:44] [I] Export output to JSON file: 
[02/11/2025-15:22:44] [I] Export profile to JSON file: 
[02/11/2025-15:22:44] [I] 
[02/11/2025-15:22:44] [I] === Device Information ===
[02/11/2025-15:22:44] [I] Available Devices: 
[02/11/2025-15:22:44] [I]   Device 0: "NVIDIA GeForce RTX 4090" UUID: GPU-fea8145b-ad5b-d355-018b-fe36ff27db26
[02/11/2025-15:22:44] [I] Selected Device: NVIDIA GeForce RTX 4090
[02/11/2025-15:22:44] [I] Selected Device ID: 0
[02/11/2025-15:22:44] [I] Selected Device UUID: GPU-fea8145b-ad5b-d355-018b-fe36ff27db26
[02/11/2025-15:22:44] [I] Compute Capability: 8.9
[02/11/2025-15:22:44] [I] SMs: 128
[02/11/2025-15:22:44] [I] Device Global Memory: 24082 MiB
[02/11/2025-15:22:44] [I] Shared Memory per SM: 100 KiB
[02/11/2025-15:22:44] [I] Memory Bus Width: 384 bits (ECC disabled)
[02/11/2025-15:22:44] [I] Application Compute Clock Rate: 2.52 GHz
[02/11/2025-15:22:44] [I] Application Memory Clock Rate: 10.501 GHz
[02/11/2025-15:22:44] [I] 
[02/11/2025-15:22:44] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[02/11/2025-15:22:44] [I] 
[02/11/2025-15:22:44] [I] TensorRT version: 10.8.0
[02/11/2025-15:22:44] [I] Loading standard plugins
[02/11/2025-15:22:44] [I] [TRT] Loaded engine size: 85 MiB
[02/11/2025-15:22:44] [W] [TRT] NMSDynamic_TRT is deprecated since TensorRT 9.0. Use INetworkDefinition::addNMS() to add an INMSLayer OR use EfficientNMS plugin.
[02/11/2025-15:22:44] [I] Engine deserialized in 0.048137 sec.
[02/11/2025-15:22:44] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +176, now: CPU 0, GPU 257 (MiB)
[02/11/2025-15:22:44] [I] Setting persistentCacheLimit to 0 bytes.
[02/11/2025-15:22:44] [I] Created execution context with device memory size: 175.446 MiB
[02/11/2025-15:22:44] [I] Using random values for input input_image
[02/11/2025-15:22:44] [I] Input binding for input_image with dimensions 1x3x640x640 is created.
[02/11/2025-15:22:44] [I] Output binding for nms_out with dimensions 1x1x100x7 is created.
[02/11/2025-15:22:44] [I] Output binding for nms_out_1 with dimensions 1x1x1x1 is created.
[02/11/2025-15:22:44] [I] Starting inference
[02/11/2025-15:22:47] [I] Warmup completed 34 queries over 200 ms
[02/11/2025-15:22:47] [I] Timing trace has 623 queries over 3.01227 s
[02/11/2025-15:22:47] [I] 
[02/11/2025-15:22:47] [I] === Trace details ===
[02/11/2025-15:22:47] [I] Trace averages of 10 runs:
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 5.08682 ms - Host latency: 5.33056 ms (enqueue 0.168547 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.87343 ms - Host latency: 5.11658 ms (enqueue 0.181024 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.79734 ms - Host latency: 5.04422 ms (enqueue 0.211435 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.78915 ms - Host latency: 5.03121 ms (enqueue 0.207626 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.79611 ms - Host latency: 5.03804 ms (enqueue 0.206415 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.85612 ms - Host latency: 5.10212 ms (enqueue 0.200827 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.80092 ms - Host latency: 5.04558 ms (enqueue 0.206107 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81055 ms - Host latency: 5.05818 ms (enqueue 0.361261 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81095 ms - Host latency: 5.05704 ms (enqueue 0.406531 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.8077 ms - Host latency: 5.05557 ms (enqueue 0.211218 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81054 ms - Host latency: 5.05507 ms (enqueue 0.204565 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.80225 ms - Host latency: 5.04594 ms (enqueue 0.365234 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81802 ms - Host latency: 5.06472 ms (enqueue 0.401447 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.8167 ms - Host latency: 5.0603 ms (enqueue 0.393457 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81086 ms - Host latency: 5.05685 ms (enqueue 0.406311 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81268 ms - Host latency: 5.05547 ms (enqueue 0.396924 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.85037 ms - Host latency: 5.09423 ms (enqueue 0.396625 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.8342 ms - Host latency: 5.08232 ms (enqueue 0.237988 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81639 ms - Host latency: 5.06157 ms (enqueue 0.185437 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81549 ms - Host latency: 5.06216 ms (enqueue 0.268677 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81954 ms - Host latency: 5.06287 ms (enqueue 0.399109 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81084 ms - Host latency: 5.05607 ms (enqueue 0.216467 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81169 ms - Host latency: 5.05652 ms (enqueue 0.209424 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81381 ms - Host latency: 5.06539 ms (enqueue 0.225537 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81978 ms - Host latency: 5.06794 ms (enqueue 0.210913 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81155 ms - Host latency: 5.05846 ms (enqueue 0.21001 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81942 ms - Host latency: 5.09198 ms (enqueue 0.588623 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81189 ms - Host latency: 5.1344 ms (enqueue 1.06615 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.88036 ms - Host latency: 5.20353 ms (enqueue 1.06249 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.83705 ms - Host latency: 5.13334 ms (enqueue 0.901636 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.8132 ms - Host latency: 5.0569 ms (enqueue 0.246558 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.82261 ms - Host latency: 5.06406 ms (enqueue 0.174988 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.8165 ms - Host latency: 5.05862 ms (enqueue 0.171667 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81412 ms - Host latency: 5.05686 ms (enqueue 0.175818 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81445 ms - Host latency: 5.05728 ms (enqueue 0.182056 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81544 ms - Host latency: 5.05957 ms (enqueue 0.192639 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.8177 ms - Host latency: 5.0675 ms (enqueue 0.264478 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.80809 ms - Host latency: 5.06288 ms (enqueue 0.49729 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81528 ms - Host latency: 5.08082 ms (enqueue 0.838098 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.82351 ms - Host latency: 5.13926 ms (enqueue 0.971729 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81631 ms - Host latency: 5.13901 ms (enqueue 1.09055 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.87061 ms - Host latency: 5.18625 ms (enqueue 1.00237 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.8106 ms - Host latency: 5.05996 ms (enqueue 0.278174 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81731 ms - Host latency: 5.06431 ms (enqueue 0.191626 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.80864 ms - Host latency: 5.05293 ms (enqueue 0.187988 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81667 ms - Host latency: 5.06252 ms (enqueue 0.219385 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81118 ms - Host latency: 5.05461 ms (enqueue 0.185498 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81172 ms - Host latency: 5.05447 ms (enqueue 0.209839 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.80776 ms - Host latency: 5.05217 ms (enqueue 0.206543 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81414 ms - Host latency: 5.06641 ms (enqueue 0.214819 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81489 ms - Host latency: 5.06291 ms (enqueue 0.215283 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81953 ms - Host latency: 5.12314 ms (enqueue 0.907812 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.82593 ms - Host latency: 5.14932 ms (enqueue 1.07241 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.89458 ms - Host latency: 5.21765 ms (enqueue 1.07532 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.85505 ms - Host latency: 5.15559 ms (enqueue 0.732935 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.82881 ms - Host latency: 5.07668 ms (enqueue 0.218018 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.82275 ms - Host latency: 5.06785 ms (enqueue 0.204883 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.81873 ms - Host latency: 5.06194 ms (enqueue 0.18374 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.82876 ms - Host latency: 5.07761 ms (enqueue 0.211499 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.82043 ms - Host latency: 5.0657 ms (enqueue 0.203491 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.82292 ms - Host latency: 5.07571 ms (enqueue 0.214209 ms)
[02/11/2025-15:22:47] [I] Average on 10 runs - GPU latency: 4.82104 ms - Host latency: 5.06824 ms (enqueue 0.2073 ms)
[02/11/2025-15:22:47] [I] 
[02/11/2025-15:22:47] [I] === Performance summary ===
[02/11/2025-15:22:47] [I] Throughput: 206.821 qps
[02/11/2025-15:22:47] [I] Latency: min = 5.02573 ms, max = 5.86743 ms, mean = 5.08299 ms, median = 5.06262 ms, percentile(90%) = 5.14929 ms, percentile(95%) = 5.15979 ms, percentile(99%) = 5.37061 ms
[02/11/2025-15:22:47] [I] Enqueue Time: min = 0.151245 ms, max = 1.71387 ms, mean = 0.380815 ms, median = 0.211426 ms, percentile(90%) = 1.06665 ms, percentile(95%) = 1.08008 ms, percentile(99%) = 1.10535 ms
[02/11/2025-15:22:47] [I] H2D Latency: min = 0.230957 ms, max = 0.330811 ms, mean = 0.249932 ms, median = 0.236389 ms, percentile(90%) = 0.313965 ms, percentile(95%) = 0.315186 ms, percentile(99%) = 0.316895 ms
[02/11/2025-15:22:47] [I] GPU Compute Time: min = 4.78311 ms, max = 5.54272 ms, mean = 4.82532 ms, median = 4.81372 ms, percentile(90%) = 4.83936 ms, percentile(95%) = 4.85278 ms, percentile(99%) = 5.09541 ms
[02/11/2025-15:22:47] [I] D2H Latency: min = 0.00341797 ms, max = 0.010498 ms, mean = 0.00773915 ms, median = 0.00805664 ms, percentile(90%) = 0.00927734 ms, percentile(95%) = 0.00952148 ms, percentile(99%) = 0.0100098 ms
[02/11/2025-15:22:47] [I] Total Host Walltime: 3.01227 s
[02/11/2025-15:22:47] [I] Total GPU Compute Time: 3.00617 s
[02/11/2025-15:22:47] [W] * GPU compute time is unstable, with coefficient of variance = 1.28654%.
[02/11/2025-15:22:47] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[02/11/2025-15:22:47] [I] Explanations of the performance metrics are printed in the verbose logs.
[02/11/2025-15:22:47] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v100800] [b43] # /home/ubuntu-testing/TensorRT-10.8.0.43.Linux.x86_64-gnu.cuda-12.8/TensorRT-10.8.0.43/bin/trtexec --loadEngine=/home/ubuntu-testing/TensorRT-10.8.0.43.Linux.x86_64-gnu.cuda-12.8/TensorRT-10.8.0.43/bin/TAO108.trt

For the “Validation failed: libNamespace == nullptr” error: it is an issue in this version of the TensorRT plugin code. See TensorRT/plugin/proposalPlugin/proposalPlugin.cpp at the 23.08 tag in NVIDIA/TensorRT on GitHub.

PLUGIN_VALIDATE(libNamespace == nullptr);

should be

PLUGIN_VALIDATE(libNamespace != nullptr);

The issue is fixed in TRT 9.0 and later.

You can modify the plugin code and rebuild it, then replace libnvinfer_plugin.so in your container; a rough sketch follows.
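
A sketch of that rebuild, assuming the TensorRT OSS release/8.6 branch matches the TRT version in your container (the branch name, library version suffix, and container name are assumptions to adapt):

git clone -b release/8.6 https://github.com/NVIDIA/TensorRT.git
cd TensorRT && git submodule update --init --recursive
# Apply the one-line fix in plugin/proposalPlugin/proposalPlugin.cpp:
#   PLUGIN_VALIDATE(libNamespace == nullptr); -> PLUGIN_VALIDATE(libNamespace != nullptr);
mkdir build && cd build
cmake .. -DTRT_LIB_DIR=/usr/lib/x86_64-linux-gnu -DTRT_OUT_DIR=$(pwd)/out
make -j"$(nproc)" nvinfer_plugin
# Copy the rebuilt library over the stock one inside the running Triton container:
docker cp out/libnvinfer_plugin.so.8.6.3 triton:/usr/lib/x86_64-linux-gnu/
docker exec triton ldconfig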

Thanks for your answer. I was able to find a different workaround before your response, as described in my comment above: converting the ONNX model to a TRT engine with trtexec from TensorRT 10.8. If you recommend sticking with TensorRT 8.6, I can try your method; do let me know. However, I still need assistance with the server not starting. With the 10.8 engine I am able to perform inference by loading it in Python, and trtexec passes, as in my earlier comments; yet the Triton server is unable to start, as shown above, I assume because the plugins are not loaded properly. In Python I used the line below to fix this:

trt.init_libnvinfer_plugins(TRT_LOGGER, '')

Is there something I can do on the Triton Server side to accomplish something similar, such as adding the plugin locations to the server's environment variables?
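
For example, I imagine something along these lines might work, given that Triton's TensorRT backend documents LD_PRELOAD for loading custom plugin libraries (the mounted plugin path here is hypothetical):

docker run --gpus=1 --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /home/ubuntu-testing/model_repository:/models \
  -v /home/ubuntu-testing/plugins/libnvinfer_plugin.so:/plugins/libnvinfer_plugin.so \
  -e LD_PRELOAD=/plugins/libnvinfer_plugin.so \
  nvcr.io/nvidia/tritonserver:25.01-py3 \
  tritonserver --model-repository=/models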

Note I have also tried generating the TRT engine inside the Triton server container. Launching then prints a segmentation fault, just as the Python code originally did:

root@78cb1e3d126e:/opt/tritonserver# tritonserver --model-repository=/opt/tritonserver/models --http-port=8000 --grpc-port=8001 --metrics-port=8002 --log-verbose 1
I0212 20:33:44.108490 297 cache_manager.cc:480] "Create CacheManager with cache_dir: '/opt/tritonserver/caches'"
I0212 20:33:44.246535 297 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x7b3de6000000' with size 268435456"
I0212 20:33:44.248288 297 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I0212 20:33:44.251610 297 model_config_utils.cc:753] "Server side auto-completed config: "
name: "FRCNN-resnet50"
platform: "tensorrt_plan"
max_batch_size: 1
input {
  name: "input_image"
  data_type: TYPE_FP32
  dims: 1
  dims: 3
  dims: 640
  dims: 640
  reshape {
    shape: 1
    shape: 3
    shape: 640
    shape: 640
  }
}
output {
  name: "nms_out"
  data_type: TYPE_FP32
  dims: 1
  dims: 1
  dims: 100
  dims: 7
}
output {
  name: "nms_out_1"
  data_type: TYPE_FP32
  dims: 1
  dims: 1
  dims: 1
  dims: 1
}
default_model_filename: "model.plan"
backend: "tensorrt"

I0212 20:33:44.251659 297 model_lifecycle.cc:473] "loading: FRCNN-resnet50:1"
I0212 20:33:44.251754 297 backend_model.cc:503] "Adding default backend config setting: default-max-batch-size,4"
I0212 20:33:44.251774 297 shared_library.cc:113] "OpenLibraryHandle: /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so"
I0212 20:33:44.268152 297 tensorrt.cc:65] "TRITONBACKEND_Initialize: tensorrt"
I0212 20:33:44.268174 297 tensorrt.cc:75] "Triton TRITONBACKEND API version: 1.19"
I0212 20:33:44.268178 297 tensorrt.cc:81] "'tensorrt' TRITONBACKEND API version: 1.19"
I0212 20:33:44.268180 297 tensorrt.cc:105] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I0212 20:33:44.268189 297 tensorrt.cc:187] "Registering TensorRT Plugins"
I0212 20:33:44.268200 297 logging.cc:49] "Registered plugin creator - ::ROIAlign_TRT version 2"
I0212 20:33:44.268204 297 logging.cc:49] "Registered plugin creator - ::BatchedNMSDynamic_TRT version 1"
I0212 20:33:44.268207 297 logging.cc:49] "Registered plugin creator - ::BatchedNMS_TRT version 1"
I0212 20:33:44.268210 297 logging.cc:49] "Registered plugin creator - ::BatchTilePlugin_TRT version 1"
I0212 20:33:44.268215 297 logging.cc:49] "Registered plugin creator - ::Clip_TRT version 1"
I0212 20:33:44.268221 297 logging.cc:49] "Registered plugin creator - ::CoordConvAC version 1"
I0212 20:33:44.268224 297 logging.cc:49] "Registered plugin creator - ::CropAndResizeDynamic version 1"
I0212 20:33:44.268227 297 logging.cc:49] "Registered plugin creator - ::CropAndResize version 1"
I0212 20:33:44.268230 297 logging.cc:49] "Registered plugin creator - ::DecodeBbox3DPlugin version 1"
I0212 20:33:44.268233 297 logging.cc:49] "Registered plugin creator - ::DetectionLayer_TRT version 1"
I0212 20:33:44.268237 297 logging.cc:49] "Registered plugin creator - ::EfficientNMS_Explicit_TF_TRT version 1"
I0212 20:33:44.268240 297 logging.cc:49] "Registered plugin creator - ::EfficientNMS_Implicit_TF_TRT version 1"
I0212 20:33:44.268243 297 logging.cc:49] "Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1"
I0212 20:33:44.268246 297 logging.cc:49] "Registered plugin creator - ::EfficientNMS_TRT version 1"
I0212 20:33:44.268251 297 logging.cc:49] "Registered plugin creator - ::FlattenConcat_TRT version 1"
I0212 20:33:44.268255 297 logging.cc:49] "Registered plugin creator - ::GenerateDetection_TRT version 1"
I0212 20:33:44.268258 297 logging.cc:49] "Registered plugin creator - ::GridAnchor_TRT version 1"
I0212 20:33:44.268260 297 logging.cc:49] "Registered plugin creator - ::GridAnchorRect_TRT version 1"
I0212 20:33:44.268263 297 logging.cc:49] "Registered plugin creator - ::InstanceNormalization_TRT version 1"
I0212 20:33:44.268268 297 logging.cc:49] "Registered plugin creator - ::InstanceNormalization_TRT version 2"
I0212 20:33:44.268273 297 logging.cc:49] "Registered plugin creator - ::InstanceNormalization_TRT version 3"
I0212 20:33:44.268276 297 logging.cc:49] "Registered plugin creator - ::LReLU_TRT version 1"
I0212 20:33:44.268279 297 logging.cc:49] "Registered plugin creator - ::ModulatedDeformConv2d version 1"
I0212 20:33:44.268283 297 logging.cc:49] "Registered plugin creator - ::MultilevelCropAndResize_TRT version 1"
I0212 20:33:44.268287 297 logging.cc:49] "Registered plugin creator - ::MultilevelProposeROI_TRT version 1"
I0212 20:33:44.268292 297 logging.cc:49] "Registered plugin creator - ::MultiscaleDeformableAttnPlugin_TRT version 1"
I0212 20:33:44.268295 297 logging.cc:49] "Registered plugin creator - ::NMSDynamic_TRT version 1"
I0212 20:33:44.268298 297 logging.cc:49] "Registered plugin creator - ::NMS_TRT version 1"
I0212 20:33:44.268302 297 logging.cc:49] "Registered plugin creator - ::Normalize_TRT version 1"
I0212 20:33:44.268305 297 logging.cc:49] "Registered plugin creator - ::PillarScatterPlugin version 1"
I0212 20:33:44.268309 297 logging.cc:49] "Registered plugin creator - ::PriorBox_TRT version 1"
I0212 20:33:44.268312 297 logging.cc:49] "Registered plugin creator - ::ProposalDynamic version 1"
I0212 20:33:44.268315 297 logging.cc:49] "Registered plugin creator - ::ProposalLayer_TRT version 1"
I0212 20:33:44.268319 297 logging.cc:49] "Registered plugin creator - ::Proposal version 1"
I0212 20:33:44.268324 297 logging.cc:49] "Registered plugin creator - ::PyramidROIAlign_TRT version 1"
I0212 20:33:44.268330 297 logging.cc:49] "Registered plugin creator - ::Region_TRT version 1"
I0212 20:33:44.268333 297 logging.cc:49] "Registered plugin creator - ::Reorg_TRT version 2"
I0212 20:33:44.268336 297 logging.cc:49] "Registered plugin creator - ::Reorg_TRT version 1"
I0212 20:33:44.268339 297 logging.cc:49] "Registered plugin creator - ::ResizeNearest_TRT version 1"
I0212 20:33:44.268342 297 logging.cc:49] "Registered plugin creator - ::ROIAlign_TRT version 1"
I0212 20:33:44.268345 297 logging.cc:49] "Registered plugin creator - ::RPROI_TRT version 1"
I0212 20:33:44.268348 297 logging.cc:49] "Registered plugin creator - ::ScatterElements version 1"
I0212 20:33:44.268351 297 logging.cc:49] "Registered plugin creator - ::ScatterElements version 2"
I0212 20:33:44.268354 297 logging.cc:49] "Registered plugin creator - ::ScatterND version 1"
I0212 20:33:44.268359 297 logging.cc:49] "Registered plugin creator - ::SpecialSlice_TRT version 1"
I0212 20:33:44.268365 297 logging.cc:49] "Registered plugin creator - ::Split version 1"
I0212 20:33:44.268368 297 logging.cc:49] "Registered plugin creator - ::VoxelGeneratorPlugin version 1"
I0212 20:33:44.270108 297 tensorrt.cc:231] "TRITONBACKEND_ModelInitialize: FRCNN-resnet50 (version 1)"
I0212 20:33:44.270330 297 model_config_utils.cc:1986] "ModelConfig 64-bit fields:"
I0212 20:33:44.270333 297 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::default_priority_level"
I0212 20:33:44.270336 297 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds"
I0212 20:33:44.270338 297 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::max_queue_delay_microseconds"
I0212 20:33:44.270340 297 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::priority_levels"
I0212 20:33:44.270342 297 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::priority_queue_policy::key"
I0212 20:33:44.270344 297 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds"
I0212 20:33:44.270346 297 model_config_utils.cc:1988] "\tModelConfig::ensemble_scheduling::step::model_version"
I0212 20:33:44.270349 297 model_config_utils.cc:1988] "\tModelConfig::input::dims"
I0212 20:33:44.270350 297 model_config_utils.cc:1988] "\tModelConfig::input::reshape::shape"
I0212 20:33:44.270352 297 model_config_utils.cc:1988] "\tModelConfig::instance_group::secondary_devices::device_id"
I0212 20:33:44.270354 297 model_config_utils.cc:1988] "\tModelConfig::model_warmup::inputs::value::dims"
I0212 20:33:44.270356 297 model_config_utils.cc:1988] "\tModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim"
I0212 20:33:44.270358 297 model_config_utils.cc:1988] "\tModelConfig::optimization::cuda::graph_spec::input::value::dim"
I0212 20:33:44.270360 297 model_config_utils.cc:1988] "\tModelConfig::output::dims"
I0212 20:33:44.270362 297 model_config_utils.cc:1988] "\tModelConfig::output::reshape::shape"
I0212 20:33:44.270364 297 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::direct::max_queue_delay_microseconds"
I0212 20:33:44.270366 297 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::max_sequence_idle_microseconds"
I0212 20:33:44.270368 297 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::oldest::max_queue_delay_microseconds"
I0212 20:33:44.270370 297 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::state::dims"
I0212 20:33:44.270373 297 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::state::initial_state::dims"
I0212 20:33:44.270375 297 model_config_utils.cc:1988] "\tModelConfig::version_policy::specific::versions"
I0212 20:33:44.270429 297 model_state.cc:317] "Setting the CUDA device to GPU0 to auto-complete config for FRCNN-resnet50"
I0212 20:33:44.271413 297 model_state.cc:363] "Using explicit serialized file 'model.plan' to auto-complete config for FRCNN-resnet50"
I0212 20:33:44.321936 297 logging.cc:46] "Loaded engine size: 85 MiB"
I0212 20:33:44.341177 297 logging.cc:49] "Local registry did not find ProposalDynamic creator. Will try parent registry if enabled."
I0212 20:33:44.341192 297 logging.cc:49] "Global registry found ProposalDynamic creator."
I0212 20:33:44.341201 297 logging.cc:49] "Local registry did not find CropAndResizeDynamic creator. Will try parent registry if enabled."
I0212 20:33:44.341205 297 logging.cc:49] "Global registry found CropAndResizeDynamic creator."
I0212 20:33:44.341274 297 logging.cc:49] "Local registry did not find NMSDynamic_TRT creator. Will try parent registry if enabled."
I0212 20:33:44.341278 297 logging.cc:49] "Global registry found NMSDynamic_TRT creator."
Segmentation fault (core dumped)

For running Triton with a TAO model, there is an official GitHub repository: NVIDIA-AI-IOT/tao-toolkit-triton-apps (sample app code for deploying TAO Toolkit trained models to Triton). Could you try running with it and leveraging it?

I’m sure the models at your link will work. However, I am working with TAO FasterRCNN, which is not one of the models they provide an example for, and I am assuming it will run into the same issue I am facing here. Any input on my previous comment? Do you believe TensorRT 8 would solve my issue?

In the official tao-toolkit-triton-apps GitHub repo, libnvinfer_plugin is built and then replaced. See its Dockerfile: tao-toolkit-triton-apps/docker/Dockerfile at 9a30f9692bf29fb728520e9dba1c79be2bf65e74 · NVIDIA-AI-IOT/tao-toolkit-triton-apps · GitHub.

So, for your own Triton server, you also need to build the TensorRT plugin and make sure the rebuilt library replaces the stock one inside your server container.
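
For instance, a minimal run-time alternative to baking a new image is to mount the rebuilt library over the stock one (the host path and version suffix are assumptions to adapt):

docker run --gpus=1 --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /home/ubuntu-testing/model_repository:/models \
  -v /home/ubuntu-testing/plugins/libnvinfer_plugin.so.8.6.3:/usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.6.3 \
  nvcr.io/nvidia/tritonserver:24.04-py3 \
  tritonserver --model-repository=/models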

The Triton server crashes with a segmentation fault after registering TensorRT plugins, specifically when trying to load the engine. This strongly suggests an issue with plugin loading or compatibility within the Triton environment. The fact that trtexec passes and Python inference works implies the engine itself is valid, but there’s a discrepancy when Triton attempts to use it.

Suggestion:

  • Explicit Plugin Path: While Triton registers the plugins, it might not be finding them correctly during engine execution. Try explicitly setting the LD_LIBRARY_PATH environment variable within the Triton container to include the path to the TensorRT plugin libraries. This is the closest equivalent to the trt.init_libnvinfer_plugins() call in Python. You need to find where those plugins reside within the container. It’s often something like /usr/lib/x86_64-linux-gnu or /opt/tensorrt/lib.
docker run -d --name triton_server \
  --gpus all \
  -p 8000:8000 \
  -p 8001:8001 \
  -p 8002:8002 \
  -v /path/to/your/models:/opt/tritonserver/models \
  -e LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:/opt/tensorrt/lib:$LD_LIBRARY_PATH \
  nvcr.io/nvidia/tritonserver:<YOUR_TRITON_VERSION> tritonserver --model-repository=/opt/tritonserver/models --http-port=8000 --grpc-port=8001 --metrics-port=8002 --log-verbose 1

Replace <YOUR_TRITON_VERSION> with the specific version you’re using. Replace /usr/lib/x86_64-linux-gnu:/opt/tensorrt/lib with the actual path to your TensorRT’s plugin libraries.

Additionally:

  • Triton and TensorRT Mismatch: This is the most likely culprit. Triton Server has a specific dependency on a particular TensorRT version. Even though you built the engine with TensorRT 10.8 and it runs in Python, the Triton container might be using an older TensorRT version. This is a very common cause of segmentation faults.
    • Identify Triton’s TensorRT Version: The easiest way to determine the TensorRT version inside the Triton container is to inspect the container’s environment variables or check the Triton server logs when it starts; it should print the TensorRT version it’s using. Look for something like TensorRT version: 8.6.1. (A quick check is sketched after this list.)
    • Match TensorRT Versions: The ideal solution is to use a Triton container that’s built against the same TensorRT version you used to create the engine (10.8 in your case). NVIDIA provides various Triton containers; select one that matches. If a matching container isn’t available, you might need to build your own Triton container from source to ensure compatibility.
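
A quick way to check which TensorRT libraries a given Triton image ships (the library directory shown is the usual location in these containers; adapt the image tag):

docker run --rm nvcr.io/nvidia/tritonserver:25.01-py3 \
  ls /usr/lib/x86_64-linux-gnu/ | grep libnvinfer

The version is embedded in the shared-library suffix (for example, libnvinfer.so.10.x).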

Thank you for your response. I will take a look at these options. I am certain the TRT versions match, but I will check the other items.