Wrong tensor meta output from nvinferserver with Triton

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): RTX 3060 (Ubuntu 24.04)
• DeepStream Version: 8.0
• TensorRT Version: 10.3
• NVIDIA GPU Driver Version 580.95.05
• Issue Type: questions
Hi NVIDIA team,

I’m currently using DeepStream (with nvinferserver) connected to Triton Inference Server (gRPC) to run an ONNX model as an SGIE. The model has the following outputs:

dets:     float32[batch_size, N, 5]
labels:   int64[batch_size, N]
polygons: float32[batch_size, N, 8]
colors:   int64[batch_size, N]
types:    int64[batch_size, N]

When I test the model directly using the Triton Python client, all outputs are correct.
However, when the same model is loaded via nvinferserver, the tensor metadata (tensor_meta in probe function) is not correct.
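
For reference, my direct Triton test looks roughly like this (a minimal sketch; the random dummy input is just for illustration, only the model/tensor names come from my setup):

import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="triton:8001")

# dummy input matching the model's 3x512x512 FP32 input
dummy = np.random.rand(1, 3, 512, 512).astype(np.float32)
infer_input = grpcclient.InferInput("input", list(dummy.shape), "FP32")
infer_input.set_data_from_numpy(dummy)

output_names = ["dets", "labels", "polygons", "colors", "types"]
outputs = [grpcclient.InferRequestedOutput(n) for n in output_names]

result = client.infer("plate_detector", inputs=[infer_input], outputs=outputs)
for n in output_names:
    arr = result.as_numpy(n)
    print(n, arr.shape, arr.dtype)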

Here’s my nvinferserver config:

infer_config {
    backend {
        triton {
            model_name: "plate_detector"
            version: 1
            grpc {
                url: "triton:8001"
                enable_cuda_buffer_sharing: true
            }
        }
    }
    preprocess {
        network_format: IMAGE_FORMAT_RGB
        normalize {
            scale_factor: 0.017507
            channel_offsets: [123.675, 116.28, 103.53]
        }
    }
}
output_control {
    output_tensor_meta: true
}

My Triton config:

name: "license_plate_detector",
platform: "tensorrt_plan"
default_model_filename: "model.plan"
max_batch_size: 16
version_policy: { specific: { versions: [1]}}
input: [
    {
        name: "input",
        data_type: TYPE_FP32,
        dims: [3, 512, 512],
    }
]
output: [
    {
        name: "dets",
        data_type: TYPE_FP32,
        dims: [-1, 5]
    },
    {
        name: "polygons",
        data_type: TYPE_FP32,
        dims: [-1, 8]
    },
    {
        name: "labels",
        data_type: TYPE_INT64,
        dims: [-1]
    },
    {
        name: "colors",
        data_type: TYPE_INT64,
        dims: [-1]
    },
    {
        name: "types",
        data_type: TYPE_INT64,
        dims: [-1]
    }
]

optimization: {
    priority: PRIORITY_DEFAULT,
    input_pinned_memory: {
        enable: true
    },
    output_pinned_memory: {
        enable: true
    },
    gather_kernel_buffer_threshold: 0,
    eager_batching: false
}
instance_group: [
    {
        name: "plate_detector",
        kind: KIND_GPU,
        count: 2,
        host_policy: ""
    }
]
model_warmup: []
dynamic_batching {
    max_queue_delay_microseconds: 100
}
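
To double-check what Triton itself reports for these outputs, I query the model metadata (a quick sketch):

import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="triton:8001")
meta = client.get_model_metadata("license_plate_detector")
for out in meta.outputs:
    print(out.name, out.datatype, out.shape)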

Here is the simple code that I use to parse the tensors from the user meta data:

import ctypes
from typing import Any, Optional

import numpy as np
import pyds


def get_layer(
    layers_info: list[pyds.NvDsInferLayerInfo],
    name: str,
) -> Optional[pyds.NvDsInferLayerInfo]:
    """Find a layer by name"""
    for layer in layers_info:
        if layer.layerName and layer.layerName == name:
            return layer
    return None


def ds_parse_plate_detector(layers_info: list[Any]):
    """Parse Triton outputs from license_plate_detector"""

    def np_from_layer(layer: pyds.NvDsInferLayerInfo) -> np.ndarray:
        if layer.dataType == pyds.NvDsInferDataType.FLOAT:
            dtype = ctypes.c_float
        elif layer.dataType == pyds.NvDsInferDataType.INT32:
            dtype = ctypes.c_int32
        # INT64 requires a pyds build that exposes this data type
        elif layer.dataType == pyds.NvDsInferDataType.INT64:
            dtype = ctypes.c_int64
        else:
            raise ValueError(f"Unsupported data type: {layer.dataType}")
        shape = [layer.inferDims.d[i] for i in range(layer.inferDims.numDims)]
        # Convert to numpy array
        ptr = ctypes.cast(pyds.get_ptr(layer.buffer), ctypes.POINTER(dtype))
        layer_array = np.ctypeslib.as_array(ptr, shape=shape)
        return layer_array

    det_layer = get_layer(layers_info, "dets")
    label_layer = get_layer(layers_info, "labels")
    poly_layer = get_layer(layers_info, "polygons")
    color_layer = get_layer(layers_info, "colors")
    type_layer = get_layer(layers_info, "types")
    if (
        det_layer is None
        or label_layer is None
        or poly_layer is None
        or color_layer is None
        or type_layer is None
    ):
        raise ValueError("Missing required layers in plate detector output")
    dets = np_from_layer(det_layer)
    labels = np_from_layer(label_layer)
    polygons = np_from_layer(poly_layer)
    colors = np_from_layer(color_layer)
    multilines = np_from_layer(type_layer)

    # post-process to get patches
    # score thresholding
    bboxes = dets[:, :-1]
    scores = dets[:, -1]

    plate_info = dict(
        bboxes=bboxes.tolist(),
        scores=scores.tolist(),
        labels=labels.tolist(),
        polygons=polygons.tolist(),
        colors=colors.tolist(),
        multilines=multilines.tolist(),
    )
    return plate_info
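
For context, this is roughly how I extract layers_info in the SGIE pad probe (a sketch; it assumes `from gi.repository import Gst` and that the probe is attached on the SGIE src pad):

def sgie_probe(pad, info, user_data):
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(info.get_buffer()))
    l_frame = batch_meta.frame_meta_list
    while l_frame:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            l_user = obj_meta.obj_user_meta_list
            while l_user:
                user_meta = pyds.NvDsUserMeta.cast(l_user.data)
                if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                    tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
                    layers_info = [
                        pyds.get_nvds_LayerInfo(tensor_meta, i)
                        for i in range(tensor_meta.num_output_layers)
                    ]
                    plate_info = ds_parse_plate_detector(layers_info)
                l_user = l_user.next
            l_obj = l_obj.next
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK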

I can successfully parse the tensors; here are the output shapes:
colors: [32], dtype=NvDsInferDataType.INT64
dets: [32, 5], dtype=NvDsInferDataType.FLOAT
labels: [32], dtype=NvDsInferDataType.INT64
polygons: [32, 8], dtype=NvDsInferDataType.FLOAT
types: [32], dtype=NvDsInferDataType.INT64
but the values are not correct.
Thanks in advance for any help or insights!

Please make sure the preprocessing configurations in nvinferserver and in the Triton Python client test are consistent. Please refer to the parameter explanations in the doc.
If it still doesn’t work, please share the preprocessing code from the Triton Python client.

It’s still not working. Here is my preprocessing code in the Triton client:

class PlateDetector:
    def __init__(
        self,
        cfg,
        DET_THR: float = 0.5,
        MODEL_IMG_SIZE_W: int = 512,
        MODEL_IMG_SIZE_H: int = 512,
        MODEL_NAME: str = "plate_detector",
        INPUT_NAME: str = "input",
        OUTPUT_NAMES: list = ["dets", "labels", "polygons", "colors", "types"],
        MEAN: list = [123.675, 116.28, 103.53],
        STD: list = [58.395, 57.12, 57.375],
        TO_RGB: bool = True,
    ):
        self.MODEL_IMG_SIZE_W = MODEL_IMG_SIZE_W
        self.MODEL_IMG_SIZE_H = MODEL_IMG_SIZE_H
        self.MEAN = np.array(MEAN)
        self.STD = np.array(STD)
        self.TO_RGB = TO_RGB

    def _get_rescale_ratio(self, image: np.ndarray) -> float:
        ori_h, ori_w = image.shape[:2]
        return min(
            self.MODEL_IMG_SIZE_W / ori_w, self.MODEL_IMG_SIZE_H / ori_h
        )

    def _get_new_size(self, img: np.ndarray, scale: float):
        img_h, img_w = img.shape[:2]
        return int(img_w * float(scale) + 0.5), int(img_h * float(scale) + 0.5)

    def _normalize(self, image: np.ndarray, to_rgb: bool = True) -> np.ndarray:
        image = image.astype(np.float32)

        mean = self.MEAN.reshape(1, -1).astype(np.float64)
        std_inv = 1 / self.STD.reshape(1, -1).astype(np.float64)

        if to_rgb:
            cv2.cvtColor(image, cv2.COLOR_BGR2RGB, image)

        cv2.subtract(image, mean, image)
        cv2.multiply(image, std_inv, image)

        # HWC -> CHW
        return image.transpose([2, 0, 1]).astype(np.float32)

    def _padding(self, image: np.ndarray, padding_value: int = 0) -> np.ndarray:
        # pad on the right/bottom up to the model input size
        width = max(self.MODEL_IMG_SIZE_W - image.shape[1], 0)
        height = max(self.MODEL_IMG_SIZE_H - image.shape[0], 0)

        return cv2.copyMakeBorder(
            image,
            0,        # top
            height,   # bottom
            0,        # left
            width,    # right
            cv2.BORDER_CONSTANT,
            value=padding_value,
        )

    def _preprocess(self, image: np.ndarray):
        scale = self._get_rescale_ratio(image)
        new_size = self._get_new_size(image, scale)

        image = cv2.resize(image, new_size)
        image = self._padding(image)
        image = self._normalize(image, self.TO_RGB)
        image = np.expand_dims(image, 0)  # add batch dimension

        return image, scale

Currently nvinferserver does not support std deviation; it only supports “y = net-scale-factor * (x - mean)” preprocessing, that is, the net-scale-factor is the same for every channel. nvinferserver is open source, so you can modify the code to customize it, or you can use “nvdspreprocess + nvinferserver” and customize the preprocessing in the open-source nvdspreprocess. Please refer to this sample, which uses the third-party lib roiconvert to do std-deviation preprocessing.
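
A quick check of the arithmetic, using the STD values from your client code: the per-channel scales implied by “y = (x - mean) / std” differ per channel, so no single net-scale-factor can reproduce them exactly:

import numpy as np

std = np.array([58.395, 57.12, 57.375])
print(1.0 / std)  # [0.01712 0.01751 0.01743] -- three different scales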

I tried customized preprocessing following the link above, but it’s not working; my pipeline seems to run only on the first frame and then gets stuck:

infer_config {
    gpu_ids: [0]
    unique_id: 11
    max_batch_size: 1
    backend {
        triton {
            model_name: "license_plate_detector"
            version: 1
            grpc {
                url: "triton:8001"
                enable_cuda_buffer_sharing: true
            }
        }
    }

    input_tensor_from_meta { 
        is_first_dim_batch : true 
    }

    extra {
        copy_input_to_host_buffers: false
    }
}
input_control {
    process_mode: PROCESS_MODE_CLIP_OBJECTS
    operate_on_gie_id: 1
    operate_on_class_ids: [1,2,3,4,5,6]
    interval: 0
}
output_control {
    output_tensor_meta: true
}
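
For context, in my pipeline the nvdspreprocess element sits directly in front of this SGIE; the linking looks roughly like this (a sketch; element variable names and config paths are placeholders):

preprocess = Gst.ElementFactory.make("nvdspreprocess", "sgie-preprocess")
preprocess.set_property("config-file", "config_preprocess_sgie.txt")

sgie = Gst.ElementFactory.make("nvinferserver", "plate-detector")
sgie.set_property("config-file-path", "sgie_plate_detector.txt")

pipeline.add(preprocess)
pipeline.add(sgie)

# ... pgie ! tracker ! nvdspreprocess ! nvinferserver (SGIE) ! ...
tracker.link(preprocess)
preprocess.link(sgie)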

and preprocess config:

################################################################################
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: LicenseRef-NvidiaProprietary
#
# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual
# property and proprietary rights in and to this material, related
# documentation and any modifications thereto. Any use, reproduction,
# disclosure or distribution of this material and related documentation
# without an express license agreement from NVIDIA CORPORATION or
# its affiliates is strictly prohibited.
################################################################################

# The values in the config file are overridden by values set through GObject
# properties.

[property]
enable=1
# list of component gie-id for which tensor is prepared
target-unique-ids=11
# preprocess on metadata generated by this unique gie-id
operate-on-gie-id=1
# 0=NCHW, 1=NHWC, 2=CUSTOM
network-input-order=0
# 0=process on objects 1=process on frames
process-on-frame=0
#uniquely identify the metadata generated by this element
unique-id=15
# gpu-id to be used
gpu-id=0
# processing width/height at which image scaled
processing-width=512
processing-height=512
# max buffer in scaling buffer pool
scaling-buf-pool-size=6
# max buffer in tensor buffer pool
tensor-buf-pool-size=6
# tensor shape based on network-input-order
network-input-shape= 1;3;512;512
# 0=RGB, 1=BGR, 2=GRAY
network-color-format=0
# 0=FP32, 1=UINT8, 2=INT8, 3=UINT32, 4=INT32, 5=FP16
tensor-data-type=0
# tensor name same as input layer name
tensor-name=input
# 0=NVBUF_MEM_DEFAULT 1=NVBUF_MEM_CUDA_PINNED 2=NVBUF_MEM_CUDA_DEVICE 3=NVBUF_MEM_CUDA_UNIFIED
scaling-pool-memory-type=0
# 0=NvBufSurfTransformCompute_Default 1=NvBufSurfTransformCompute_GPU 2=NvBufSurfTransformCompute_VIC
scaling-pool-compute-hw=1
# Scaling Interpolation method
# 0=NvBufSurfTransformInter_Nearest 1=NvBufSurfTransformInter_Bilinear 6=NvBufSurfTransformInter_Default
scaling-filter=1
# custom library .so path having custom functionality
custom-lib-path=/libs/libcustom2d_preprocess.so
# custom tensor preparation function name having predefined input/outputs
# check the default custom library nvdspreprocess_lib for more info
custom-tensor-preparation-function=CustomTensorPreparation

[user-configs]
# Below parameters get used when using default custom library nvdspreprocess_lib
# network scaling factor
pixel-normalization-factor=0.01712;0.01750;0.01742
# array of offsets for each channel
offsets=123.675;116.28;103.53
# Scaling Interpolation method
# 0=Nearest 1=Bilinear 2=Default(Nearest)
scaling-filter=1
# scale type
# 0: "maintain-aspect-ratio=0", 1: "maintain-aspect-ratio=1 and center" 2: matrix
scale-type=1
# When scale-type is set to 2, a 3x2 user-defined affine matrix
# affine-matrix=1.0;0.0;0.0;0.0;1.0;0.0

[group-0]
src-ids=0;1;2;3
operate-on-class-ids=1;2;3;4;5;6
custom-input-transformation-function=CustomAsyncTransformation
process-on-all-objects=1
input-object-min-width=60
input-object-min-height=60
input-object-max-width=2000
input-object-max-height=2000

When I check log from deepstream, I see that:

INFO: TritonGrpcBackend id:11 initialized for model: license_plate_detector
WARNING: unsupported tensor order for dims to image-info, retry as kLinear
Using offsets: 123.675003, 116.279999, 103.529999
0 is set for scaling-filter, using Bilinear
1 is set for scale_type, using FitCenter
Using scales: 0.01712, 0.01750, 0.01742
affine_matrix: 1.000 0.000 0.000 0.000 1.000 0.000
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
[NvMultiObjectTracker] Initialized
0:00:00.662594653 292 0x30ee5260 INFO nvinfer gstnvinfer.cpp:685:gst_nvinfer_logger:<vehicle_detector> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2109> [UID = 1]: deserialized trt engine from :/models/tnism/traffic7_detector/model.engine
0:00:00.662629491 292 0x30ee5260 INFO nvinfer gstnvinfer.cpp:685:gst_nvinfer_logger:<vehicle_detector> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2212> [UID = 1]: Use deserialized engine model: /models/tnism/traffic7_detector/model.engine
0:00:00.664301313 292 0x30ee5260 INFO nvinfer gstnvinfer_impl.cpp:343:notifyLoadModelStatus:<vehicle_detector> [UID 1]: Load new model:configs/vehicle_detector_trt.txt sucessfully
2025-11-05 01:30:36,832-INFO-ai-main_deepstream.py-1235: Starting pipeline
2025-11-05 01:30:36,832-INFO-ai-main_deepstream.py-1241: Pipeline started, current state: , ,
Unsupported configure 203.
Unsupported configure 203.
Unsupported configure 203.
Unsupported configure 203.
Unsupported configure 203.
Unsupported configure 203.
2025-11-05 01:30:46,651-INFO-ai-timer.py-131: topic_id=CaiHemNho fps=1.4 is_alive=True

My pipeline gets stuck here.

In the preprocess config of the SGIE, please modify the first value (the batch dimension) of network-input-shape to a big number, because there may be many objects; for example, network-input-shape=32;3;512;512

Do you mean that if the number of objects is larger than the batch size specified in network-input-shape, the pipeline will get stuck?

Please refer to /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-preprocess-test/README for how to set network-input-shape[0]. To be accurate, it should be “much larger than” the expected object count.
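
For example, assuming your four sources (src-ids=0;1;2;3) can each yield several qualifying objects per frame, something like:

# batch dimension sized well above sources x expected objects per frame
network-input-shape=32;3;512;512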