Nvinfer yields constant OCR text with NHWC engine (fast_plate_ocr – cct_s_v1_global_model) while nvinferserver returns correct results


Environment

  • SDK: NVIDIA DeepStream 7.x

  • Pipelines:

    • nvinferserver (Triton) → OCR output is correct

    • nvinfer (SGIE) → OCR output collapses to a constant/few fixed characters

  • Model: fast_plate_ocr cct_s_v1_global_model

    • Input: [N, 64, 128, 3] (NHWC, UINT8)

    • Output: [N, 9, 37] (multi-head classification; alphabet 0-9A-Z plus _ for pad)

  • Use case: SGIE operating on cropped license plate ROIs

DeepStream docs note nvinfer performs internal format conversion/scaling and feeds planar data to TensorRT (RGB/BGR/GRAY with network H×W), which can be a source of layout mismatches vs. NHWC engines if not handled carefully. (NVIDIA Docs)


Summary

Using the same ONNX/plan and the same custom classifier parser:

  • nvinferserver (Triton) with NHWC config produces correct plate strings. (NVIDIA Docs)

  • nvinfer (SGIE) only runs when set to:

    infer-dims=3;64;128
    network-input-order=1  # NHWC

    but the decoded plate becomes a constant/incorrect string across frames.

  • Switching to the “intuitive” H;W;C form (64;128;3) with network-input-order=1 triggers TensorRT profile/dimension mismatches and rebuild attempts (example log below).

This suggests a layout/preprocess inconsistency in nvinfer with NHWC engines, whereas Triton’s path behaves as expected.


Expected vs. Actual

  • Expected: nvinfer with an NHWC engine and matching config should decode identical OCR to nvinferserver.

  • Actual: nvinfer either (a) runs but returns a constant/incorrect plate string, or (b) fails with TensorRT profile/dimension mismatch errors when using H;W;C dims.


Repro Steps

  1. Build fast_plate_ocr cct_s_v1_global_model with NHWC input [1,64,128,3] and output [1,9,37].

  2. Run as SGIE via nvinferserver (config below) → correct OCR. (NVIDIA Docs)

  3. Switch to nvinfer (config below), set network-input-order=1 to match NHWC.

    • With infer-dims=3;64;128, pipeline runs but OCR collapses to a constant string.

    • With infer-dims=64;128;3, nvinfer attempts to rebuild and fails with TRT dimension/profile mismatches.


Logs (representative)

[FullDims Engine Info]:
0   INPUT  kUINT8 input           64x128x3        min: 1x64x128x3      opt: 8x64x128x3      Max: 8x64x128x3
1   OUTPUT kFLOAT Identity:0      9x37            min: 0               opt: 0               Max: 0

WARNING: Backend context bufferIdx(0) request dims:8x128x3x64 is out of range, [min: 1x64x128x3, max: 8x64x128x3]
... NvDsInferContextImpl::checkBackendParams(): backend can not support dims:128x3x64
... deserialized backend context ... failed to match config params, trying rebuild
ERROR: IBuilder::buildSerializedNetwork: API Usage Error (Dimension mismatch ... axis 1: profile 128 vs tensor 64)
Segmentation fault (core dumped)


Full nvinferserver (working) config

# Triton model config (config.pbtxt)
name: "nhan-dien-bien-so-xe"
platform: "tensorrt_plan"
max_batch_size: 0

input [
  {
    name: "input"
    data_type: TYPE_UINT8
    dims: [ -1, 64, 128, 3 ]   # NHWC
  }
]
output [
  {
    name: "Identity:0"
    data_type: TYPE_FP32
    dims: [ -1, 9, 37 ]        # [slots, classes]
  }
]

# nvinferserver config
infer_config {
  gpu_ids: [0]
  max_batch_size: 8

  backend {
    triton {
      model_name: "nhan-dien-bien-so-ds8-rtx4000"
      version: -1
      model_repo {
        root: "/opt/lantana/lantana_data/models"
        strict_model_config: true
      }
    }
  }

  preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_NHWC
    maintain_aspect_ratio: 0
    frame_scaling_hw: FRAME_SCALING_HW_DEFAULT
    frame_scaling_filter: 1
    normalize {
      scale_factor: 1
    }
  }

  postprocess {
    classification {
      threshold: 0.51
      custom_parse_classifier_func: "NvDsInferClassifierParseCustomFastPlateOCR"
    }
  }

  custom_lib {
    path: "/opt/lantana/build/bin/plugins/libocr_fast_plate_parser.so"
  }
}

input_control {
  async_mode: true
}


Full nvinfer (problematic) config

property:
  gpu-id: 0
  # gie-unique-id: 1
  batch-size: 8

  onnx-file: "/opt/lantana/lantana_data/models/nhan-dien-bien-so-ds8-rtx4000/1/model.onnx"
  model-engine-file: "/opt/lantana/lantana_data/models/nhan-dien-bien-so-ds8-rtx4000/1/model.plan"

  # 0=FP32, 1=INT8, 2=FP16 (must match the .plan engine)
  network-mode: 0
  # interval: 3
  network-type: 1 # 0=Detector, 1=Classifier, 2=Segmentation
  # === Preprocess (equivalent to preprocess in the nvinferserver infer_config) ===
  infer-dims: "3;64;128"
  network-input-order: 1 # 0=NCHW, 1=NHWC
  output-blob-names: "Identity:0"
  net-scale-factor: 1
  model-color-format: 0 # 0=RGB 1=BGR
  maintain-aspect-ratio: 0

  # === Labels / classes ===
  labelfile-path: "/opt/lantana/lantana_data/pipeline_components/sgie_nhan-dien-bien-so-ds8-rtx4000_CLASSIFICATION/package_content/labels.txt"

  # === Classifier behavior ===
  classifier-threshold: 0.51
  classifier-async-mode: 1 # equivalent to input_control.async_mode (secondary GIE only)
  classifier-type: "lprecg_ocr"

  # === Custom classifier parser ===
  custom-lib-path: "/opt/lantana/build/bin/plugins/libocr_fast_plate_parser.so"
  parse-classifier-func-name: "NvDsInferClassifierParseCustomFastPlateOCR"

  # =========================
  # Secondary / operate-on-* (uncomment if running as SGIE)
  # =========================
  # process-mode: 2                # 1=Primary (full-frame), 2=Secondary (objects)
  # operate-on-gie-id: 2
  # operate-on-class-ids: 0
  # secondary-reinfer-interval: 3


Custom parser source (same .so for both paths)

// plugins/src/ocr_fast_plate_parser.cpp
//
// Custom parser for multi-head plate OCR (fast-plate-ocr).
// Default Plate Config: max_plate_slots = 9, alphabet:
// "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_" ('_' is pad; excluded when composing final string).
//
// With DeepStream 7 (nvinferserver):
//   postprocess { classification { custom_parse_classifier_func:
//   "NvDsInferClassifierParseCustomFastPlateOCR" } } custom_lib { path:
//   "/opt/lantana/lib/libocr_fast_plate_parser.so" }

#include <cstring>
#include <iostream>
#include <vector>
#include <string>
#include <cuda_fp16.h>
#include "nvdsinfer_custom_impl.h"

// --- alphabet/slots for fast-plate-ocr ---
static const char* kAlphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_";
static inline int kClasses() { return 37; }  // 36 chars + '_' (pad)
static inline int kSlots()   { return 9; }   // max_plate_slots

extern "C" bool NvDsInferClassifierParseCustomFastPlateOCR(
    const std::vector<NvDsInferLayerInfo>& outLayers, const NvDsInferNetworkInfo& /*networkInfo*/,
    float /*classifierThreshold*/, std::vector<NvDsInferAttribute>& attrList, std::string& descString) {

    if (outLayers.size() != 1) {
        std::cerr << "[OCR] Expect exactly 1 output layer (S x C)\n";
        return false;
    }
    const NvDsInferLayerInfo& L = outLayers[0];

    // Infer (S, C) from inferDims
    int d[8] = {0};
    for (unsigned i = 0; i < L.inferDims.numDims; ++i) d[i] = L.inferDims.d[i];

    int S = 0, C = 0;  // slots, classes
    if (L.inferDims.numDims == 2) {
        S = d[0]; C = d[1];
    } else if (L.inferDims.numDims == 3) {
        // assume [N, S, C] with N=1
        S = d[1]; C = d[2];
    } else if (L.inferDims.numDims == 1) {
        // flattened: [S*C]
        C = kClasses();
        int total = d[0];
        if (C > 0 && total % C == 0) S = total / C;
    }

    if (S <= 0 || C <= 0) {
        std::cerr << "[OCR] Bad output dims\n";
        return false;
    }
    if (C != kClasses()) {
        std::cerr << "[OCR] Class mismatch: model C=" << C
                  << " vs expected " << kClasses()
                  << " — update kAlphabet/kClasses if your Plate Config changed.\n";
        // continue parsing for debug visibility
    }

    // Read buffer as float with layout [S, C]
    std::vector<float> logits(static_cast<size_t>(S) * static_cast<size_t>(C));
    if (L.dataType == NvDsInferDataType::FLOAT) {
        const float* p = static_cast<const float*>(L.buffer);
        for (int i = 0, N = S * C; i < N; ++i) logits[i] = p[i];
    } else if (L.dataType == NvDsInferDataType::HALF) {
        const __half* p = static_cast<const __half*>(L.buffer);
        for (int i = 0, N = S * C; i < N; ++i) logits[i] = __half2float(p[i]);
    } else {
        std::cerr << "[OCR] Unsupported dtype (expect FP32/FP16)\n";
        return false;
    }

    // Argmax per slot; skip '_' when composing final string
    std::string plate; plate.reserve(S);
    for (int s = 0; s < S; ++s) {
        const float* row = &logits[s * C];
        int best_k = 0; float best_v = row[0];
        for (int k = 1; k < C; ++k) if (row[k] > best_v) { best_v = row[k]; best_k = k; }
        char ch = kAlphabet[best_k];
        if (ch != '_') plate.push_back(ch);
    }

    // Return a single attribute containing text (confidence=1.0 for simplicity)
    NvDsInferAttribute attr{};
    attr.attributeIndex = 0;       // "plate_text"
    attr.attributeValue = 0;
    attr.attributeConfidence = 1.0f;
    attr.attributeLabel = strdup(plate.c_str());  // freed by DS (g_free/free)
    attrList.push_back(attr);

    descString.append("[license_plate] ");
    descString.append(attr.attributeLabel);

    return true;
}

// Required so DS can verify the prototype at .so load time
CHECK_CUSTOM_CLASSIFIER_PARSE_FUNC_PROTOTYPE(NvDsInferClassifierParseCustomFastPlateOCR);


Please provide the content of this label file.

Since there has been no update from you for a while, we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.