Nvinfer yields constant OCR text with NHWC engine (fast_plate_ocr – cct_s_v1_global_model) while nvinferserver returns correct results


Environment

  • SDK: NVIDIA DeepStream 7.x

  • Pipelines:

    • nvinferserver (Triton) → OCR output is correct

    • nvinfer (SGIE) → OCR output collapses to a constant/few fixed characters

  • Model: fast_plate_ocr cct_s_v1_global_model

    • Input: [N, 64, 128, 3] (NHWC, UINT8)

    • Output: [N, 9, 37] (multi-head classification; alphabet 0-9A-Z plus _ for pad)

  • Use case: SGIE operating on cropped license plate ROIs

DeepStream docs note nvinfer performs internal format conversion/scaling and feeds planar data to TensorRT (RGB/BGR/GRAY with network H×W), which can be a source of layout mismatches vs. NHWC engines if not handled carefully. (NVIDIA Docs)


Summary

Using the same ONNX/plan and the same custom classifier parser:

  • nvinferserver (Triton) with NHWC config produces correct plate strings. (NVIDIA Docs)

  • nvinfer (SGIE) only runs when set to:

    infer-dims=3;64;128
    network-input-order=1  # NHWC

    but the decoded plate becomes a constant/incorrect string across frames.

  • Switching to the “intuitive” H;W;C form (64;128;3) with network-input-order=1 triggers TensorRT profile/dimension mismatches and rebuild attempts (example log below).

This suggests a layout/preprocess inconsistency in nvinfer with NHWC engines, whereas Triton’s path behaves as expected.


Expected vs. Actual

  • Expected: nvinfer with an NHWC engine and matching config should decode identical OCR to nvinferserver.

  • Actual: nvinfer either (a) runs but returns a constant/incorrect plate string, or (b) fails with TensorRT profile/dimension mismatch errors when using H;W;C dims.


Repro Steps

  1. Build fast_plate_ocr cct_s_v1_global_model with NHWC input [1,64,128,3] and output [1,9,37].

  2. Run as SGIE via nvinferserver (config below) → correct OCR. (NVIDIA Docs)

  3. Switch to nvinfer (config below), set network-input-order=1 to match NHWC.

    • With infer-dims=3;64;128, pipeline runs but OCR collapses to a constant string.

    • With infer-dims=64;128;3, nvinfer attempts to rebuild and fails with TRT dimension/profile mismatches.


Logs (representative)

[FullDims Engine Info]:
0   INPUT  kUINT8 input           64x128x3        min: 1x64x128x3      opt: 8x64x128x3      Max: 8x64x128x3
1   OUTPUT kFLOAT Identity:0      9x37            min: 0               opt: 0               Max: 0

WARNING: Backend context bufferIdx(0) request dims:8x128x3x64 is out of range, [min: 1x64x128x3, max: 8x64x128x3]
... NvDsInferContextImpl::checkBackendParams(): backend can not support dims:128x3x64
... deserialized backend context ... failed to match config params, trying rebuild
ERROR: IBuilder::buildSerializedNetwork: API Usage Error (Dimension mismatch ... axis 1: profile 128 vs tensor 64)
Segmentation fault (core dumped)


Full nvinferserver (working) config

# Triton model config (config.pbtxt)
name: "nhan-dien-bien-so-xe"
platform: "tensorrt_plan"
max_batch_size: 0

input [
  {
    name: "input"
    data_type: TYPE_UINT8
    dims: [ -1, 64, 128, 3 ]   # NHWC
  }
]
output [
  {
    name: "Identity:0"
    data_type: TYPE_FP32
    dims: [ -1, 9, 37 ]        # [slots, classes]
  }
]

# nvinferserver config
infer_config {
  gpu_ids: [0]
  max_batch_size: 8

  backend {
    triton {
      model_name: "nhan-dien-bien-so-ds8-rtx4000"
      version: -1
      model_repo {
        root: "/opt/lantana/lantana_data/models"
        strict_model_config: true
      }
    }
  }

  preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_NHWC
    maintain_aspect_ratio: 0
    frame_scaling_hw: FRAME_SCALING_HW_DEFAULT
    frame_scaling_filter: 1
    normalize {
      scale_factor: 1
    }
  }

  postprocess {
    classification {
      threshold: 0.51
      custom_parse_classifier_func: "NvDsInferClassifierParseCustomFastPlateOCR"
    }
  }

  custom_lib {
    path: "/opt/lantana/build/bin/plugins/libocr_fast_plate_parser.so"
  }
}

input_control {
  async_mode: true
}


Full nvinfer (problematic) config

property:
  gpu-id: 0
  # gie-unique-id: 1
  batch-size: 8

  onnx-file: "/opt/lantana/lantana_data/models/nhan-dien-bien-so-ds8-rtx4000/1/model.onnx"
  model-engine-file: "/opt/lantana/lantana_data/models/nhan-dien-bien-so-ds8-rtx4000/1/model.plan"

  # 0=FP32, 1=INT8, 2=FP16 (must match the .plan engine)
  network-mode: 0
  # interval: 3
  network-type: 1 # 0=Detector, 1=Classifier, 2=Segmentation
  # === Preprocess (equivalent to preprocess in the nvinferserver infer_config) ===
  infer-dims: "3;64;128"
  network-input-order: 1 # 0=NCHW, 1=NHWC
  output-blob-names: "Identity:0"
  net-scale-factor: 1
  model-color-format: 0 # 0=RGB 1=BGR
  maintain-aspect-ratio: 0

  # === Labels / classes ===
  labelfile-path: "/opt/lantana/lantana_data/pipeline_components/sgie_nhan-dien-bien-so-ds8-rtx4000_CLASSIFICATION/package_content/labels.txt"

  # === Classifier behavior ===
  classifier-threshold: 0.51
  classifier-async-mode: 1 # equivalent to input_control.async_mode (secondary GIE only)
  classifier-type: "lprecg_ocr"

  # === Custom classifier parser ===
  custom-lib-path: "/opt/lantana/build/bin/plugins/libocr_fast_plate_parser.so"
  parse-classifier-func-name: "NvDsInferClassifierParseCustomFastPlateOCR"

  # =========================
  # Secondary / operate-on-* (uncomment if running as SGIE)
  # =========================
  # process-mode: 2                # 1=Primary (full-frame), 2=Secondary (objects)
  # operate-on-gie-id: 2
  # operate-on-class-ids: 0
  # secondary-reinfer-interval: 3


Custom parser source (same .so for both paths)

// plugins/src/ocr_fast_plate_parser.cpp
//
// Custom parser for multi-head plate OCR (fast-plate-ocr).
// Default Plate Config: max_plate_slots = 9, alphabet:
// "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_" ('_' is pad; excluded when composing final string).
//
// With DeepStream 7 (nvinferserver):
//   postprocess { classification { custom_parse_classifier_func:
//   "NvDsInferClassifierParseCustomFastPlateOCR" } } custom_lib { path:
//   "/opt/lantana/lib/libocr_fast_plate_parser.so" }

#include <cstring>
#include <iostream>
#include <vector>
#include <string>
#include <cuda_fp16.h>
#include "nvdsinfer_custom_impl.h"

// --- alphabet/slots for fast-plate-ocr ---
static const char* kAlphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_";
static inline int kClasses() { return 37; }  // 36 chars + '_' (pad)
static inline int kSlots()   { return 9; }   // max_plate_slots

extern "C" bool NvDsInferClassifierParseCustomFastPlateOCR(
    const std::vector<NvDsInferLayerInfo>& outLayers, const NvDsInferNetworkInfo& /*networkInfo*/,
    float /*classifierThreshold*/, std::vector<NvDsInferAttribute>& attrList, std::string& descString) {

    if (outLayers.size() != 1) {
        std::cerr << "[OCR] Expect exactly 1 output layer (S x C)\n";
        return false;
    }
    const NvDsInferLayerInfo& L = outLayers[0];

    // Infer (S, C) from inferDims
    int d[8] = {0};
    for (unsigned i = 0; i < L.inferDims.numDims; ++i) d[i] = L.inferDims.d[i];

    int S = 0, C = 0;  // slots, classes
    if (L.inferDims.numDims == 2) {
        S = d[0]; C = d[1];
    } else if (L.inferDims.numDims == 3) {
        // assume [N, S, C] with N=1
        S = d[1]; C = d[2];
    } else if (L.inferDims.numDims == 1) {
        // flattened: [S*C]
        C = kClasses();
        int total = d[0];
        if (C > 0 && total % C == 0) S = total / C;
    }

    if (S <= 0 || C <= 0) {
        std::cerr << "[OCR] Bad output dims\n";
        return false;
    }
    if (C != kClasses()) {
        std::cerr << "[OCR] Class mismatch: model C=" << C
                  << " vs expected " << kClasses()
                  << " — update kAlphabet/kClasses if your Plate Config changed.\n";
        // continue parsing for debug visibility
    }

    // Read buffer as float with layout [S, C]
    std::vector<float> logits(static_cast<size_t>(S) * static_cast<size_t>(C));
    if (L.dataType == NvDsInferDataType::FLOAT) {
        const float* p = static_cast<const float*>(L.buffer);
        for (int i = 0, N = S * C; i < N; ++i) logits[i] = p[i];
    } else if (L.dataType == NvDsInferDataType::HALF) {
        const __half* p = static_cast<const __half*>(L.buffer);
        for (int i = 0, N = S * C; i < N; ++i) logits[i] = __half2float(p[i]);
    } else {
        std::cerr << "[OCR] Unsupported dtype (expect FP32/FP16)\n";
        return false;
    }

    // Argmax per slot; skip '_' when composing final string
    std::string plate; plate.reserve(S);
    for (int s = 0; s < S; ++s) {
        const float* row = &logits[s * C];
        int best_k = 0; float best_v = row[0];
        for (int k = 1; k < C; ++k) if (row[k] > best_v) { best_v = row[k]; best_k = k; }
        char ch = kAlphabet[best_k];
        if (ch != '_') plate.push_back(ch);
    }

    // Return a single attribute containing text (confidence=1.0 for simplicity)
    NvDsInferAttribute attr{};
    attr.attributeIndex = 0;       // "plate_text"
    attr.attributeValue = 0;
    attr.attributeConfidence = 1.0f;
    attr.attributeLabel = strdup(plate.c_str());  // freed by DS (g_free/free)
    attrList.push_back(attr);

    descString.append("[license_plate] ");
    descString.append(attr.attributeLabel);

    return true;
}

// Required so DS can verify the prototype at .so load time
CHECK_CUSTOM_CLASSIFIER_PARSE_FUNC_PROTOTYPE(NvDsInferClassifierParseCustomFastPlateOCR);


Please provide the content of this label file.

Since there has been no update from you for a while, we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.