Title
nvinfer yields constant OCR text with NHWC engine (fast_plate_ocr – cct_s_v1_global_model) while nvinferserver returns correct results
Environment
- SDK: NVIDIA DeepStream 7.x
- Pipelines:
  - ✅ nvinferserver (Triton) → OCR output is correct
  - ❌ nvinfer (SGIE) → OCR output collapses to a constant/few fixed characters
- Model: fast_plate_ocr → cct_s_v1_global_model
  - Input: [N, 64, 128, 3] (NHWC, UINT8)
  - Output: [N, 9, 37] (multi-head classification; alphabet 0-9, A-Z, plus '_' for pad)
- Use case: SGIE operating on cropped license plate ROIs
DeepStream docs note nvinfer performs internal format conversion/scaling and feeds planar data to TensorRT (RGB/BGR/GRAY with network H×W), which can be a source of layout mismatches vs. NHWC engines if not handled carefully. (NVIDIA Docs)
Summary
Using the same ONNX/plan and the same custom classifier parser:
- nvinferserver (Triton) with NHWC config produces correct plate strings. (NVIDIA Docs)
- nvinfer (SGIE) only runs when set to infer-dims=3;64;128 with network-input-order=1 (NHWC), but the decoded plate becomes a constant/incorrect string across frames.
- Switching to the "intuitive" H;W;C form (64;128;3) with network-input-order=1 triggers TensorRT profile/dimension mismatches and rebuild attempts (example log below).
This suggests a layout/preprocess inconsistency in nvinfer with NHWC engines, whereas Triton's path behaves as expected.
Expected vs. Actual
- Expected: nvinfer with an NHWC engine and matching config should decode identical OCR to nvinferserver.
- Actual: nvinfer either (a) runs but returns a constant/incorrect plate string, or (b) fails with TensorRT profile/dimension mismatch errors when using H;W;C dims.
Repro Steps
- Build fast_plate_ocr cct_s_v1_global_model with NHWC input [1, 64, 128, 3] and output [1, 9, 37].
- Run as SGIE via nvinferserver (config below) → correct OCR. (NVIDIA Docs)
- Switch to nvinfer (config below), set network-input-order=1 to match NHWC.
  - With infer-dims=3;64;128, the pipeline runs but OCR collapses to a constant string.
  - With infer-dims=64;128;3, nvinfer attempts to rebuild and fails with TRT dimension/profile mismatches.
Logs (representative)
[FullDims Engine Info]:
0 INPUT kUINT8 input 64x128x3 min: 1x64x128x3 opt: 8x64x128x3 Max: 8x64x128x3
1 OUTPUT kFLOAT Identity:0 9x37 min: 0 opt: 0 Max: 0
WARNING: Backend context bufferIdx(0) request dims:8x128x3x64 is out of range, [min: 1x64x128x3, max: 8x64x128x3]
... NvDsInferContextImpl::checkBackendParams(): backend can not support dims:128x3x64
... deserialized backend context ... failed to match config params, trying rebuild
ERROR: IBuilder::buildSerializedNetwork: API Usage Error (Dimension mismatch ... axis 1: profile 128 vs tensor 64)
Segmentation fault (core dumped)
Full nvinferserver (working) config
name: "nhan-dien-bien-so-xe"
platform: "tensorrt_plan"
max_batch_size: 0
input [
{
name: "input"
data_type: TYPE_UINT8
dims: [ -1, 64, 128, 3 ] # NHWC
}
]
output [
{
name: "Identity:0"
data_type: TYPE_FP32
dims: [ -1, 9, 37 ] # [slots, classes]
}
]
infer_config {
gpu_ids: [0]
max_batch_size: 8
backend {
triton {
model_name: "nhan-dien-bien-so-ds8-rtx4000"
version: -1
model_repo {
root: "/opt/lantana/lantana_data/models"
strict_model_config: true
}
}
}
preprocess {
network_format: IMAGE_FORMAT_RGB
tensor_order: TENSOR_ORDER_NHWC
maintain_aspect_ratio: 0
frame_scaling_hw: FRAME_SCALING_HW_DEFAULT
frame_scaling_filter: 1
normalize {
scale_factor: 1
}
}
postprocess {
classification {
threshold: 0.51
custom_parse_classifier_func: "NvDsInferClassifierParseCustomFastPlateOCR"
}
}
custom_lib {
path: "/opt/lantana/build/bin/plugins/libocr_fast_plate_parser.so"
}
}
input_control {
async_mode: true
}
Full nvinfer (problematic) config
property:
gpu-id: 0
# gie-unique-id: 1
batch-size: 8
onnx-file: "/opt/lantana/lantana_data/models/nhan-dien-bien-so-ds8-rtx4000/1/model.onnx"
model-engine-file: "/opt/lantana/lantana_data/models/nhan-dien-bien-so-ds8-rtx4000/1/model.plan"
# 0=FP32, 1=INT8, 2=FP16 (must match the .plan engine)
network-mode: 0
# interval: 3
network-type: 1 # 0=Detector, 1=Classifier, 2=Segmentation, 3=Instance Segmentation
# === Preprocess (equivalent to preprocess in the old infer_config) ===
infer-dims: "3;64;128"
network-input-order: 1 # 0=NCHW, 1=NHWC
output-blob-names: "Identity:0"
net-scale-factor: 1
model-color-format: 0 # 0=RGB 1=BGR
maintain-aspect-ratio: 0
# === Labels / classes ===
labelfile-path: "/opt/lantana/lantana_data/pipeline_components/sgie_nhan-dien-bien-so-ds8-rtx4000_CLASSIFICATION/package_content/labels.txt"
# === Classifier behavior ===
classifier-threshold: 0.51
classifier-async-mode: 1 # equivalent to input_control.async_mode (secondary only)
classifier-type: "lprecg_ocr"
# === Custom classifier parser ===
custom-lib-path: "/opt/lantana/build/bin/plugins/libocr_fast_plate_parser.so"
parse-classifier-func-name: "NvDsInferClassifierParseCustomFastPlateOCR"
# =========================
# Secondary / operate-on-* (uncomment if used as SGIE)
# =========================
# process-mode=2 # 1=Primary (full-frame), 2=Secondary (objects)
# operate-on-gie-id=2
# operate-on-class-ids=0
# secondary-reinfer-interval=3
Custom parser source (same .so for both paths)
// plugins/src/ocr_fast_plate_parser.cpp
//
// Custom parser for multi-head plate OCR (fast-plate-ocr).
// Default Plate Config: max_plate_slots = 9, alphabet:
// "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_" ('_' is pad; excluded when composing final string).
//
// With DeepStream 7 (nvinferserver):
// postprocess { classification { custom_parse_classifier_func:
// "NvDsInferClassifierParseCustomFastPlateOCR" } } custom_lib { path:
// "/opt/lantana/lib/libocr_fast_plate_parser.so" }
#include <cstring>
#include <iostream>
#include <vector>
#include <string>
#include <cuda_fp16.h>
#include "nvdsinfer_custom_impl.h"
// --- alphabet/slots for fast-plate-ocr ---
static const char* kAlphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_";
static inline int kClasses() { return 37; } // 36 chars + '_' (pad)
static inline int kSlots() { return 9; } // max_plate_slots
extern "C" bool NvDsInferClassifierParseCustomFastPlateOCR(
const std::vector<NvDsInferLayerInfo>& outLayers, const NvDsInferNetworkInfo& /*networkInfo*/,
float /*classifierThreshold*/, std::vector<NvDsInferAttribute>& attrList, std::string& descString) {
if (outLayers.size() != 1) {
std::cerr << "[OCR] Expect exactly 1 output layer (S x C)\n";
return false;
}
const NvDsInferLayerInfo& L = outLayers[0];
// Infer (S, C) from inferDims
int d[8] = {0};
for (unsigned i = 0; i < L.inferDims.numDims; ++i) d[i] = L.inferDims.d[i];
int S = 0, C = 0; // slots, classes
if (L.inferDims.numDims == 2) {
S = d[0]; C = d[1];
} else if (L.inferDims.numDims == 3) {
// assume [N, S, C] with N=1
S = d[1]; C = d[2];
} else if (L.inferDims.numDims == 1) {
// flattened: [S*C]
C = kClasses();
int total = d[0];
if (C > 0 && total % C == 0) S = total / C;
}
if (S <= 0 || C <= 0) {
std::cerr << "[OCR] Bad output dims\n";
return false;
}
if (C != kClasses()) {
std::cerr << "[OCR] Class mismatch: model C=" << C
<< " vs expected " << kClasses()
<< " — update kAlphabet/kClasses if your Plate Config changed.\n";
// continue parsing for debug visibility
}
// Read buffer as float with layout [S, C]
std::vector<float> logits(static_cast<size_t>(S) * static_cast<size_t>(C));
if (L.dataType == NvDsInferDataType::FLOAT) {
const float* p = static_cast<const float*>(L.buffer);
for (int i = 0, N = S * C; i < N; ++i) logits[i] = p[i];
} else if (L.dataType == NvDsInferDataType::HALF) {
const __half* p = static_cast<const __half*>(L.buffer);
for (int i = 0, N = S * C; i < N; ++i) logits[i] = __half2float(p[i]);
} else {
std::cerr << "[OCR] Unsupported dtype (expect FP32/FP16)\n";
return false;
}
// Argmax per slot; skip '_' when composing final string
std::string plate; plate.reserve(S);
for (int s = 0; s < S; ++s) {
const float* row = &logits[s * C];
int best_k = 0; float best_v = row[0];
for (int k = 1; k < C; ++k) if (row[k] > best_v) { best_v = row[k]; best_k = k; }
char ch = kAlphabet[best_k];
if (ch != '_') plate.push_back(ch);
}
// Return a single attribute containing text (confidence=1.0 for simplicity)
NvDsInferAttribute attr{};
attr.attributeIndex = 0; // "plate_text"
attr.attributeValue = 0;
attr.attributeConfidence = 1.0f;
attr.attributeLabel = strdup(plate.c_str()); // freed by DS (g_free/free)
attrList.push_back(attr);
descString.append("[license_plate] ");
descString.append(attr.attributeLabel);
return true;
}
// Required so DS can verify the prototype at .so load time
CHECK_CUSTOM_CLASSIFIER_PARSE_FUNC_PROTOTYPE(NvDsInferClassifierParseCustomFastPlateOCR);