Training OCRNet for use in LPD/LPR

dGPU
DS 7

As a follow-up to the long thread here: Tao toolkit observations - #63 by foreverneilyoung
I was following the notebook ocrnet/ocrnet-vit.ipynb in order to train OCRNet for German number plate recognition.

I first ran the notebook “as is” to see what it produces. In the end I got this:

.
├── best_accuracy.onnx
├── status.json
└── trt.engine

I then used this ONNX model as a replacement for my original LPR ONNX model (trained this morning from lprnet/lprnet.ipynb), with the following configuration:

[property]
gpu-id=0
# This model works. Trained from LPRNet
#onnx-file=models/LP/LPR/lprnet_epoch-024.onnx
onnx-file=models/LP/LPR/best_accuracy.onnx
labelfile-path=models/LP/LPR/labels_us.txt
batch-size=16
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
gie-unique-id=3
# This line is causing problems
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid
#0=Detection 1=Classifier 2=Segmentation
network-type=1
parse-classifier-func-name=NvDsInferParseCustomNVPlate
custom-lib-path=nvinfer/libnvdsinfer_custom_impl_lpr.so
process-mode=2
operate-on-gie-id=2
net-scale-factor=0.00392156862745098
#net-scale-factor=1.0
#0=RGB 1=BGR 2=GRAY
model-color-format=0

[class-attrs-all]
threshold=0.5

But all I got was this:

0:00:02.233559654 24474 0x7719e4009c70 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<sgie2-lpr> NvDsInferContext[UID 3]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2109> [UID = 3]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
WARNING: [TRT]: Tensor DataType is determined at build time for tensors not marked as input or output.
// and many more of these warnings

Any idea what’s going wrong?

Disregard. It just took very long to build the engine.

However, I now get this error:

0:00:10.346122630 24982      0x13cc760 ERROR                nvinfer gstnvinfer.cpp:676:gst_nvinfer_logger:<sgie2-lpr> NvDsInferContext[UID 3]: Error in NvDsInferContextImpl::preparePreprocess() <nvdsinfer_context_impl.cpp:1035> [UID = 3]: RGB/BGR input format specified but network input channels is not 3

The input is an RTSP stream that has been working fine for weeks with other models.

Hi @foreverneilyoung ,
As we synced in Tao toolkit observations - #61 by Morganh, you are using https://github.com/NVIDIA-AI-IOT/deepstream_lpr_app/blob/master/deepstream-lpr-app/lpr_config_pgie.txt, but there is obviously a problem with output-blob-names.

I am moving this topic to the DeepStream forum for further checking.

Since it is an ONNX model, please get the input layer dimensions and names with netron.

@Fiona.Chen I already consulted netron; it produced an image that was too big for PNG export, and the SVG export was even too big to attach here. The top of it looks like this:

Could you please clarify what you mean by “please get the input layer dimensions and names” and what I should do with this information?

DeepStream does not care about the internals of the network; only the input and output layers are meaningful to it. Please click the top layer of the network; the input and output layers’ info will appear on the right side.

Clicking on the input gives this:

Since the model was trained by you, please confirm that it accepts a gray image with 200x64 resolution as input and that the input tensor layout is NCHW.

The preprocessing algorithm is defined by the training script; please make sure the preprocessing parameters are aligned with your training script. DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums
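For context, nvinfer’s per-pixel preprocessing boils down to y = net-scale-factor · (x − offset), with the offset taken from the optional offsets key. This sketch (nothing beyond the documented formula is assumed) shows why net-scale-factor=0.00392156862745098 (i.e. 1/255) maps pixels to [0, 1], and what a training pipeline that normalizes to [−1, 1] would require instead:

```python
def nvinfer_preprocess(pixel: float, net_scale_factor: float, offset: float = 0.0) -> float:
    """DeepStream nvinfer per-pixel preprocessing: y = net-scale-factor * (x - offset)."""
    return net_scale_factor * (pixel - offset)

# net-scale-factor = 1/255, no offset: [0, 255] -> [0, 1]
lo, hi = nvinfer_preprocess(0, 1 / 255.0), nvinfer_preprocess(255, 1 / 255.0)

# If training normalized to [-1, 1], the config would instead need
# net-scale-factor = 1/127.5 and offsets = 127.5.
lo2, hi2 = nvinfer_preprocess(0, 1 / 127.5, 127.5), nvinfer_preprocess(255, 1 / 127.5, 127.5)
```

If the training script normalized differently (e.g. per-channel means), the config values must be changed to match, otherwise accuracy degrades silently.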

Well, I guess you can tell this better than me. The model is the result of a run of the unchanged notebook.

If you don’t understand the training script, please consult in the TAO forum.

Oh thanks, perfect circle

@Morganh

Can you tell this user about the preprocessing parameters for Optical Character Recognition | NVIDIA NGC? It seems he is working with the trainable_v2.x version.

@foreverneilyoung

Please set “model-color-format=2” in the nvinfer configuration file for the “the gray image with 200x64 resolution as input and the input tensor dimension is NCHW” ONNX model.
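For a single-channel 200x64 input, the relevant fragment could look like this (model-color-format=2 is from the advice above; infer-dims is an assumption derived from the “gray, 200x64, NCHW” description, in C;H;W order — verify against what netron shows):

```ini
#0=RGB 1=BGR 2=GRAY
model-color-format=2
# Assumed from the netron inspection: 1 channel, height 64, width 200 (C;H;W)
infer-dims=1;64;200
```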

And please customize your own postprocessing function “NvDsInferParseCustomNVPlate” (deepstream_lpr_app/nvinfer_custom_lpr_parser/nvinfer_custom_lpr_parser.cpp at master · NVIDIA-AI-IOT/deepstream_lpr_app (github.com)) to process OCRNet’s output tensors.
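For orientation only: the linked parser decodes LPRNet’s output layers, so it cannot interpret OCRNet’s output as-is. Assuming the exported OCRNet emits a sequence of per-timestep character ids with a CTC blank (an assumption — check the actual output layer names and shapes in netron first), the core of a replacement parser would be a greedy CTC decode like this Python sketch (the real replacement for NvDsInferParseCustomNVPlate has to be C++ inside the custom lib):

```python
def ctc_greedy_decode(ids, id_to_char, blank_id=0):
    """Collapse repeated ids and drop blanks -- standard greedy CTC decoding."""
    decoded, prev = [], None
    for i in ids:
        if i != blank_id and i != prev:
            decoded.append(id_to_char[i])
        prev = i
    return "".join(decoded)

# The id-to-character mapping below is purely illustrative; the real one
# comes from the character list used during training.
# ctc_greedy_decode([1, 1, 0, 2], {1: "A", 2: "B"})  # "AB"
```

The character list used for decoding must be exactly the one the notebook trained with, including any space or minus-sign characters added for German plates.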

Tried that. It gives a segmentation fault.

EDIT: I meant “model-color-format=2” gives that

What am I supposed to customize here?

Maybe one step back in order to provide the current state:

  • I have a running LPD/LPR setup using LPRNet with this configuration:
[property]
gpu-id=0

onnx-file=models/LP/LPR/lprnet_epoch-024.onnx
model-engine-file=models/LP/LPR/lprnet_epoch-024.onnx_b16_gpu0_fp16.engine

#onnx-file=models/LP/LPR/best_accuracy.onnx
#model-engine-file=models/LP/LPR/best_accuracy.onnx_b16_gpu0_fp16.engine

labelfile-path=models/LP/LPR/labels_us.txt
batch-size=16
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
gie-unique-id=3
#output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid
#0=Detection 1=Classifier 2=Segmentation
network-type=1
parse-classifier-func-name=NvDsInferParseCustomNVPlate
custom-lib-path=nvinfer/libnvdsinfer_custom_impl_lpr.so
process-mode=2
operate-on-gie-id=2
net-scale-factor=0.00392156862745098
#net-scale-factor=1.0
#0=RGB 1=BGR 2=GRAY
model-color-format=0

[class-attrs-all]
threshold=0.5
  • The model used is the result of a training run with this notebook: tao_tutorials/notebooks/tao_launcher_starter_kit/lprnet/lprnet.ipynb at main · NVIDIA/tao_tutorials · GitHub

  • Unfortunately this model is unable to detect spaces or other delimiters (which are important for number plates in parts of the world other than China or the USA)

  • I tried to train LPRNet with a character set containing spaces and a minus sign, to no avail

  • I got a hint to try OCRNet, so I ran the above-mentioned notebook for OCRNet training unchanged and got the “best_accuracy.onnx” network

  • I just replaced the LPR network with the OCR network (by commenting out the LPR model lines above and uncommenting the others), and this doesn’t work

  • It fails with the above-mentioned error regarding colour channels. Setting the colour mode to GRAY is accepted initially, but it crashes at runtime

Sure it will fail. The postprocessing is for License Plate Recognition | NVIDIA NGC but not for Optical Character Recognition | NVIDIA NGC.

And you still don’t have the correct preprocessing parameters yet.

You need to consult the model engineer for the postprocessing algorithm of the model you want to use.

And this is who? Mr Who?

Thanks, but this is becoming funny now. No further questions

Actually this is a new feature regarding LPDNet + OCRNet in DeepStream.
For OCDNet + OCRNet in DeepStream, we have deepstream_tao_apps/apps/tao_others/deepstream-nvocdr-app at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub.
For LPDNet + LPRNet in DeepStream, we have deepstream_lpr_app/deepstream-lpr-app at master · NVIDIA-AI-IOT/deepstream_lpr_app · GitHub.

Ah thanks, this looks helpful