Excited about the new LPRNet model card, which would be very useful for real-world applications. However, it states that the models are trained on US and Chinese license plates, which consist of a single line of text. As per the documentation, “the LPRNet model produces a sequence of class IDs. The image feature is divided into slices along the horizontal dimension and each slice is assigned a character ID in the prediction.”
I wonder if LPRNet can be trained to read plates from countries that use two lines of text, for example:
Bangladeshi Vehicle Registration Plate
As those plates contain two lines of text, the image feature would need to be divided into slices in both the horizontal and vertical dimensions, with each slice assigned a character ID in the prediction. Could anyone tell me whether it would be sane to retrain LPRNet for such a case? If so, what stride should be used in the ResNet architecture?
Currently, LPRNet does not support two lines of text.
One extra question: could you please recommend a public dataset of plates with two separate lines? If possible, could you share some samples of this kind of license plate?
@Morganh Here is a sample of such a license plate.
I have more than 100,000 labeled images of such plates.
As the license plate contains characters in 2 rows, I wonder if it's possible to customize LPRNet to scan the two rows simultaneously instead of one.
ResNet 10/18 is the backbone of LPRNet. The original stride of the ResNet network is 32, but to make it more applicable to the small spatial size of the license plate image, the stride is tuned from 32 to 4.
Can I tune the stride to some other value to cover the two rows? Then I could customize nvinfer_custom_lpr_parser.cpp to match the new stride:
seq_len = networkInfo.width/4;
Could you please tell me whether this is the correct approach?
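For context on why the sequence length is tied to the width: per the documentation quoted above, the model emits one class ID per horizontal slice, which suggests that changing the divisor alone would not make the decoder aware of rows. Below is a simplified sketch of greedy decoding over such a sequence. It is not the actual code in nvinfer_custom_lpr_parser.cpp; the function names are made up for illustration, and it assumes the blank class ID equals the dictionary size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Greedy decode of a per-slice class-ID sequence (simplified sketch).
// Assumption: the "blank" class ID equals dict.size(), one past the last character.
std::string decodeGreedy(const std::vector<int>& classIds,
                         const std::vector<std::string>& dict) {
    const int blankId = static_cast<int>(dict.size());
    std::string plate;
    int prev = -1;  // no previous slice yet
    for (int id : classIds) {
        assert(id >= 0 && id <= blankId);  // IDs outside the dictionary are invalid
        // Emit a character only when the ID changes and is not the blank.
        if (id != prev && id != blankId) {
            plate += dict[static_cast<std::size_t>(id)];
        }
        prev = id;
    }
    return plate;
}

// One class ID per horizontal slice, so the length depends only on the width.
int sequenceLength(int networkWidth, int stride) {
    assert(stride > 0);
    return networkWidth / stride;
}
```

With an example input width of 96 and a stride of 4, `sequenceLength(96, 4)` gives 24 slices, all of them read left to right across the full height of the plate.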
This would not work. Actually, we lack a two-line license plate dataset for further research. Are the 100,000 images public? If not, is it possible for NV to pay for them?
The dataset is private. I can provide it privately under NDA.
Thanks for the info. I will sync with TLT internal team for this.
My company is an NVIDIA Inception Program member. We would be happy to collaborate.
Thanks for the opportunity. The internal team is actively working on the process.
We have plans for two-line LPR, but unfortunately not for now.
Below is a workaround you can try.
Train 2 LPRNet models:
- Split the labels into two files. One contains all the 1st lines. The other contains all the 2nd lines.
- The first LPRNet will recognize only the first row of characters, treating the 2nd line as background, that is, ignoring all characters in the 2nd row.
- The second LPRNet will recognize only the second row of characters, treating the 1st line as background, that is, ignoring all characters in the 1st row.
- Then train the 2 LPRNet models and run them.
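The label split described above could be sketched roughly as follows. This is only an illustration under assumptions: it supposes each original label file stores the two rows on separate lines, which may not match your dataset, and the function and folder names are hypothetical. The images stay untouched; only the labels are split.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>
#include <utility>

namespace fs = std::filesystem;

// Split a two-row label into (first row, second row).
// Assumption: the rows are separated by a newline in the original label text.
std::pair<std::string, std::string> splitTwoLineLabel(const std::string& text) {
    std::istringstream in(text);
    std::string first, second;
    std::getline(in, first);
    std::getline(in, second);
    return {first, second};
}

// Write one label folder per model, keeping the original file names so that
// each label still matches its unchanged, full-plate image.
void splitLabelDir(const fs::path& src, const fs::path& dstRow1, const fs::path& dstRow2) {
    assert(fs::is_directory(src));
    fs::create_directories(dstRow1);
    fs::create_directories(dstRow2);
    for (const auto& entry : fs::directory_iterator(src)) {
        if (!entry.is_regular_file()) continue;
        std::ifstream in(entry.path());
        std::stringstream buffer;
        buffer << in.rdbuf();  // read the whole label file
        const auto rows = splitTwoLineLabel(buffer.str());
        std::ofstream out1(dstRow1 / entry.path().filename());
        out1 << rows.first << '\n';
        std::ofstream out2(dstRow2 / entry.path().filename());
        out2 << rows.second << '\n';
    }
}
```

Each output folder would then serve as the label folder for one of the two training runs, paired with the same image folder.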
Thanks for the suggestion, @Morganh! I had thought of that earlier: using 2 LPRNet models as secondary GIEs (SGIEs) operating on the output of the LPDNet SGIE. But I couldn’t find a way to make the LPDNet SGIE crop the upper or the lower part (treating the other part as background, as you mentioned).
As mentioned above, just split the original labels into two parts. The first part is a folder of label files, each containing only the 1st line. The second part is also a folder of label files, each containing only the 2nd line.
Then train 2 LPRNet models separately.
I understood that. But I was talking about the inference part, where the pipeline is:
PGIE --> SGIE_1 (LPDnet)|--> SGIE_2 (LPRnet_1)
                        |--> SGIE_3 (LPRnet_2)
SGIE_2 takes the whole license plate image from SGIE_1. How can I tune SGIE_1 to emit half of the license plate image? Or how can SGIE_2 run inference on half of the license plate image, given the whole image emitted by SGIE_1? Which config parameter needs to be changed in the SGIE_2 config file?
There is no need to tune SGIE_1. Just let SGIE_2 run, and it will output the inference result for the 1st line. At the same time, let SGIE_3 run, and it will output the inference result for the 2nd line.
Well, given the full license plate image provided by SGIE_1, how does SGIE_3 know to read the 2nd line and not the 1st? I am confused.
Because you have already trained the SGIE_3 model on the images with their 2nd-line ground truth.
The model has learned it.
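Since each model reports only its own row, the application still has to join the two results for a given plate. A minimal sketch of that step is below. The `RowResult` struct is a hypothetical stand-in, not a DeepStream type; in a real pipeline this information would come from whatever classifier metadata each SGIE attaches to the detected plate.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one classifier result attached to a detected plate.
struct RowResult {
    int gieId;         // unique ID of the SGIE that produced the text
    std::string text;  // decoded characters for that row
};

// Join the rows in reading order: first-row model, then second-row model.
std::string joinPlateRows(const std::vector<RowResult>& results,
                          int firstRowGieId, int secondRowGieId) {
    assert(firstRowGieId != secondRowGieId);
    std::string first, second;
    for (const auto& r : results) {
        if (r.gieId == firstRowGieId) {
            first = r.text;
        } else if (r.gieId == secondRowGieId) {
            second = r.text;
        }
    }
    // If one model produced nothing, return the other row alone.
    if (first.empty()) return second;
    if (second.empty()) return first;
    return first + " " + second;
}
```

The order of the results does not matter here, because the rows are matched by the ID of the model that produced them rather than by arrival order.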
Okay, understood. I’ll try training and inference and will share the results. Thanks a lot.