Excited about the new LPRNet model card, which would be very useful for real-world applications. However, it is stated that the model files are trained on US and Chinese LPs, which consist of a single line of text. As per the documentation, “the LPRNet model produces a sequence of class IDs. The image feature is divided into slices along the horizontal dimension and each slice is assigned a character ID in the prediction.”
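For context, the quoted behaviour amounts to a greedy, CTC-style decode of the per-slice class IDs: repeated IDs are merged and the blank ID is dropped. A minimal sketch of that idea (the charset, blank ID, and example IDs below are made up for illustration, not the model's actual ones):

```python
def decode_plate(class_ids, charset, blank_id):
    """Greedy CTC-style decode: merge consecutive repeats, drop blanks."""
    out = []
    prev = None
    for cid in class_ids:
        # Emit a character only when the ID changes and is not blank.
        if cid != prev and cid != blank_id:
            out.append(charset[cid])
        prev = cid
    return "".join(out)

# Hypothetical 3-character alphabet with 0 as the blank ID.
charset = {1: "A", 2: "B", 3: "8"}
print(decode_plate([1, 1, 0, 2, 2, 0, 2, 3], charset, blank_id=0))  # ABB8
```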
I wonder if LPRNet is trainable to read plates from countries whose plates contain two lines of text, for example -
As those plates contain two different lines of text, the image feature would have to be divided into slices along both the horizontal and vertical dimensions, with each slice assigned a character ID in the prediction. Could anyone enlighten me as to whether it would be feasible to retrain LPRNet for such a case? If so, what stride should the ResNet architecture use?
One extra question: could you please recommend a public dataset containing plates with two separate lines of text? If possible, could you share some samples of this kind of license plate?
I have more than 100,000 images with label data for such plates.
As the license plate contains characters in two rows, I wonder if it's possible to customize LPRNet to scan the two rows simultaneously instead of one.
ResNet 10/18 is the backbone of LPRNet. Although the original stride of the ResNet network is 32, to make it more applicable to the small spatial size of license plate images, we tune the stride from 32 to 4.
Can I tune the stride to some other value to cover the two rows? Then I can customize nvinfer_custom_lpr_parser.cpp to match that stride, e.g. seq_len = networkInfo.width/4;
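As a sanity check on why the stride matters here, the sequence length is just the input width divided by the effective downsampling stride. A small sketch (the 96-pixel input width below is my assumption for illustration; the width/4 relation comes from the parser line quoted above):

```python
def lpr_seq_len(input_width: int, stride: int) -> int:
    """Number of horizontal slices (character slots) the network emits.

    Mirrors seq_len = networkInfo.width / 4 from
    nvinfer_custom_lpr_parser.cpp when stride is 4.
    """
    return input_width // stride

# With a hypothetical 96-pixel-wide plate crop:
print(lpr_seq_len(96, 32))  # original ResNet stride: only 3 slots
print(lpr_seq_len(96, 4))   # tuned stride: 24 slots
```

This is why the stride is lowered from 32 to 4: with stride 32 the slice count is far too small to assign one slice per character.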
That approach would not work. Actually, we are lacking a two-line license plate dataset for further research. Are the 100,000 images public? If not, is it possible for NV to pay for them?
Thanks for the suggestion @Morganh! I had thought of that earlier: using 2 LPRNet models as second SGIEs working on the LPDNet SGIE. But I couldn’t find a way to set the LPDNet SGIE to crop the upper part or the lower part (treating the other part as background, as you mentioned).
As mentioned above, just split the original labels into two parts. The first part is a folder of label files, each containing only the 1st line. The second part is another folder of label files, each containing only the 2nd line.
Then train two LPRNet models separately.
(pipeline diagram) This is the pipeline.
Here SGIE_2 takes the whole license plate image from SGIE_1. How can I tune SGIE_1 to emit only half of the license plate image? Or how can SGIE_2 ingest/infer on half of the license plate image from the whole image emitted by SGIE_1? Which config params need to be changed in the SGIE_2 config file?
There is no need to tune SGIE_1. Just let SGIE_2 run, and it will output the inference result for the 1st line. At the same time, let SGIE_3 run, and it will output the inference result for the 2nd line.
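A minimal sketch of the config wiring this implies, as two separate nvinfer config files using DeepStream's gie-unique-id / operate-on-gie-id properties. The ID values and engine file names are hypothetical; the key point is that both recognizers operate on the same SGIE_1 (LPDNet) output:

```ini
# SGIE_2 config file: LPRNet trained on 1st-line labels
[property]
gie-unique-id=3
operate-on-gie-id=2          ; consume the full plate crop from SGIE_1 (LPDNet)
model-engine-file=lprnet_line1.engine
```

```ini
# SGIE_3 config file: LPRNet trained on 2nd-line labels, fed the same crop
[property]
gie-unique-id=4
operate-on-gie-id=2
model-engine-file=lprnet_line2.engine
```

Because each model was trained on labels for only one row, each effectively treats the other row as background, so no cropping change in SGIE_1 is needed.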