OCRNet Limitations on Input Images

I am currently working on a project that involves utilizing OCRNet for optical character recognition tasks. Through preliminary testing with the ICDAR dataset, I have observed that OCRNet performs adequately on horizontal images, achieving an accuracy rate of approximately 75%. However, my project demands the use of predominantly vertical images as input data.

My question is whether OCRNet can adapt to vertical images, specifically those of at least 380 * 85 (height * width). The default OCRNet configuration uses grayscale images of 32 * 100 or 64 * 100.

Could you please provide insight into whether OCRNet can maintain satisfactory performance levels when presented with vertical images of the aforementioned dimensions? Additionally, any recommendations or best practices for optimizing OCRNet’s performance with vertical images would be greatly appreciated.

• Hardware: T4
• Network Type: OCRNet
• TLT Version:
    task_group: ['model', 'dataset', 'deploy']
    format_version: 3.0
    toolkit_version: 5.2.0

Could you share an example of this kind of input data?
A 380 * 85 (height * width) image sounds like it contains several lines of characters. Did OCDNet detect the characters on each line?

You can also try running inference with the NVIDIA-AI-IOT/NVIDIA-Optical-Character-Detection-and-Recognition-Solution repository on GitHub, which provides an optical character detection and recognition solution optimized for NVIDIA devices.

Thank you for your prompt response and for your interest in my inquiry.

To clarify, when referring to images with dimensions of 380*85, I meant that the text would be arranged in a single column, similar to a word printed vertically. I apologize for any confusion regarding the description of the input data.

Although I haven’t yet run these cropped images through OCDNet for detection or tested them on OCRNet for character recognition, my primary focus is to ensure that OCRNet can effectively read characters from vertically arranged text. This verification step is crucial before preparing the dataset for further analysis.

I will try to attach a sample image of the vertically arranged text for your reference. Your insights and guidance on whether OCRNet can handle such input data effectively would be immensely helpful.


So your test images will be cropped images similar to [image], right?

The test images will be of the same kind as the training images. They vary in resolution, brightness, saturation, angle, background colour and font colour, but the text always stands out clearly from the background. Otherwise they are similar in pixel dimensions, character length and character type.

Thanks for the info. Currently, TAO OCDNet detects each line of characters, and TAO OCRNet then recognizes the characters in that line. Recognizing a whole column of vertical characters is not supported yet.

Thank you for providing this information. It’s helpful to know that TAO OCDNet can detect each line of characters, and subsequently, TAO OCRNet can recognize the characters individually. I understand that OCRNet does not currently support recognizing whole vertical characters. In that case, I will consider alternative approaches for handling vertically arranged text within my project.

Your clarification is greatly appreciated. If there are any updates or developments regarding OCRNet’s capability to recognize vertical characters in the future, I would be interested to learn about them.

Thank you once again for your assistance.

Thanks a lot for raising this interesting feature.

After discussing internally: for the original vertical dataset, you can rotate the images by 90 degrees and then run them through the TAO OCDR solution (which contains a rectifier). Refer to the NVIDIA-AI-IOT/NVIDIA-Optical-Character-Detection-and-Recognition-Solution repository on GitHub, which provides an optical character detection and recognition solution optimized for NVIDIA devices.
Also, if possible, could you share some data?
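
As an illustration of the 90-degree rotation suggested here, the following is a minimal pure-Python sketch that works on a row-major grid of grayscale pixel values. The names `rotate_90` and `vertical` are hypothetical and not part of TAO or the OCDR repository. In practice an imaging library would do this in one call (for example Pillow's `Image.rotate(90, expand=True)`). Which direction is correct depends on how the text runs in the images, so it is worth checking a few results by eye.

```python
from typing import List

Grid = List[List[int]]  # row-major grayscale pixels: grid[y][x]


def rotate_90(grid: Grid, clockwise: bool = True) -> Grid:
    """Rotate a row-major pixel grid by 90 degrees without resampling."""
    if not grid:
        return []
    if clockwise:
        # New rows are the old columns, each read bottom to top.
        return [list(col) for col in zip(*grid[::-1])]
    # New rows are the old columns taken right to left, each read top to bottom.
    return [list(col) for col in zip(*grid)][::-1]


# Hypothetical stand-in for a 380 * 85 (height * width) vertical crop.
vertical = [[(y + x) % 256 for x in range(85)] for y in range(380)]
horizontal = rotate_90(vertical, clockwise=False)
print(len(horizontal), len(horizontal[0]))  # prints "85 380" (height, width)
```

The rotation only swaps height and width and reorders pixels, so no image detail is lost before the data goes into the detection and recognition pipeline.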

Thank you for the suggestion provided regarding the handling of vertical datasets using TAO OCDR solution. I appreciate your team’s input on this matter. I will certainly explore the option of rotating the original vertical dataset by 90 degrees and utilizing the TAO OCDR solution.

Regarding sharing data, I’m preparing a sample dataset that aligns with the requirements in the document. Once it is available, I’ll post it in this thread :)

Here are some examples of the images for training

I would like to ask whether there are any specific requirements on image size for training, and whether training on images of different heights and widths would affect the accuracy of the model. I would also like to confirm whether the training images provided above would suffice for training, particularly if I rotate the vertical ones as suggested.

There is no specific limitation on image size when training OCDNet. For OCRNet, input_width is expected to be > 4 and input_height is expected to be > 32. Refer to the OCRNet page in the NVIDIA Docs.
As for the dataset, the 5 images above are not enough. You can generate more data, in particular by rotating the vertical ones.

Thank you for the clarification. Apologies for the confusion; I meant to ask whether the sample images (as an example) are ready for training as they are. For instance, do I have to make sure they are all the same size or converted to grayscale? Could you also advise on the recommended number of images in the dataset? :)

Please make sure of the following:

  • The images are horizontal.
  • Each image contains only one line of characters. For example, trim the first image into two images: one for the 1st line and another for the 2nd line.
  • The images do not need to be the same size or grayscale.
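
The one-line-per-image trimming can be done by hand, but for many images it could be sketched along these lines. This is only one possible approach (a horizontal projection profile over a binarized image) and is not something TAO prescribes. The names `find_text_lines` and `split_lines` and the `min_ink` threshold are hypothetical, and the sketch assumes the lines are separated by at least one row with no ink.

```python
from typing import List, Tuple

Grid = List[List[int]]  # binarized pixels, row-major: 1 = ink, 0 = background


def find_text_lines(grid: Grid, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, of bands that contain ink."""
    bands: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(grid):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                  # a text band begins on this row
        elif not has_ink and start is not None:
            bands.append((start, y))   # the band ended on the previous row
            start = None
    if start is not None:              # band runs to the bottom edge
        bands.append((start, len(grid)))
    return bands


def split_lines(grid: Grid) -> List[Grid]:
    """Cut the image into one crop per detected text band."""
    return [grid[top:bottom] for top, bottom in find_text_lines(grid)]


# Tiny two-line example: rows 1-2 and row 4 contain ink.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(find_text_lines(page))  # prints "[(1, 3), (4, 5)]"
```

Raising `min_ink` makes the split more tolerant of speckle noise between lines, at the cost of possibly clipping thin strokes.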

You can also refer to the training spec file and notebook in tao_tutorials/notebooks/tao_launcher_starter_kit/ocrnet on the main branch of NVIDIA/tao_tutorials on GitHub. There are ViT and non-ViT versions, and a public dataset is also mentioned there.

Expect to need at least several hundred training images. More is better.

Thank you for the detailed guide! I will report back on my progress.