Queries Regarding OCR

Hi Everyone

I am new to this environment. I want to train and test a model with NVIDIA DeepStream for OCR (seven-segment value recognition in digital meters), but it is hard to find the correct path. I want to train the model in Google Colab and run inference on Windows, because a local GPU is not available. I would also like some clarity on selecting a model and on the steps to train it in Google Colab. Could anyone guide me onto the right path?

Thank You


I found this post helpful: https://developer.nvidia.com/blog/create-custom-character-detection-and-recognition-models-with-nvidia-tao-part-1/
However, I am a little confused about the structure of the dataset.
The original docs show one structure, but in the blog it is quite different.

Thank you

For the structure of dataset, please refer to LPRNet - NVIDIA Docs.

Thanks for replying.
For seven-segment recognition, which is better, LPRNet or OCRNet?
Also, can I directly execute the commands from the link above in Google Colab, or is any setup required for the TAO Toolkit?

For your case, you can use OCRNet, since a digital meter is not a license plate.
Currently there is no OCRNet notebook for Google Colab.
You can try to run on a local GPU machine or a remote cloud machine. See https://docs.nvidia.com/tao/tao-toolkit/text/running_in_cloud/overview.html.
For running OCRNet, you can refer to its notebook to get started: https://github.com/NVIDIA/tao_tutorials/tree/main/notebooks/tao_launcher_starter_kit/ocrnet.
The notebook files can be downloaded via the guide in the TAO Toolkit Quick Start Guide - NVIDIA Docs.

Do I need to draw bounding boxes for the images in the test and train folders? Also, according to the notebook, where should I place character_list.txt?

I am getting this error, any thoughts?

# Convert the raw train dataset to lmdb
print("Converting the training set to LMDB.")
!tao model ocrnet dataset_convert -e $SPECS_DIR/experiment.yaml \
                            dataset_convert.input_img_dir=$DATA_DIR/train \
                            dataset_convert.gt_file=$DATA_DIR/train/gt_new.txt
Converting the training set to LMDB.
Traceback (most recent call last):
  File "/usr/local/bin/tao", line 8, in <module>
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_cli/entrypoint/tao_launcher.py", line 134, in main
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_cli/components/instance_handler/local_instance.py", line 356, in launch_command
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_cli/components/instance_handler/utils.py", line 151, in docker_logged_in
    data = load_config_file(docker_config)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_cli/components/instance_handler/utils.py", line 84, in load_config_file
    assert os.path.exists(config_path), (
AssertionError: Config path must be a valid unix path. No file found at: /root/.docker/config.json. Did you run docker login?

You can refer to https://github.com/NVIDIA/tao_tutorials/blob/main/notebooks/tao_launcher_starter_kit/ocrnet/ocrnet.ipynb. Search “character_list” and you will find how to generate it and how to set it in the training command line.
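As a rough sketch of what that notebook step amounts to: the character list is just every unique character that occurs in the ground-truth labels, one per line. Assuming gt_new.txt has one "&lt;image_name&gt; &lt;label&gt;" entry per line (the file names and labels below are made up for the demo):

```python
from pathlib import Path

def build_character_list(gt_path: str, out_path: str) -> None:
    """Collect every unique character from the labels in a gt file."""
    chars = set()
    with open(gt_path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if len(parts) == 2:
                chars.update(parts[1])  # add each character of the label
    with open(out_path, "w", encoding="utf-8") as f:
        for c in sorted(chars):
            f.write(c + "\n")

# Tiny demo with a made-up two-line ground-truth file
Path("gt_new.txt").write_text("img_0001.jpg 230.5\nimg_0002.jpg 0781\n")
build_character_list("gt_new.txt", "character_list.txt")
print(Path("character_list.txt").read_text())
```

For seven-segment meters the resulting list is usually just the digits plus a decimal point, which is what the demo above produces.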

For drawing bounding boxes, you can use OCDNet; it will detect all the characters in the image. With this approach, you train both OCDNet and OCRNet, then run inference as mentioned in the blog.

Alternatively, for the second approach, if you have or will generate a dataset with the coordinates of the display panel in the digital meters, you can use a detection network (such as YOLOv4) to detect where the display panel is, which would look like the yellow bbox you shared. After that, you can run LPRNet or OCRNet to recognize the digit values.

It is a common error. You can search for the error message in the TAO forum and leverage the solutions there:
Search results for '"Config path must be a valid unix path" #intelligent-video-analytics:tao-toolkit order:latest' - NVIDIA Developer Forums.

Update: it is a common error. You can try the hints from the search results above.
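For reference, the usual fix from those threads is to log in to the NGC container registry so that /root/.docker/config.json gets created before the tao launcher pulls its docker image. The username is literally $oauthtoken, and the password is an NGC API key generated from your ngc.nvidia.com account:

```shell
# Log in to NVIDIA's container registry; this writes ~/.docker/config.json,
# which the tao launcher checks before pulling the TAO container.
docker login nvcr.io
# Username: $oauthtoken
# Password: <your NGC API key>
```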

The scenario I am going through is really difficult for me.
Can you please guide me through the second approach, detecting the seven-segment display in digital meters?
What steps do I have to follow?
I have 200 images of digital meter displays.

Detecting the seven-segment display in digital meters is an object detection task, so you can use the YOLOv4 network.
You can label your 200 images with, for example, labelme or another tool.
YOLOv4 expects the label format described in Data Annotation Format - NVIDIA Docs. For example,

panel  0.00 0 0.00 665.45 160.00 717.93 217.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00
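If you label with labelme, a small script can emit that 15-field KITTI-style line (class, truncation, occlusion, alpha, the four bbox coordinates, then zeroed 3-D fields). This is a hypothetical sketch; demo.json and its contents are made up for illustration:

```python
import json

def labelme_to_kitti(labelme_json: str) -> str:
    """Convert labelme rectangle annotations to KITTI-style label lines."""
    with open(labelme_json, encoding="utf-8") as f:
        ann = json.load(f)
    lines = []
    for shape in ann["shapes"]:
        if shape.get("shape_type") != "rectangle":
            continue
        (x1, y1), (x2, y2) = shape["points"]
        xmin, xmax = sorted((x1, x2))
        ymin, ymax = sorted((y1, y2))
        lines.append(
            f"{shape['label']} 0.00 0 0.00 "
            f"{xmin:.2f} {ymin:.2f} {xmax:.2f} {ymax:.2f} "
            "0.00 0.00 0.00 0.00 0.00 0.00 0.00"
        )
    return "\n".join(lines)

# Made-up demo annotation matching the example line above
sample = {"shapes": [{"label": "panel", "shape_type": "rectangle",
                      "points": [[665.45, 160.0], [717.93, 217.99]]}]}
with open("demo.json", "w", encoding="utf-8") as f:
    json.dump(sample, f)
print(labelme_to_kitti("demo.json"))
```

The 3-D fields (dimensions, location, rotation) are unused for 2-D detection, so they stay at 0.00.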

Then you can follow https://github.com/NVIDIA/tao_tutorials/blob/main/notebooks/tao_launcher_starter_kit/yolo_v4/yolo_v4.ipynb to get familiar with the process: generating anchor shapes with k-means, generating TFRecords, setting up the training spec file, running training, and so on.

You can use the tao launcher mentioned in the notebook.
Alternatively, you can run the docker image directly:
$ docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 /bin/bash
Then, inside the container, run the commands without the tao prefix:
# yolo_v4 train xxx

I have trained YOLOv5 on my custom data in Google Colab, without NVIDIA TAO, through the Ultralytics YOLOv5 notebook, and I am getting bounding boxes around the seven-segment values.
Now I want to recognize the numbers inside those bounding boxes using the TAO Toolkit in Colab.
Note: I have the best.pt and last.pt weight files in hand.
Thank you

So you can crop the images so they contain only the seven-segment values, then run inference with LPRNet or OCRNet. See the “Inference” section in https://github.com/NVIDIA/tao_tutorials/blob/95aca39c79cb9068593a6a9c3dcc7a509f4ad786/notebooks/tao_launcher_starter_kit/lprnet/lprnet.ipynb or https://github.com/NVIDIA/tao_tutorials/blob/95aca39c79cb9068593a6a9c3dcc7a509f4ad786/notebooks/tao_launcher_starter_kit/ocrnet/ocrnet.ipynb.
The source code is https://github.com/NVIDIA/tao_tensorflow1_backend/blob/main/nvidia_tao_tf1/cv/lprnet/scripts/inference.py or https://github.com/NVIDIA/tao_pytorch_backend/blob/e5010af08121404dfb696152248467eee85ab3a7/nvidia_tao_pytorch/cv/ocrnet/scripts/inference.py
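A minimal sketch of the cropping step, assuming you exported the YOLOv5 detections with --save-txt (one normalized "class x_center y_center width height" line per box) and using Pillow; the file names below are placeholders:

```python
from PIL import Image

def crop_detections(image_path: str, label_path: str, out_prefix: str) -> int:
    """Crop each YOLOv5-format detection box out of the image; return the count."""
    img = Image.open(image_path)
    w, h = img.size
    n = 0
    with open(label_path) as f:
        for line in f:
            # YOLOv5 txt labels are normalized center/size coordinates
            _, xc, yc, bw, bh = map(float, line.split()[:5])
            left = (xc - bw / 2) * w
            top = (yc - bh / 2) * h
            right = (xc + bw / 2) * w
            bottom = (yc + bh / 2) * h
            img.crop((left, top, right, bottom)).save(f"{out_prefix}_{n}.jpg")
            n += 1
    return n

# Tiny synthetic demo: a 100x50 image with one centered detection box
Image.new("RGB", (100, 50)).save("meter.jpg")
with open("meter.txt", "w") as f:
    f.write("0 0.5 0.5 0.4 0.4\n")
print(crop_detections("meter.jpg", "meter.txt", "crop"))
```

The resulting crop_*.jpg files are what you would feed to the LPRNet/OCRNet inference commands from the notebooks.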

The LPR and OCR models can be found at License Plate Recognition | NVIDIA NGC (you can decode the .etlt to .onnx via the guide in https://github.com/NVIDIA-AI-IOT/tao_toolkit_recipes/blob/main/tao_forum_faq/FAQ.md#tlt-or-etlt) and Optical Character Recognition | NVIDIA NGC.
You can also run LPRNet inference via GitHub - NVIDIA-AI-IOT/tao-toolkit-triton-apps: Sample app code for deploying TAO Toolkit trained models to Triton.

Thanks, your suggestions helped!