Exploring AI-Powered OCR for Automated Utility Bill Recognition

Hey everyone,

I’ve been exploring different AI-based Optical Character Recognition (OCR) techniques to extract and process structured data from scanned utility bills. Many utility companies still provide bills in PDF or image formats, which makes automation challenging.

Does anyone here have experience implementing NVIDIA’s AI-powered OCR models for similar tasks? Specifically:

  • What are the best models for recognizing complex text structures on utility bills?
  • How effective is TensorRT for optimizing OCR inference speed?
  • Any experience with NVIDIA’s TAO Toolkit for training custom OCR models?

I recently worked on a project related to utility bill checking for FESCO in Pakistan (fescobills.net.pk), and I see huge potential in automating data extraction from these bills. I would love to hear your thoughts on best practices!

Hey, great topic!
I’ve been experimenting with OCR for utility bills as well, especially formatted bill PDFs and scanned copies where font styles vary and noise is common. In my experience, a hybrid pipeline gives better accuracy than relying on a single model.
Here are a few things that worked well in testing:

  1. Pre-processing before OCR improves accuracy a lot
    Binarization & noise cleaning
    Skew correction
    Region-based text extraction for reference no., billing month, meter reading etc.
  2. Model Choice
    For structured layout extraction, LayoutLM / Donut perform better than plain Tesseract, especially for tables.
    For lightweight use, Tesseract + custom regex post-processing still works fine.
  3. GPU Optimization
    Using TensorRT gave noticeably faster inference when batch-processing multiple bills.
    On small deployments, quantization and lower-precision models help reduce latency.
  4. Real Case Experiment
    I tested a workflow for electricity bills where the system extracts:
    Reference number → Billing month → Units → Payable amount.
    I later integrated it into a simple bill inquiry system where users check their bills online by reference ID.
  5. Still Exploring
    I’m currently looking into fine-tuning a model for noisy scans and handwritten values.
    If anyone here has tried NVIDIA TAO Toolkit for similar financial or invoice-style datasets, I’d love to hear results or training configuration tips.
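The binarization step from item 1 can be sketched with Otsu's method, which picks the single global threshold that maximizes the between-class variance of the grayscale histogram. This is a minimal, dependency-free sketch assuming 8-bit grayscale pixels stored as a list of rows; the function names are hypothetical, and in a real pipeline `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)` from OpenCV does the same job much faster.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the threshold t (0-255) that best separates dark and light pixels.

    Pixels <= t form the dark (ink) class, pixels > t the light (paper) class.
    Assumes every pixel is an 8-bit grayscale value.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_score = 0, -1.0
    w_dark = 0    # number of pixels at or below t
    sum_dark = 0  # sum of intensities at or below t
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue
        w_light = total - w_dark
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Proportional to the between-class variance; the first maximum wins.
        score = w_dark * w_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map each pixel to 0 (ink) or 255 (background) using Otsu's threshold."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in image]
```

Skew correction and region cropping (reference number, billing month, meter reading) would then run on the binarized image; those steps are not shown here.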
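The "Tesseract + custom regex post-processing" route from item 2, applied to the fields from item 4, could look roughly like the sketch below. It takes raw OCR text (from Tesseract or any other engine) and pulls out the reference number, billing month, units, and payable amount. The label wordings ("Reference No", "Billing Month", "Units Consumed", "Payable Within Due Date") and the sample text are hypothetical; real bills vary, so the patterns would need to be tuned against actual OCR output.

```python
import re
from typing import Dict, Optional

# Hypothetical label patterns; adapt them to the wording and layout of real bills.
FIELD_PATTERNS = {
    # Digits possibly separated by spaces, e.g. "12 34567 1234567" (single line only).
    "reference_number": re.compile(r"REFERENCE\s*NO\.?\s*[:\-]?\s*(\d[\d ]{8,}\d)", re.I),
    # Three-letter month plus a 2- to 4-digit year, e.g. "MAR 24".
    "billing_month": re.compile(r"BILLING\s*MONTH\s*[:\-]?\s*([A-Z]{3}[ \-]?\d{2,4})", re.I),
    "units": re.compile(r"UNITS\s*(?:CONSUMED)?\s*[:\-]?\s*(\d+)", re.I),
    # Optional "Rs." prefix and thousands separators, e.g. "Rs. 7,850".
    "payable_amount": re.compile(
        r"PAYABLE\s*(?:WITHIN\s*DUE\s*DATE)?\s*[:\-]?\s*(?:RS\.?\s*)?(\d[\d,]*(?:\.\d{1,2})?)",
        re.I,
    ),
}


def extract_fields(ocr_text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when it is not found."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1).strip() if match else None
    # Normalize: drop spaces inside the reference number and commas in the amount.
    if fields["reference_number"]:
        fields["reference_number"] = fields["reference_number"].replace(" ", "")
    if fields["payable_amount"]:
        fields["payable_amount"] = fields["payable_amount"].replace(",", "")
    return fields


SAMPLE = """Reference No: 12 34567 1234567
Billing Month: MAR 24
Units Consumed: 245
Payable Within Due Date: Rs. 7,850
"""
```

Calling `extract_fields(SAMPLE)` on the made-up text returns the reference number `12345671234567`, billing month `MAR 24`, units `245`, and amount `7850`. Returning `None` for missing fields makes it easy to flag a bill for manual review instead of silently storing wrong values.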

@caseyhunt01
We have some experience fine-tuning TAO OCDNet and OCRNet on handwriting and invoice-style datasets.
TAO provides several pretrained OCDNet models; see https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/ocdnet. It includes trainable models with different backbones as well as deployable models. BTW, there is also a deployable MixNet model: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/mixnet?version=deployable_v1.0.

For OCRNet, please see https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/ocrnet?.

You can fine-tune both the OCDNet and the OCRNet models.

For more information, please refer to the TAO docs:
OCDNet: OCDNet (TAO Toolkit documentation)
OCRNet: OCRNet (TAO Toolkit documentation)

If you run into any issues, please create a topic in the TAO forum (TAO Toolkit - NVIDIA Developer Forums). Thanks!


Thanks for the detailed pointers, this is really helpful.

I’ll explore OCDNet and OCRNet via TAO, especially for invoice-style and noisy scans.
Do you have any general guidance on dataset size or annotation strategy that worked best for utility or financial documents?

Appreciate the clarification.

You can refer to the OCDNet notebook in the NVIDIA/tao_tutorials GitHub repository (tao_tutorials/notebooks/tao_launcher_starter_kit/ocdnet/ocdnet.ipynb on the main branch). This demo trains on ICDAR15 (1,000 training images).
For annotation, you can use a labeling tool such as labelme.
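Since the OCDNet demo trains on ICDAR15, one way to reuse labelme annotations is to convert them into ICDAR2015-style ground truth, where each line holds the four corner points of a text box followed by its transcription (`###` marks text to ignore). This is a minimal sketch, assuming the labelme JSON fields `shapes`, `points`, `shape_type`, and `label`, and assuming the labelme label stores the transcription; the function names are hypothetical, and the exact folder layout and file naming TAO expects should be checked against the OCDNet docs and notebook.

```python
import json
from typing import List, Tuple

Point = Tuple[float, float]


def shape_to_quad(shape: dict) -> List[Point]:
    """Return 4 corners: top-left, top-right, bottom-right, bottom-left."""
    pts = [(float(x), float(y)) for x, y in shape["points"]]
    shape_type = shape.get("shape_type", "polygon")
    if shape_type == "rectangle" and len(pts) == 2:
        # labelme stores a rectangle as two opposite corners.
        (xa, ya), (xb, yb) = pts
        x0, x1 = min(xa, xb), max(xa, xb)
        y0, y1 = min(ya, yb), max(ya, yb)
        return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    if shape_type == "polygon" and len(pts) == 4:
        # Assumes the annotator clicked the corners clockwise from top-left.
        return pts
    raise ValueError(f"unsupported shape: {shape_type} with {len(pts)} points")


def labelme_to_icdar_lines(labelme_json: str) -> List[str]:
    """Convert one labelme JSON document into ICDAR2015-style text lines."""
    data = json.loads(labelme_json)
    lines = []
    for shape in data.get("shapes", []):
        quad = shape_to_quad(shape)
        coords = ",".join(str(round(v)) for pt in quad for v in pt)
        text = shape.get("label") or "###"  # empty label -> "don't care" region
        lines.append(f"{coords},{text}")
    return lines
```

Each image's lines would then be written to its own ground-truth `.txt` file next to the image folder. Restricting polygons to exactly four points keeps the output compatible with quadrilateral boxes; free-form polygons would need to be approximated first.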

For OCRNet, you can take a look at the OCRNet notebook in the same repository (tao_tutorials/notebooks/tao_launcher_starter_kit/ocrnet/ocrnet.ipynb on the main branch). I think you do not need to retrain the OCRNet model; retraining only the OCDNet model should be enough.

The full deployment pipeline can also be found in the NVIDIA-AI-IOT/NVIDIA-Optical-Character-Detection-and-Recognition-Solution GitHub repository, which provides an optical character detection and recognition solution optimized for NVIDIA devices.
