OCRNet FAN_tiny_2x Backbone

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
  - 4x T4
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
  - OCRNet
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here)
  - format_version: 3.0
  - toolkit_version: 5.3.0
  - published_date: 03/14/2024
• Training spec file (if you have one, please share it here)

results_dir: /results
encryption_key: nvidia_tao
model:
  TPS: False
  feature_channel: 512
  hidden_size: 256
  prediction: Attn
  quantize: False
  input_width: 100
  input_height: 64
  input_channel: 3
  backbone: FAN_tiny_2X

dataset:
  train_dataset_dir: [/data/train/lmdb]
  val_dataset_dir: /data/test/lmdb
  character_list_file: /data/character_list.txt
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
    aug_prob: 0.5
    reverse_color_prob: 0.5
    rotate_prob: 0.5
    max_rotation_degree: 10
    blur_prob: 0.1
    gaussian_radius_list: [1, 2]
train:
  seed: 1111
  gpu_ids: [0,1,2,3]
  optim:
    name: "adadelta"
    lr: 0.1
  clip_grad_norm: 5.0
  num_epochs: 250
  checkpoint_interval: 5
  validation_interval: 5

I am attempting to train an OCRNet model with a FAN_tiny_2X backbone and am getting the following error:

Error executing job with overrides:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/scripts/train.py", line 144, in main
    raise e
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/scripts/train.py", line 128, in main
    run_experiment(experiment_spec=cfg)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/scripts/train.py", line 60, in run_experiment
    ocrnet_model = OCRNetModel(experiment_spec)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/model/pl_ocrnet.py", line 122, in __init__
    self._build_model(experiment_spec)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/model/pl_ocrnet.py", line 146, in _build_model
    self.model = build_ocrnet_model(experiment_spec=experiment_spec,
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/model/build_nn_model.py", line 156, in build_ocrnet_model
    model = Model(opt)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/model/model.py", line 57, in __init__
    self.FeatureExtraction = fan_tiny_8_p2_hybrid(in_chans=opt.input_channel, in_height=opt.imgH, in_width=opt.imgW)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/model/fan.py", line 557, in __init__
    super(fan_tiny_8_p2_hybrid, self).__init__(**model_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/model/fan.py", line 362, in __init__
    self.patch_embed = HybridEmbed(img_size=img_size, backbone=backbone, patch_size=hybrid_patch_size, embed_dim=embed_dim, in_chans=in_chans)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/model/fan.py", line 164, in __init__
    assert feature_size[0] % patch_size[0] == 0 and feature_size[1] % patch_size[1] == 0
AssertionError
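The failing assertion in HybridEmbed requires both dimensions of the backbone's feature map to be exact multiples of the patch size. The Python sketch below reproduces that arithmetic in isolation, assuming (hypothetically) that the convolutional stem floor-divides each input dimension by a single total stride. STRIDE and PATCH are placeholder values chosen for illustration, not the numbers used in fan.py.

```python
def feature_size(height: int, width: int, stride: int) -> tuple[int, int]:
    """Spatial size of the feature map after a backbone with total downsampling `stride`.

    Hypothetical model: each dimension is simply floor-divided by the stride.
    """
    return height // stride, width // stride


def patches_fit(height: int, width: int, stride: int, patch: int) -> bool:
    """Mirror of the HybridEmbed check: both feature dims must be multiples of `patch`."""
    fh, fw = feature_size(height, width, stride)
    return fh % patch == 0 and fw % patch == 0


if __name__ == "__main__":
    STRIDE = 4  # hypothetical total backbone stride (placeholder)
    PATCH = 2   # hypothetical hybrid patch size (placeholder)
    for h, w in [(64, 100), (64, 128), (64, 200)]:
        fh, fw = feature_size(h, w, STRIDE)
        ok = patches_fit(h, w, STRIDE, PATCH)
        print(f"input {h}x{w} -> feature {fh}x{fw}, divisible by patch: {ok}")
```

With these placeholder values, a 64x100 input yields a 16x25 feature map and the odd width trips the check, whereas widths of 128 or 200 divide evenly. To find the real constraint, substitute the stride and hybrid patch size defined in the installed fan.py.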

The model trains fine with ResNet and ResNet2X.

I removed feature_channel: 512 from my config and now I am receiving a new error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/scripts/train.py", line 144, in main
    raise e
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/scripts/train.py", line 128, in main
    run_experiment(experiment_spec=cfg)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/scripts/train.py", line 60, in run_experiment
    ocrnet_model = OCRNetModel(experiment_spec)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/model/pl_ocrnet.py", line 122, in __init__
    self._build_model(experiment_spec)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/model/pl_ocrnet.py", line 146, in _build_model
    self.model = build_ocrnet_model(experiment_spec=experiment_spec,
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/model/build_nn_model.py", line 156, in build_ocrnet_model
    model = Model(opt)
  File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/ocrnet/model/model.py", line 60, in __init__
    raise Exception('No FeatureExtraction module specified')
Exception: No FeatureExtraction module specified

It appears this error has been occurring since TAO 5.0. Are we still not able to train using FAN_tiny_2x on TAO 5.3?

Could you please refer to the OCRNet ViT spec file, tao_tutorials/notebooks/tao_launcher_starter_kit/ocrnet/specs/experiment-vit.yaml, in the NVIDIA/tao_tutorials repository on GitHub (main branch)?
The ViT spec file differs from the non-ViT version. Compare the files in tao_tutorials/notebooks/tao_launcher_starter_kit/ocrnet/specs on the same branch.
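Since the ViT spec differs from the non-ViT one, a quick way to see exactly which keys differ is a recursive diff of the two parsed spec files. The sketch below is a hypothetical helper (diff_spec is not part of TAO) that operates on plain nested dicts; in practice you would produce those dicts by parsing each YAML file, for example with PyYAML's yaml.safe_load. The example data reproduces the edit described earlier in the thread, where feature_channel was removed from the model section.

```python
def diff_spec(ref: dict, cur: dict, prefix: str = "") -> list[str]:
    """Report keys that are missing, extra, or changed in `cur` relative to `ref`.

    Nested dicts are compared recursively; paths are joined with dots.
    """
    out: list[str] = []
    for key in sorted(set(ref) | set(cur), key=str):
        path = f"{prefix}{key}"
        if key not in cur:
            out.append(f"missing: {path}")
        elif key not in ref:
            out.append(f"extra: {path}")
        elif isinstance(ref[key], dict) and isinstance(cur[key], dict):
            out.extend(diff_spec(ref[key], cur[key], prefix=path + "."))
        elif ref[key] != cur[key]:
            out.append(f"changed: {path}: {ref[key]!r} -> {cur[key]!r}")
    return out


if __name__ == "__main__":
    # Subset of the model section from the spec above, before and after the edit.
    original = {"model": {"TPS": False, "feature_channel": 512, "backbone": "FAN_tiny_2X"}}
    edited = {"model": {"TPS": False, "backbone": "FAN_tiny_2X"}}
    for line in diff_spec(original, edited):
        print(line)  # prints: missing: model.feature_channel
```

Passing the parsed reference spec as `ref` and your own spec as `cur` lists every key you would need to add, drop, or change to match the reference.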

This is working now. Thank you.
