OCDNet model keeps failing to train every time

Please provide the following information when requesting support.

• Hardware (NVIDIA A10G)
• Network Type (OCDNet, ViT and ResNet-50 backbones)
• Docker being used : nvcr.io/nvidia/tao/tao-toolkit:5.2.0-pyt2.1.0
• Training spec file :
load_pruned_graph: False
pruned_graph_path: '/results/prune/pruned_0.1.pth'
pretrained_model_path: '/data/ocdnet/ocdnet_fan_tiny_2x_icdar.pth'
backbone: fan_tiny_8_p4_hybrid
#backbone: deformable_resnet18
enlarge_feature_map_size: True
activation_checkpoint: True


results_dir: /home/ubuntu/OCD/OCD-data-results
num_epochs: 80
#resume_training_checkpoint_path: '/home/ubuntu/OCD/ocdnet_craft_result/ocd_model_epoch_009.pth'
checkpoint_interval: 1
validation_interval: 1
is_dry_run: False
precision: fp16
model_ema: False
model_ema_decay: 0.999
clip_grad_norm: 5.0

type: Adam
lr: 0.001

type: WarmupPolyLR
warmup_epoch: 3

type: SegDetectorRepresenter
thresh: 0.3
box_thresh: 0.55
max_candidates: 1000
unclip_ratio: 1.5

type: QuadMetric
is_output_polygon: False

data_path: ['/home/ubuntu/OCD/OCD-data/train-data']
- type: IaaAugment
- {'type': Fliplr, 'args': {'p': 0.5}}
- {'type': Affine, 'args': {'rotate': [-45, 45]}}
- {'type': Sometimes, 'args': {'p': 0.2, 'then_list': {'type': GaussianBlur, 'args': {'sigma': [1.5, 2.5]}}}}
- {'type': Resize, 'args': {'size': [0.5, 3]}}
- type: EastRandomCropData
size: [640,640]
max_tries: 50
keep_ratio: true
- type: MakeBorderMap
shrink_ratio: 0.4
thresh_min: 0.3
thresh_max: 0.7
- type: MakeShrinkMap
shrink_ratio: 0.4
min_text_size: 8

    img_mode: BGR
    filter_keys: [img_path,img_name,text_polys,texts,ignore_tags,shape]
    ignore_tags: ['*', '###']
    batch_size: 2
    pin_memory: true
    num_workers: 4

data_path: ['/home/ubuntu/OCD/OCD-data/test-data']
- type: Resize2D
- 1280
- 736
resize_text_polys: true
img_mode: BGR
ignore_tags: ['*', '###']
batch_size: 2
pin_memory: false
num_workers: 1
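The spec lines above appear to have lost their YAML nesting when pasted into the forum. As a hedged sketch only (section names based on the typical TAO OCDNet experiment spec layout; verify the exact key names and nesting against your TAO version's documentation), the keys are usually grouped something like this:

```yaml
# Hypothetical regrouping of the flattened keys above into the usual
# TAO OCDNet spec sections; names and nesting may differ per TAO version.
model:
  load_pruned_graph: False
  pretrained_model_path: /data/ocdnet/ocdnet_fan_tiny_2x_icdar.pth
  backbone: fan_tiny_8_p4_hybrid
train:
  results_dir: /home/ubuntu/OCD/OCD-data-results
  num_epochs: 80
  precision: fp16
  optimizer:
    type: Adam
    args:
      lr: 0.001
  lr_scheduler:
    type: WarmupPolyLR
    args:
      warmup_epoch: 3
  post_processing:
    type: SegDetectorRepresenter
    args:
      thresh: 0.3
      box_thresh: 0.55
  metric:
    type: QuadMetric
    args:
      is_output_polygon: false
dataset:
  train_dataset:
    data_path: ['/home/ubuntu/OCD/OCD-data/train-data']
  validate_dataset:
    data_path: ['/home/ubuntu/OCD/OCD-data/test-data']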
ISSUE: I have trained the model more than 10 times with the same dataset, trying several variations, but the model still fails to converge: the loss never drops below 0.968 and hovers between 0.9 and 1.0. Please help me with this.
The text is in dot-matrix format; here is a part of an image.
Dataset size is 5382 images, including train and test.

Training Terminal Snippet:

Did you train completely?

How about the images’ resolution?

Did you train completely?
Yes, I have trained it completely many times, but the results are very poor.
How about the images’ resolution?
width : 945
height : 1587

Can you run evaluation with a similar resolution against the original images? Change the above to:

- 960
- 1600

You mean:

  • short side: 1280 with 1600
  • and 736 with 960?

No, I mean width(960) x height(1600), since your images have a resolution of width(945) x height(1587):

  • 960
  • 1600

Besides evaluation, you can also run inference to check the result.
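Note that the suggested resolutions (960x1600 for the 945x1587 originals, and later 800x1344) are all multiples of 32, which OCDNet input dimensions are typically constrained to (an assumption worth verifying for your backbone). A minimal sketch for rounding an image dimension up to the nearest valid size:

```python
def round_up(value: int, multiple: int = 32) -> int:
    """Round value up to the nearest multiple (OCDNet input dims are
    typically constrained to multiples of 32)."""
    return ((value + multiple - 1) // multiple) * multiple

# 945x1587 originals -> the 960x1600 suggested in the thread
print(round_up(945), round_up(1587))  # -> 960 1600
```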

My GPU runs out of memory when the validation step starts.

My GPU has only 24 GB.

You can try width(800) x height(1344).

It still goes out of memory.

The images are in this format; ignore the image size, as it is resized in the code.


So, the test images are 1526x2048, right?

Is it running inference? Can you share the command and spec file?

@Morganh Can I use a ResNet-50 OCDNet model with a ViT OCRNet model in Triton for inferencing? Is this possible, as I am facing an issue running the ViT OCRNet with the ResNet-50 OCDNet model.
Command used to convert the model to an engine:
/usr/src/tensorrt/bin/trtexec --onnx=./ocdnet.onnx --minShapes=input:1x3x736x1280 --optShapes=input:1x3x736x1280 --maxShapes=input:4x3x736x1280 --fp16 --saveEngine=./ocdnet.fp16.engine
Note: It worked for inferencing when run with the ResNet-50 model; I changed the width and height to 1280 and 736.

For OCRNet-ResNet or OCRNet-ViT, both should work.
But the settings are not the same in the Triton spec JSON.

The Triton server does not start up. The error

shows that something is mismatched in the engine generation or the spec JSON settings.

Both models have the same input size, right?
This is the spec file used:

"is_high_resolution_input": false,
"resize_keep_aspect_ratio": true,

"overlapRate": 0.5,
"input_data_format": "NHWC",
"ocdnet_trt_engine_path": "/opt/nvocdr/engines/ocdnet_vit.fp16.engine",
"ocdnet_infer_input_shape": [
"ocdnet_binarize_threshold": 0.3,
"ocdnet_polygon_threshold": 0.1,
"ocdnet_unclip_ratio": 1.5,
"ocdnet_max_candidate": 1000,
"upsidedown": true,
"ocrnet_trt_engine_path": "/opt/nvocdr/engines/ocrnet_vit.fp16.engine",
"ocrnet_dict_file": "/opt/nvocdr/onnx_model/character_list",
"ocrnet_decode": "Attention", 
"ocrnet_infer_input_shape": [
"font_size": 0.6,
"font_color": [0,0,255]


This is needed to set.

Yeah, 736 and 1280 are present, but it still shows that error.

After re-rechecking the log,

The Triton server is up, but there is an error in the OCR engine. Can you double-check the OCR command used to generate the TensorRT engine?
Please double-check the steps mentioned in the GitHub repo. We confirm that it is working.
You can try with default models.
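For reference, the dimensions in the nvOCDR spec JSON have to line up with the shapes baked into each TensorRT engine. A hedged fragment (key names as they appear in the spec above; the values are illustrative assumptions, e.g. 3x736x1280 matching the default OCDNet trtexec shapes and 1x64x200 as a common OCRNet-ViT input, so verify both against your own engines):

```json
{
  "ocdnet_trt_engine_path": "/opt/nvocdr/engines/ocdnet.fp16.engine",
  "ocdnet_infer_input_shape": [3, 736, 1280],
  "ocrnet_trt_engine_path": "/opt/nvocdr/engines/ocrnet.fp16.engine",
  "ocrnet_infer_input_shape": [1, 64, 200]
}
```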


Okay, I checked it, but it is still failing. The TensorRT engine command for my trained ResNet-50 model was this:

/usr/src/tensorrt/bin/trtexec --onnx=./ocdnet.onnx --minShapes=input:1x3x736x1280 --optShapes=input:1x3x736x1280 --maxShapes=input:4x3x736x1280 --fp16 --saveEngine=/opt/nvocdr/engines/ocdnet.fp16.engine

And I had made changes in the spec.json file for the engine paths, but it is still unable to infer on an image of width 945 and height 736.
Note:
I am using a ResNet-50 OCDNet model for detection and a ViT OCRNet model for recognition, both trained on custom data, but when run on Triton it cannot run: triton_stub becomes unhealthy and the models are reloaded.

How about the command for OCRNet? Did you follow the GitHub repo to generate the OCRNet engine?
See GitHub - NVIDIA-AI-IOT/NVIDIA-Optical-Character-Detection-and-Recognition-Solution: This repository provides an optical character detection and recognition solution optimized on NVIDIA devices.

For OCDNet, when generating the OCD TensorRT engine, if you use the default
/usr/src/tensorrt/bin/trtexec --onnx=./ocdnet.onnx --minShapes=input:1x3x736x1280 --optShapes=input:1x3x736x1280 --maxShapes=input:4x3x736x1280 --fp16 --saveEngine=/opt/nvocdr/engines/ocdnet.fp16.engine, then you need to set width 1280 and height 736 in the spec file. The width and height in the spec file must match the width and height in the trtexec command line.

For OCRNet, please check whether you are using ResNet-50 or ViT, then set the corresponding height and width in the OCRNet part of the spec file.