Training FasterRCNN using Transfer Learning Toolkit

I am training FasterRCNN (FRCNN) in TLT with the Resnet18 and Mobilenet_v2 backbones.
I listed the available models using
ngc registry model list nvidia/iva/tlt_*

I downloaded Resnet18 and Mobilenet_v2 using the following commands.

ngc registry model download-version nvidia/iva/tlt_resnet18_faster_rcnn:1

ngc registry model download-version nvidia/iva/tlt_mobilenet_v2_faster_rcnn:1

Training failed for both models, with different errors.
For Mobilenet_v2, training failed with

  Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 30, in main
  File "./faster_rcnn/scripts/train.py", line 273, in main
  File "./faster_rcnn/data_loader/loader.py", line 200, in kitti_data_gen
UnboundLocalError: local variable 'image_channel_order' referenced before assignment

Resnet18 failed with

 2020-04-07 03:44:08,525 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loading pretrained weights from /workspace/tlt_resnet18_faster_rcnn_v1/resnet18.h5
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 30, in main
  File "./faster_rcnn/scripts/train.py", line 232, in main
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 1163, in load_weights
    reshape=reshape)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/saving.py", line 1130, in load_weights_from_hdf5_group_by_name
    ' element(s).')
ValueError: Layer #4 (named "block_1a_conv_1") expects 1 weight(s), but the saved weights have 2 element(s).  

My TLT image is nvcr.io/nvidia/tlt-streamanalytics:v1.0.1_py2, which is the latest version.

How can I fix these issues?
The spec files for Resnet and Mobilenet are as follows.

Please attach your training spec.

Thanks. The spec files are at the following links.

Hi batu,
Could you please paste them here directly? You can use the “upload” button when you reply.

Sorry, sir.
The spec files for Resnet and Mobilenet are attached.
specs_frcnn.log (3.5 KB) specs_mobilenet_v2.log (3.4 KB)
I changed the extensions to .log because otherwise the files could not be uploaded.
My training image size is 736 x 736 (a multiple of 32).

Two comments.

  1. The “feature_extractor” field should match your backbone. In your specs_mobilenet_v2.log, it does not.
  2. For MobileNet V1/V2, to load the pretrained weights from NGC for training or retraining, set the “conv_bn_share_bias” field in the experiment_spec file to “True”. For all other backbones, set it to “False” when loading the NGC pretrained weights.
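
The two comments above can be turned into a quick pre-flight check on a spec file. The following is only a minimal sketch: `read_field` and `check_spec` are hypothetical helpers (not part of TLT), the spec is assumed to contain plain `key: value` lines, and the backbone spelling "mobilenet_v1" is an assumption modelled on the "mobilenet_v2" and "resnet:18" values seen in this thread.

```python
import re


def read_field(spec_text, name):
    """Return the first value of a `name: value` line in the spec, or None."""
    pattern = r'^\s*' + re.escape(name) + r'\s*:\s*"?([^"\s]+)"?'
    match = re.search(pattern, spec_text, re.MULTILINE)
    return match.group(1) if match else None


def check_spec(spec_text, expected_backbone):
    """Return a list of problems found; an empty list means both rules hold."""
    problems = []

    # Rule 1: feature_extractor must match the backbone being trained.
    backbone = read_field(spec_text, "feature_extractor")
    if backbone != expected_backbone:
        problems.append(
            "feature_extractor is %r, expected %r" % (backbone, expected_backbone)
        )

    # Rule 2: conv_bn_share_bias is True for MobileNet V1/V2, False otherwise.
    # The prefix spellings below are assumptions, not taken from TLT docs.
    want_true = expected_backbone.startswith(("mobilenet_v1", "mobilenet_v2"))
    share_bias = read_field(spec_text, "conv_bn_share_bias")
    if share_bias is None:
        problems.append("conv_bn_share_bias is not set")
    elif (share_bias.lower() == "true") != want_true:
        problems.append(
            "conv_bn_share_bias is %s, expected %s" % (share_bias, want_true)
        )
    return problems


# Hypothetical spec fragment: a ResNet18 spec reused for a MobileNet V2 run.
sample = 'feature_extractor: "resnet:18"\nconv_bn_share_bias: False\n'
for problem in check_spec(sample, "mobilenet_v2"):
    print(problem)
```

Running it on each spec before tlt-train would flag a mismatched backbone or bias setting, but it does not validate any other field.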

Thanks, I have updated the specs as you suggested.
Now both resnet:18 and mobilenet_v2 fail with a similar error, as follows.

2020-04-08 05:10:17,659 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loading pretrained weights from /workspace/tlt-experiments/FasterRCNN_18/resnet18.h5
2020-04-08 05:10:19,319 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Pretrained weights loaded!
2020-04-08 05:10:19,515 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: training example num: 4579
2020-04-08 05:10:19,657 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Starting training
2020-04-08 05:10:19,657 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Epoch 1/7
Found 4579 examples in training dataset, valid image extension isjpg, jpeg and png(case sensitive)

Compressed_class_mapping: {u'plate': 0, u'background': 2, u'textline': 1}

Name mapping:{u'plate': u'plate', u'background': u'background', u'textline': u'textline'}

Training dataset stats(compressed via class mapping):

{u'plate': 5164, u'background': 0, u'textline': 6586}


Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 30, in main
  File "./faster_rcnn/scripts/train.py", line 273, in main
  File "./faster_rcnn/data_loader/loader.py", line 200, in kitti_data_gen
UnboundLocalError: local variable 'image_channel_order' referenced before assignment

My two spec files are attached.
specs_frcnn.log (3.5 KB) specs_mobilenet_v2.log (3.6 KB)

What’s your image_channel_order, rgb or bgr?
Did you try the KITTI dataset along with the default spec inside the docker?

The error UnboundLocalError: local variable 'image_channel_order' referenced before assignment does not depend on whether the channel order is rgb or bgr.
It occurs because the source code references a variable that was never assigned a value.
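
For readers unfamiliar with this exception, here is a minimal sketch of one common way it arises in Python: a local variable is assigned only inside some branches, and then read after none of them ran. The function `channel_indices` is hypothetical and is not the TLT source; whether loader.py follows this pattern is an assumption.

```python
def channel_indices(order):
    """Hypothetical helper: map a channel-order string to index positions."""
    if order == "rgb":
        image_channel_order = (0, 1, 2)
    elif order == "bgr":
        image_channel_order = (2, 1, 0)
    # There is no else branch, so any other value (for example an empty
    # or missing setting) leaves the local variable unassigned.
    return image_channel_order


try:
    channel_indices("")
except UnboundLocalError as err:
    message = str(err)
    print(message)
```

In code shaped like this, a supported value such as "rgb" or "bgr" returns normally, and only an unrecognized or missing value triggers the exception.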

I used the default spec file frcnn_kitti_retrain_spec.txt, and it worked. Thanks.