Faster RCNN ResNet-101 Problems

Hi, I am retrying nvidia-tlt after more than three months, prompted by the release of DeepStream 5.0 and other improvements, notably the availability of ResNet-101 as a backbone. To recall how everything works, I am going through the ipynb example. The config file is exactly as it is in the Docker container, except for some paths. I have two problems at the moment:

  1. Training worked ok: losses decreased during the first few epochs, though they started going up after epoch 4 or so. But evaluation is terrible: I am getting zero mAP, zero precision and zero recall. I ran the visualisation, and it turns out the model is predicting the same box at the bottom-right corner for every image.

Training:

Epoch 1/12
6434/6434 [==============================] - 5866s 912ms/step - loss: 0.5379 - rpn_out_class_loss: 0.1280 - rpn_out_regress_loss: 0.0147 - dense_class_td_loss: 0.1353 - dense_regress_td_loss: 0.0825 - dense_class_td_acc: 0.9660
Epoch 2/12
6434/6434 [==============================] - 5409s 841ms/step - loss: 0.3461 - rpn_out_class_loss: 0.1283 - rpn_out_regress_loss: 0.0129 - dense_class_td_loss: 0.1016 - dense_regress_td_loss: 0.0622 - dense_class_td_acc: 0.9731
Epoch 3/12
6434/6434 [==============================] - 5387s 837ms/step - loss: 0.3563 - rpn_out_class_loss: 0.1272 - rpn_out_regress_loss: 0.0125 - dense_class_td_loss: 0.1111 - dense_regress_td_loss: 0.0685 - dense_class_td_acc: 0.9702
Epoch 4/12
6434/6434 [==============================] - 5383s 837ms/step - loss: 0.3463 - rpn_out_class_loss: 0.1269 - rpn_out_regress_loss: 0.0124 - dense_class_td_loss: 0.1062 - dense_regress_td_loss: 0.0649 - dense_class_td_acc: 0.9714
Epoch 5/12
6434/6434 [==============================] - 5385s 837ms/step - loss: 0.3914 - rpn_out_class_loss: 0.1267 - rpn_out_regress_loss: 0.0122 - dense_class_td_loss: 0.1343 - dense_regress_td_loss: 0.0831 - dense_class_td_acc: 0.9643
Epoch 6/12
6434/6434 [==============================] - 5379s 836ms/step - loss: 0.3680 - rpn_out_class_loss: 0.1267 - rpn_out_regress_loss: 0.0122 - dense_class_td_loss: 0.1209 - dense_regress_td_loss: 0.0735 - dense_class_td_acc: 0.9681
Epoch 7/12
6434/6434 [==============================] - 5371s 835ms/step - loss: 0.3707 - rpn_out_class_loss: 0.1266 - rpn_out_regress_loss: 0.0121 - dense_class_td_loss: 0.1224 - dense_regress_td_loss: 0.0753 - dense_class_td_acc: 0.9672
Epoch 8/12
6434/6434 [==============================] - 5372s 835ms/step - loss: 0.3709 - rpn_out_class_loss: 0.1266 - rpn_out_regress_loss: 0.0121 - dense_class_td_loss: 0.1226 - dense_regress_td_loss: 0.0757 - dense_class_td_acc: 0.9672

Evaluation:

2020-05-22 21:39:41,472 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/test.pyc: 1046/1047
2020-05-22 21:39:41,767 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/test.pyc: Elapsed time = 0.294595956802
================================================================================
Class               AP                  precision           recall              RPN_recall          
--------------------------------------------------------------------------------
cyclist             0.0000              0.0000              0.0000              0.0425              
--------------------------------------------------------------------------------
car                 0.0000              0.0000              0.0000              0.1037              
--------------------------------------------------------------------------------
person              0.0000              0.0000              0.0000              0.0437              
--------------------------------------------------------------------------------
mAP = 0.0000

  2. I went ahead and tried to execute the model export sections. But even this is not working.

Using TensorFlow backend.
Traceback (most recent call last):
File "/usr/local/bin/tlt-export", line 8, in
sys.exit(main())
File "./common/export/app.py", line 221, in main
File "./common/export/base_exporter.py", line 69, in set_keras_backend_dtype
File "./common/utils.py", line 189, in get_decoded_filename
IOError: Invalid decryption. Unable to open file (File signature not found). The key used to load the model is incorrect.

Can you please help? Thanks

Are you using the 2.0_dp version of the docker now? If yes, please recheck your images/labels, because in the new 2.0_dp docker, faster-rcnn does not support training on images of multiple resolutions, or resizing images during training. So all of the images must be resized offline to the final training size, and the corresponding bounding boxes must be scaled accordingly; a sketch of such offline resizing follows below.
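
For reference, here is a minimal offline-resize sketch in Python (my own illustration, not an NVIDIA tool), assuming KITTI-style image_2/label_2 folders and the 1248 x 384 training size from the spec; the paths are hypothetical:

import os
from PIL import Image

SRC_IMG, SRC_LBL = "training/image_2", "training/label_2"  # hypothetical paths
DST_IMG, DST_LBL = "resized/image_2", "resized/label_2"
TARGET_W, TARGET_H = 1248, 384  # final training size from the spec file

os.makedirs(DST_IMG, exist_ok=True)
os.makedirs(DST_LBL, exist_ok=True)

for name in os.listdir(SRC_IMG):
    stem = os.path.splitext(name)[0]
    img = Image.open(os.path.join(SRC_IMG, name))
    sx, sy = TARGET_W / img.width, TARGET_H / img.height
    img.resize((TARGET_W, TARGET_H), Image.BILINEAR).save(os.path.join(DST_IMG, name))

    # KITTI label columns 4-7 are the bbox corners (x1, y1, x2, y2);
    # scale them by the same factors as the image.
    out = []
    with open(os.path.join(SRC_LBL, stem + ".txt")) as f:
        for line in f:
            cols = line.split()
            x1, y1, x2, y2 = (float(c) for c in cols[4:8])
            cols[4:8] = ["%.2f" % (x1 * sx), "%.2f" % (y1 * sy),
                         "%.2f" % (x2 * sx), "%.2f" % (y2 * sy)]
            out.append(" ".join(cols))
    with open(os.path.join(DST_LBL, stem + ".txt"), "w") as f:
        f.write("\n".join(out) + "\n")
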
For your 2nd question, please make sure the API key is correct.

The images are from the KITTI dataset. As per the Jupyter notebook, I downloaded the zip folders, unzipped them, and converted them to TFRecords. When I ran the visualisation later, I noticed that the images were all slightly different in size (e.g. 1224 x 370, 1242 x 375, …). I'd assumed they would all be the size specified in the default specs. But don't you think the problem here is during evaluation and inference, and not necessarily during training?

For the key, I've double-checked. It's the same key that I used for downloading the data and for training, both of which worked fine. The only thing left to try is generating a new key, as this one is from January.

Yes, if you are using the KITTI dataset, you need not resize, because the setting in your training spec matches the average resolution of the KITTI dataset. Unfortunately, I can reproduce the issue you mentioned; I will sync with the internal team about it and update you if there is any finding.
For the key, you need not generate a new one. Please just confirm that:

  1. the key is correct
  2. $KEY is not empty and expands to the correct value
  3. you trained this .tlt model with the same key.

Hi @morganh, the key issue seems to be fine now. I think my mistake was putting single quotes around the key ('$KEY'): in bash, single quotes prevent variable expansion, so the literal string $KEY was passed instead of the key's value. Please let me know when you have some news on the evaluation / inference. Thanks!

For the mAP issue, it looks like the pretrained weights are not so good. So please do not freeze any CNN blocks in the spec file, i.e., do not specify any freeze_blocks in it, and do not freeze_bn:
-freeze_bn: True
-freeze_blocks: 0
-freeze_blocks: 1
+freeze_bn: False

Try running training with batch size 1 on a single GPU. Also, you may hit an OOM error, since ResNet-101 is a big backbone that requires more GPU memory; in that case, please try a GPU with more memory.
If you have more GPU memory, you can also increase the batch size to get better mAP. But basically, ResNet-101 is big and cannot use batch size 16 on a single GPU.

Hi, I made the changes you mentioned for BatchNorm and trained with batch size = 1. Training went better this time, as the losses kept going down for all 12 epochs, but evaluation is still poor, especially for the non-car classes:

================================================================================
Class               AP                  precision           recall              RPN_recall          
--------------------------------------------------------------------------------
cyclist             0.0000              0.0000              0.0000              0.3538              
--------------------------------------------------------------------------------
car                 0.3041              0.9744              0.3058              0.6382              
--------------------------------------------------------------------------------
person              0.0000              0.0000              0.0000              0.3879              
--------------------------------------------------------------------------------
mAP = 0.1014 

I could train for more epochs, but in January I got better results with ResNet-50 in fewer epochs. I think there's still something wrong with ResNet-101 or this release of tlt.

January results with ResNet-50:

================================================================================
Class               AP                  precision           recall              
--------------------------------------------------------------------------------
Cyclist             0.5365              0.4578              0.6023              
--------------------------------------------------------------------------------
Pedestrian          0.5150              0.6083              0.5689              
--------------------------------------------------------------------------------
Car                 0.7911              0.7807              0.8109              
--------------------------------------------------------------------------------
mAP = 0.6142

For reference, here are the training logs (from today):

==================================================================================================
Total params: 79,869,949
Trainable params: 79,707,261
Non-trainable params: 162,688
__________________________________________________________________________________________________
Epoch 1/12
6434/6434 [==============================] - 6167s 959ms/step - loss: 0.7579 - rpn_out_class_loss: 0.0416 - rpn_out_regress_loss: 0.0153 - dense_class_td_loss: 0.1394 - dense_regress_td_loss: 0.1394 - dense_class_td_acc: 0.9499
Epoch 2/12
6434/6434 [==============================] - 5658s 879ms/step - loss: 0.4526 - rpn_out_class_loss: 0.0278 - rpn_out_regress_loss: 0.0084 - dense_class_td_loss: 0.1112 - dense_regress_td_loss: 0.1220 - dense_class_td_acc: 0.9595
Epoch 3/12
6434/6434 [==============================] - 5619s 873ms/step - loss: 0.3876 - rpn_out_class_loss: 0.0244 - rpn_out_regress_loss: 0.0072 - dense_class_td_loss: 0.0999 - dense_regress_td_loss: 0.1124 - dense_class_td_acc: 0.9631
Epoch 4/12
6434/6434 [==============================] - 5580s 867ms/step - loss: 0.3577 - rpn_out_class_loss: 0.0224 - rpn_out_regress_loss: 0.0066 - dense_class_td_loss: 0.0952 - dense_regress_td_loss: 0.1079 - dense_class_td_acc: 0.9649
Epoch 5/12
6434/6434 [==============================] - 5548s 862ms/step - loss: 0.3255 - rpn_out_class_loss: 0.0198 - rpn_out_regress_loss: 0.0062 - dense_class_td_loss: 0.0881 - dense_regress_td_loss: 0.1018 - dense_class_td_acc: 0.9672
Epoch 6/12
6434/6434 [==============================] - 5529s 859ms/step - loss: 0.3097 - rpn_out_class_loss: 0.0190 - rpn_out_regress_loss: 0.0059 - dense_class_td_loss: 0.0842 - dense_regress_td_loss: 0.0983 - dense_class_td_acc: 0.9685
Epoch 7/12
6434/6434 [==============================] - 5502s 855ms/step - loss: 0.2943 - rpn_out_class_loss: 0.0180 - rpn_out_regress_loss: 0.0057 - dense_class_td_loss: 0.0805 - dense_regress_td_loss: 0.0944 - dense_class_td_acc: 0.9699
Epoch 8/12
6434/6434 [==============================] - 5496s 854ms/step - loss: 0.2834 - rpn_out_class_loss: 0.0172 - rpn_out_regress_loss: 0.0056 - dense_class_td_loss: 0.0788 - dense_regress_td_loss: 0.0925 - dense_class_td_acc: 0.9704
Epoch 9/12
6434/6434 [==============================] - 5489s 853ms/step - loss: 0.2748 - rpn_out_class_loss: 0.0165 - rpn_out_regress_loss: 0.0055 - dense_class_td_loss: 0.0769 - dense_regress_td_loss: 0.0904 - dense_class_td_acc: 0.9713
Epoch 10/12
6434/6434 [==============================] - 5478s 851ms/step - loss: 0.2733 - rpn_out_class_loss: 0.0166 - rpn_out_regress_loss: 0.0054 - dense_class_td_loss: 0.0756 - dense_regress_td_loss: 0.0897 - dense_class_td_acc: 0.9717
Epoch 11/12
6434/6434 [==============================] - 5482s 852ms/step - loss: 0.2600 - rpn_out_class_loss: 0.0153 - rpn_out_regress_loss: 0.0052 - dense_class_td_loss: 0.0732 - dense_regress_td_loss: 0.0869 - dense_class_td_acc: 0.9723
Epoch 12/12
6434/6434 [==============================] - 5490s 853ms/step - loss: 0.2587 - rpn_out_class_loss: 0.0154 - rpn_out_regress_loss: 0.0052 - dense_class_td_loss: 0.0725 - dense_regress_td_loss: 0.0861 - dense_class_td_acc: 0.9727

Thanks for the details. We are still checking the mAP too. Several comments here.

  1. ResNet-101 is a big network. Training a big backbone (like ResNet-101) on a small dataset (like KITTI) does not seem to work well.
  2. We find that an intermediate model may have a better validation mAP. The next release (2.0 GA) of faster-rcnn will implement validation during training, which makes it convenient to check the mAP periodically.
  3. For the ResNet-50 you mentioned, could you please compare the mAP result in the 2.0_dp docker against the 1.0.1 docker?

Hi, I trained with ResNet-50 last night and evaluated just now with object_confidence_thres: 0.50, getting these results:

================================================================================
Class               AP                  precision           recall              RPN_recall          
--------------------------------------------------------------------------------
cyclist             0.6452              0.4140              0.7264              0.9151              
--------------------------------------------------------------------------------
car                 0.8536              0.8128              0.8679              0.9846              
--------------------------------------------------------------------------------
person              0.6000              0.5253              0.6587              0.9013              
--------------------------------------------------------------------------------
mAP = 0.6996 

Note that I had

freeze_bn: True
freeze_blocks: 0
freeze_blocks: 1

during training.
So it looks like the problem is only with ResNet-101.
I doubt the size of the dataset is the cause: I've trained Faster-RCNN with R101 in other frameworks (TensorFlow & PyTorch) on quite small datasets and had good results.

We will dig into ResNet-101 further.
One question: which pretrained weights did you use to train the ResNet-101 FasterRCNN in TensorFlow?

Hi chandrachud,

Can you share the code base for your TensorFlow ResNet-101 FasterRCNN training? Basically, I would like to know the batch size you used in that training. ResNet-101 is a huge backbone and cannot fit on a single GPU with a large batch size like 16, so the BatchNorm moving mean and moving variance are not good in that case.

BTW, what batch size did you use when you trained ResNet-50 in TLT?

Thanks
Zhimeng

Sorry, I was mistaken. I have only used ResNet-18 and ResNet-50 in TensorFlow. In PyTorch I have used ResNet-101 (available from Torchvision), but I guess converting those pretrained weights to something compatible would be complicated.

For both ResNet-101 and ResNet-50, I used the default batch size of 1 from the config file. I didn't change anything except the paths to the images.

As I said in my reply to Morganh, I haven't actually used ResNet-101 in TensorFlow. Apologies for the error.

Pretrained weights seem to be available for tensorflow.keras. Will these work with nvidia-tlt? Also, I'm not sure what the relation is between the batch size used for pre-training on ImageNet and our training as part of Faster-RCNN. A small batch size for Faster-RCNN may be acceptable, even if it's slower than ideal.

Hi cbasavaraj,

Arbitrary pretrained weights found on the Internet cannot be loaded into a TLT FasterRCNN training, since the weights are loaded by name and depend on the implementation. The training batch size of TLT FasterRCNN is not related to the ImageNet pre-training.
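
For illustration, name-based loading in Keras works roughly like this (a generic sketch of the mechanism, not TLT's internal loader):

from tensorflow import keras

# Two models that share one layer name ("shared_dense"). Weights saved from
# the first can be loaded into the second by name; everything else is skipped.
src = keras.Sequential([
    keras.layers.Dense(8, input_shape=(4,), name="shared_dense"),
    keras.layers.Dense(2, name="src_head"),
])
src.save_weights("src_weights.h5")

dst = keras.Sequential([
    keras.layers.Dense(8, input_shape=(4,), name="shared_dense"),
    keras.layers.Dense(3, name="dst_head"),  # name not present in the file
])
# by_name=True matches layers by name: "shared_dense" receives the saved
# weights, while "dst_head" keeps its freshly initialized values.
dst.load_weights("src_weights.h5", by_name=True)

So unless the layer names (and shapes) of an external ResNet-101 checkpoint line up with TLT's model definition, the load will fail or silently skip layers.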

Will update you more later.

Thanks
Zhimeng

@cbasavaraj
The NV internal team changed the optimizer to SGD and fine-tuned the learning-rate scheduler; the mAP can reach 49% now. Please try it on your side too. Thanks.

Attaching the training spec for your reference.

##Copyright (c) 2017-2020, NVIDIA CORPORATION. All rights reserved.
random_seed: 42
enc_key: 'tlt'
verbose: True
network_config {
  input_image_config {
    image_type: RGB
    image_channel_order: 'bgr'
    size_height_width {
      height: 384
      width: 1248
    }
    image_channel_mean {
      key: 'b'
      value: 103.939
    }
    image_channel_mean {
      key: 'g'
      value: 116.779
    }
    image_channel_mean {
      key: 'r'
      value: 123.68
    }
    image_scaling_factor: 1.0
    max_objects_num_per_image: 100
  }
  feature_extractor: "resnet:101"
  anchor_box_config {
    scale: 64.0
    scale: 128.0
    scale: 256.0
    ratio: 1.0
    ratio: 0.5
    ratio: 2.0
  }
  freeze_bn: False
  roi_mini_batch: 256
  rpn_stride: 16
  conv_bn_share_bias: True
  roi_pooling_config {
    pool_size: 7
    pool_size_2x: False
  }
  all_projections: True
  use_pooling: False
}
training_config {
  kitti_data_config {
    data_sources: {
      tfrecords_path: "/home/zhimengf/zhimengf_ws/pascal_voc/kitti_random_split/training/tfrecords/kitti_trainval/kitti_trainval*"
      image_directory_path: "/home/zhimengf/zhimengf_ws/pascal_voc/kitti_random_split/training"
    }
    image_extension: 'png'
    target_class_mapping {
      key: 'car'
      value: 'car'
    }
    target_class_mapping {
      key: 'van'
      value: 'car'
    }
    target_class_mapping {
      key: 'pedestrian'
      value: 'person'
    }
    target_class_mapping {
      key: 'person_sitting'
      value: 'person'
    }
    target_class_mapping {
      key: 'cyclist'
      value: 'cyclist'
    }
    validation_fold: 0
  }
  data_augmentation {
    preprocessing {
      output_image_width: 1248
      output_image_height: 384
      output_image_channel: 3
      min_bbox_width: 1.0
      min_bbox_height: 1.0
    }
    spatial_augmentation {
      hflip_probability: 0.5
      vflip_probability: 0.0
      zoom_min: 1.0
      zoom_max: 1.0
      translate_max_x: 0
      translate_max_y: 0
    }
    color_augmentation {
      hue_rotation_max: 0.0
      saturation_shift_max: 0.0
      contrast_scale_max: 0.0
      contrast_center: 0.5
    }
  }
  enable_augmentation: True
  batch_size_per_gpu: 1
  num_epochs: 12
  pretrained_weights: "/home/zhimengf/zhimengf_ws/pascal_voc/resnet_101.hdf5"
  output_model: "/home/zhimengf/zhimengf_ws/frcnn_2/train_135/frcnn_kitti_resnet101.tlt"
  rpn_min_overlap: 0.3
  rpn_max_overlap: 0.7
  classifier_min_overlap: 0.0
  classifier_max_overlap: 0.5
  gt_as_roi: False
  std_scaling: 1.0
  classifier_regr_std {
    key: 'x'
    value: 10.0
  }
  classifier_regr_std {
    key: 'y'
    value: 10.0
  }
  classifier_regr_std {
    key: 'w'
    value: 5.0
  }
  classifier_regr_std {
    key: 'h'
    value: 5.0
  }

  rpn_mini_batch: 256
  rpn_pre_nms_top_N: 12000
  rpn_nms_max_boxes: 2000
  rpn_nms_overlap_threshold: 0.7

  reg_config {
    reg_type: 'L2'
    weight_decay: 1e-4
  }

  optimizer {
    sgd {
      lr: 0.02
      momentum: 0.9
      decay: 0.0
      nesterov: False
    }
  }

  lr_scheduler {
    soft_start {
      base_lr: 0.02
      start_lr: 0.002
      soft_start: 0.1
      annealing_points: 0.8
      annealing_points: 0.9
      annealing_divider: 10.0
    }
  }

  lambda_rpn_regr: 1.0
  lambda_rpn_class: 1.0
  lambda_cls_regr: 1.0
  lambda_cls_class: 1.0

  inference_config {
    images_dir: '/home/zhimengf/zhimengf_ws/pascal_voc/kitti_random_split/testing/image_2'
    model: '/home/zhimengf/zhimengf_ws/frcnn_2/train_135/frcnn_kitti_resnet101.epoch12.tlt'
    detection_image_output_dir: '/home/zhimengf/zhimengf_ws/frcnn_2/train_135/inference_results_imgs'
    labels_dump_dir: '/home/zhimengf/zhimengf_ws/frcnn_2/train_135/inference_dump_labels'
    rpn_pre_nms_top_N: 6000
    rpn_nms_max_boxes: 300
    rpn_nms_overlap_threshold: 0.7
    bbox_visualize_threshold: 0.6
    classifier_nms_max_boxes: 300
    classifier_nms_overlap_threshold: 0.3
  }

  evaluation_config {
    model: '/home/zhimengf/zhimengf_ws/frcnn_2/train_135/frcnn_kitti_resnet101.epoch12.tlt'
    labels_dump_dir: '/home/zhimengf/zhimengf_ws/frcnn_2/train_135/test_dump_labels'
    rpn_pre_nms_top_N: 6000
    rpn_nms_max_boxes: 300
    rpn_nms_overlap_threshold: 0.7
    classifier_nms_max_boxes: 300
    classifier_nms_overlap_threshold: 0.3
    object_confidence_thres: 0.0001
    use_voc07_11point_metric: False
  }
}
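
For reference, my reading of the soft_start scheduler in this spec: the learning rate ramps from start_lr up to base_lr over the first soft_start fraction of training, then is divided by annealing_divider at each annealing point. A rough Python sketch of that behaviour (an assumption about the semantics, not NVIDIA's implementation):

def soft_start_lr(progress, base_lr=0.02, start_lr=0.002, soft_start=0.1,
                  annealing_points=(0.8, 0.9), annealing_divider=10.0):
    """Approximate learning rate at `progress`, the fraction of training
    completed, in [0, 1]."""
    if progress < soft_start:
        # warm up from start_lr to base_lr (a linear ramp is assumed here;
        # the exact ramp shape in TLT may differ)
        return start_lr + (base_lr - start_lr) * (progress / soft_start)
    lr = base_lr
    for point in annealing_points:
        if progress >= point:
            lr /= annealing_divider  # step down at each annealing point
    return lr

# e.g. at 5%, 50%, 85% and 95% of training:
print([soft_start_lr(p) for p in (0.05, 0.5, 0.85, 0.95)])
# roughly [0.011, 0.02, 0.002, 0.0002]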

Thanks, I'll try tonight. Could the mAP go higher if you increased object_confidence_thres in your config? I already had mAP = 0.6996 with ResNet-50 and a threshold of 0.50.

That's possible when changing the threshold during evaluation.

I lowered the threshold to 0.01 for ResNet-50 and got the following results:

================================================================================
Class               AP                  precision           recall              RPN_recall          
--------------------------------------------------------------------------------
cyclist             0.6507              0.3618              0.7406              0.9151              
--------------------------------------------------------------------------------
car                 0.8540              0.8095              0.8683              0.9846              
--------------------------------------------------------------------------------
person              0.6051              0.4963              0.6685              0.9013              
--------------------------------------------------------------------------------
mAP = 0.7032 

Slightly higher recalls and lower precisions, giving almost the same mAP as with threshold = 0.50.
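
That makes sense: AP integrates precision over the full recall range, so lowering the confidence threshold mainly appends low-precision points at the tail of the precision-recall curve and moves the area only a little. A toy all-point AP computation for illustration (my sketch, not TLT's evaluator; the second PR point below is hypothetical):

import numpy as np

def average_precision(recalls, precisions):
    """Area under the interpolated precision-recall curve (all-point AP).
    `recalls` must be sorted in ascending order, one point per threshold."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # interpolate: precision at recall x is the max precision at recall >= x
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]  # indices where recall increases
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# car class at threshold 0.50 (precision/recall from the table above):
print(average_precision([0.3058], [0.9744]))             # ~0.298
# adding a hypothetical low-precision tail point barely moves it:
print(average_precision([0.3058, 0.32], [0.9744, 0.5]))  # ~0.305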

So I doubt that for ResNet-101 the threshold will make a big difference in the mAP. I think I'll take a break and come back to this later.

Thanks for all your efforts.

Hello,
I noticed that DeepStream 5.0.1 was released a couple of weeks ago, and TLT has also been updated. Does this mean that Faster RCNN with ResNet-101 now trains well and gives good average precision? Thanks

As of today, the latest tlt docker is nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3.
See Transfer Learning Toolkit for Video Streaming Analytics | NVIDIA NGC