Training Custom FasterRCNN resnet50 Object detection issue

Hello,

I am attempting to train a new model with my sample data to get a better understanding of how to use TLT.

I am getting the following error when I run the train command:

Total params: 42,937,262
Trainable params: 42,506,158
Non-trainable params: 431,104
__________________________________________________________________________________________________
2019-10-25 19:00:32,725 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loading pretrained weights from /workspace/tlt-experiments/pretrained_models/tlt_resnet50_faster_rcnn_v1/resnet50.h5
2019-10-25 19:00:34,734 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Pretrained weights loaded!
2019-10-25 19:00:34,965 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: training example num: 138
2019-10-25 19:00:35,181 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Starting training
2019-10-25 19:00:35,181 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Epoch 1/12
Found 138 examples in training dataset, valid image extension isjpg, jpeg and png(case sensitive)

Compressed_class_mapping: {u'pore': 0}

Name mapping:{u'pore': u'pore'}

Training dataset stats(compressed via class mapping):

{u'pore': 202}


Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 30, in main
  File "./faster_rcnn/scripts/train.py", line 273, in main
  File "./faster_rcnn/data_loader/loader.py", line 100, in kitti_data_gen
AssertionError: Class pore 0.00 unrecognized in /workspace/tlt-experiments/data/cam/training_labels/Image__2019-07-25__09-09-04.txt

Here is my spec:

random_seed: 42
enc_key: 'APIKEY'
verbose: True
network_config {
input_image_config {
image_type: RGB
image_channel_order: 'bgr'
size_height_width {
height: 1184
width: 1920
}
    image_channel_mean {
        key: 'b'
        value: 103.939
}
    image_channel_mean {
        key: 'g'
        value: 116.779
}
    image_channel_mean {
        key: 'r'
        value: 123.68
}
    image_scaling_factor: 1.0
}
feature_extractor: "resnet:50"
anchor_box_config {
scale: 64.0
scale: 128.0
scale: 256.0
ratio: 1.0
ratio: 0.5
ratio: 2.0
}
freeze_bn: True
freeze_blocks: 0
freeze_blocks: 1
roi_mini_batch: 256
rpn_stride: 16
conv_bn_share_bias: True
roi_pooling_config {
pool_size: 7
pool_size_2x: False
}
all_projections: True
use_pooling:False
}
training_config {
kitti_data_config {
images_dir : '/workspace/tlt-experiments/data/cam/training_images'
labels_dir: '/workspace/tlt-experiments/data/cam/training_labels'
}
training_data_parser: 'raw_kitti'
data_augmentation {
use_augmentation: True
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.0
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 0
translate_max_y: 0
}
color_augmentation {
color_shift_stddev: 0.0
hue_rotation_max: 0.0
saturation_shift_max: 0.0
contrast_scale_max: 0.0
contrast_center: 0.5
}
}
num_epochs: 12
class_mapping {
key: "pore"
value: 0
}

pretrained_weights: "/workspace/tlt-experiments/pretrained_models/tlt_resnet50_faster_rcnn_v1/resnet50.h5"
pretrained_model: ""
output_weights: "/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.tltw"
output_model: "/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.tlt"
rpn_min_overlap: 0.3
rpn_max_overlap: 0.7
classifier_min_overlap: 0.0
classifier_max_overlap: 0.5
gt_as_roi: False
std_scaling: 1.0
classifier_regr_std {
key: 'x'
value: 10.0
}
classifier_regr_std {
key: 'y'
value: 10.0
}
classifier_regr_std {
key: 'w'
value: 5.0
}
classifier_regr_std {
key: 'h'
value: 5.0
}

rpn_mini_batch: 256
rpn_pre_nms_top_N: 12000
rpn_nms_max_boxes: 2000
rpn_nms_overlap_threshold: 0.7

reg_config {
reg_type: 'L2'
weight_decay: 1e-4
}

optimizer {
adam {
lr: 0.00001
beta_1: 0.9
beta_2: 0.999
decay: 0.0
}
}

lr_scheduler {
step {
base_lr: 0.00001
gamma: 1.0
step_size: 30
}
}

lambda_rpn_regr: 1.0
lambda_rpn_class: 1.0
lambda_cls_regr: 1.0
lambda_cls_class: 1.0

inference_config {
images_dir: '/workspace/tlt-experiments/data/cam/validation'
model: '/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.epoch12.tlt'
detection_image_output_dir: '/workspace/tlt-experiments/data/cam/inference_results_imgs'
labels_dump_dir: '/workspace/tlt-experiments/data/cam/inference_dump_labels'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
bbox_visualize_threshold: 0.6
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
}

evaluation_config {
dataset {
images_dir : '/workspace/tlt-experiments/data/cam/validation_images'
labels_dir: '/workspace/tlt-experiments/data/cam/validation_labels'
}
data_parser: 'raw_kitti'
model: '/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.epoch12.tlt'
labels_dump_dir: '/workspace/tlt-experiments/data/cam/test_dump_labels'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
object_confidence_thres: 0.0001
use_voc07_11point_metric:False
}

}

Here is the KITTI label for that specific image:

pore 0.00 0 0.0 452.00 415.00 496.00 458.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

Please let me know if there is some issue I am not seeing.

Thanks!

Hi Martin,
There is an extra 0.0 in your pore label text file. Please remove it and try again.
Each object line must contain exactly 15 fields. See section 4.2.2 of the TLT doc for more details.

Hi Morgan,

Great catch! Is there a specific 0.0 I should remove to match the NVIDIA TLT format?

Thanks so much for such a quick response!

Hi Martin,
Each object line should contain 15 fields in total, but yours has 16.
So the problem is caused by the wrong format of the label text file.
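
A check like this can be scripted. The following is a minimal sketch, not part of TLT: the helper name `find_bad_label_lines` is hypothetical, and the directory is the `labels_dir` from the spec above. It reports every non-empty line whose whitespace-separated field count differs from 15.

```python
import os

EXPECTED_FIELDS = 15  # fields per object line, as stated in the reply above


def find_bad_label_lines(labels_dir, expected=EXPECTED_FIELDS):
    """Return (file name, line number, field count) for each malformed line."""
    bad = []
    for name in sorted(os.listdir(labels_dir)):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(labels_dir, name)) as handle:
            for line_no, line in enumerate(handle, start=1):
                fields = line.split()
                # Skip blank lines; flag anything that is not exactly 15 fields.
                if fields and len(fields) != expected:
                    bad.append((name, line_no, len(fields)))
    return bad


if __name__ == "__main__":
    # Path taken from the spec in this thread; adjust to your own layout.
    labels_dir = "/workspace/tlt-experiments/data/cam/training_labels"
    if os.path.isdir(labels_dir):
        for name, line_no, count in find_bad_label_lines(labels_dir):
            print("%s:%d has %d fields, expected %d"
                  % (name, line_no, count, EXPECTED_FIELDS))
```

Run against the label quoted earlier in this thread, it would flag that line as having 16 fields.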

Hello Morgan,

I have resolved that issue. Thank you!

I now cannot train because I run into the following issue. Please advise me on what may be the problem. (Please note that the config.txt remains the same as above.)

2019-10-28 12:27:29,722 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loading pretrained weights from /workspace/tlt-experiments/pretrained_models/tlt_resnet50_faster_rcnn_v1/resnet50.h5
2019-10-28 12:27:31,766 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Pretrained weights loaded!
2019-10-28 12:27:32,001 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: training example num: 139
2019-10-28 12:27:32,248 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Starting training
2019-10-28 12:27:32,248 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Epoch 1/12
Found 139 examples in training dataset, valid image extension isjpg, jpeg and png(case sensitive)

Compressed_class_mapping: {u'pore': 0}

Name mapping:{u'pore': u'pore'}

Training dataset stats(compressed via class mapping):

{u'pore': 203}

No positive ROIs.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-10-28 12:27:41,148 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 30, in main
  File "./faster_rcnn/scripts/train.py", line 309, in main
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1216, in train_on_batch
    self._make_train_function()
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 509, in _make_train_function
    loss=self.total_loss)
  File "/usr/local/lib/python2.7/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/optimizers.py", line 505, in get_updates
    self.updates.append(K.update(m, m_t))
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 973, in update
    return tf.assign(x, new_x)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/state_ops.py", line 224, in assign
    return ref.assign(value, name=name)
AttributeError: 'Tensor' object has no attribute 'assign'

Never mind, solved the issue above. It had to do with the sizing of the images.

What does “No positive ROIs” mean?

Also I am getting an indexing error: IndexError: index 1 is out of bounds for axis 1 with size 1

42/47 [=========================>....] - ETA: 2s - rpn_cls: 0.0259 - rpn_regr: 0.0192 - detector_cls: 2.1542e-04 - detector_regr: 0.0000e+00No positive ROIs.
43/47 [==========================>...] - ETA: 1s - rpn_cls: 0.0259 - rpn_regr: 0.0192 - detector_cls: 2.1338e-04 - detector_regr: 0.0000e+00No positive ROIs.
44/47 [===========================>..] - ETA: 1s - rpn_cls: 0.0259 - rpn_regr: 0.0192 - detector_cls: 2.1142e-04 - detector_regr: 0.0000e+00No positive ROIs.
45/47 [===========================>..] - ETA: 0s - rpn_cls: 0.0258 - rpn_regr: 0.0192 - detector_cls: 2.0948e-04 - detector_regr: 0.0000e+00No positive ROIs.
46/47 [============================>.] - ETA: 0s - rpn_cls: 0.0258 - rpn_regr: 0.0192 - detector_cls: 2.0756e-04 - detector_regr: 0.0000e+00No positive ROIs.
47/47 [==============================] - 22s 459ms/step - rpn_cls: 0.0257 - rpn_regr: 0.0192 - detector_cls: 2.0567e-04 - detector_regr: 0.0000e+00
2019-10-28 13:11:28,985 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Mean number of bounding boxes from RPN overlapping ground truth boxes: 0.0
2019-10-28 13:11:28,985 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Classifier accuracy for bounding boxes from RPN: 1.0
2019-10-28 13:11:28,985 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss RPN classifier: 0.0238573137433
2019-10-28 13:11:28,985 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss RPN regression: 0.0191653343116
2019-10-28 13:11:28,985 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss Detector classifier: 0.000118822497004
2019-10-28 13:11:28,985 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loss Detector regression: 0.0
2019-10-28 13:11:28,985 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Elapsed time: 25.4148280621
2019-10-28 13:11:28,985 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Total loss changed from 0.0628707147347 to 0.0431414705519, saving weights
2019-10-28 13:11:32,846 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Epoch 4/12
No positive ROIs.
 1/47 [..............................] - ETA: 23s - rpn_cls: 0.0254 - rpn_regr: 0.0359 - detector_cls: 1.1921e-07 - detector_regr: 0.0000e+00Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 30, in main
  File "./faster_rcnn/scripts/train.py", line 287, in main
  File "./faster_rcnn/utils/roi_helpers.py", line 76, in calc_iou_np
IndexError: index 1 is out of bounds for axis 1 with size 1

Here is my config for you to check once more:

random_seed: 42
enc_key: 'APIKEY'
verbose: True
network_config {
input_image_config {
image_type: RGB
image_channel_order: 'bgr'
    size_min {
min:700
}
    image_channel_mean {
        key: 'b'
        value: 103.939
}
    image_channel_mean {
        key: 'g'
        value: 116.779
}
    image_channel_mean {
        key: 'r'
        value: 123.68
}
    image_scaling_factor: 1.0
}
feature_extractor: "resnet:50"
anchor_box_config {
scale: 64.0
scale: 128.0
scale: 256.0
ratio: 1.0
ratio: 0.5
ratio: 2.0
}
freeze_bn: True
freeze_blocks: 0
freeze_blocks: 1
roi_mini_batch: 256
rpn_stride: 16
conv_bn_share_bias: True
roi_pooling_config {
pool_size: 7
pool_size_2x: False
}
all_projections: True
use_pooling:False
}
training_config {
kitti_data_config {
images_dir : '/workspace/tlt-experiments/data/cam/training_images'
labels_dir: '/workspace/tlt-experiments/data/cam/training_labels'
}
training_data_parser: 'raw_kitti'
data_augmentation {
use_augmentation: True
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.0
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 0
translate_max_y: 0
}
color_augmentation {
color_shift_stddev: 0.0
hue_rotation_max: 0.0
saturation_shift_max: 0.0
contrast_scale_max: 0.0
contrast_center: 0.5
}
}
num_epochs: 12
class_mapping {
key: 'pore'
value: 0
}
class_mapping {
key: "paint"
value: 1
}

pretrained_weights: "/workspace/tlt-experiments/pretrained_models/tlt_resnet50_faster_rcnn_v1/resnet50.h5"
pretrained_model: ""
output_weights: "/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.tltw"
output_model: "/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.tlt"
rpn_min_overlap: 0.3
rpn_max_overlap: 0.7
classifier_min_overlap: 0.0
classifier_max_overlap: 0.5
gt_as_roi: False
std_scaling: 1.0
classifier_regr_std {
key: 'x'
value: 10.0
}
classifier_regr_std {
key: 'y'
value: 10.0
}
classifier_regr_std {
key: 'w'
value: 5.0
}
classifier_regr_std {
key: 'h'
value: 5.0
}

rpn_mini_batch: 256
rpn_pre_nms_top_N: 12000
rpn_nms_overlap_threshold: 0.7

reg_config {
reg_type: 'L2'
weight_decay: 1e-4
}

optimizer {
adam {
lr: 0.00001
beta_1: 0.9
beta_2: 0.999
decay: 0.0
}
}

lr_scheduler {
step {
base_lr: 0.00001
gamma: 1.0
step_size: 30
}
}

lambda_rpn_regr: 1.0
lambda_rpn_class: 1.0
lambda_cls_regr: 1.0
lambda_cls_class: 1.0

inference_config {
images_dir: '/workspace/tlt-experiments/data/cam/inference'
model: '/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.epoch12.tlt'
detection_image_output_dir: '/workspace/tlt-experiments/data/cam/inference_results_imgs'
labels_dump_dir: '/workspace/tlt-experiments/data/cam/inference_results_imgs'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
bbox_visualize_threshold: 0.6
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
}

evaluation_config {
dataset {
images_dir : '/workspace/tlt-experiments/data/cam/validation_images'
labels_dir: '/workspace/tlt-experiments/data/cam/validation_labels'
}
data_parser: 'raw_kitti'
model: '/workspace/tlt-experiments/faster_rcnn/frcnn_kitti.epoch12.tlt'
labels_dump_dir: '/workspace/tlt-experiments/data/cam/validation_images'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
object_confidence_thres: 0.0001
use_voc07_11point_metric:False
}

}

Here is a sample annotation for an image:

pore 0.00 0 0.0 314.00 286.00 322.00 301.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0
paint 0.00 0 0.0 353.00 260.00 409.00 339.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0
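
For intuition about the "No positive ROIs" message, a small overlap calculation may help. This is an illustration only, not TLT's implementation. It assumes that a "positive" ROI is one whose intersection over union (IoU) with a ground-truth box reaches `classifier_max_overlap` (0.5 in the spec), and it ignores the resizing implied by `size_min`. The 64 x 64 box is hypothetical, chosen to match the smallest anchor scale in the spec.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The 'pore' box from the sample annotation: 8 x 15 pixels.
pore = (314.0, 286.0, 322.0, 301.0)
# Hypothetical 64 x 64 box that fully contains the pore (best case for overlap).
candidate = (286.0, 262.0, 350.0, 326.0)

print(round(iou(pore, candidate), 4))  # 0.0293
```

Under these assumptions, even a 64 x 64 box that fully contains the pore overlaps it far below 0.5, so very small annotated boxes are worth double-checking.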


Please close this thread. The issues raised here were caused by improper annotations and have been resolved.