GRAYSCALE as image_type not working with tlt-train faster_rcnn

Hello,

I am playing around with the Transfer Learning Toolkit docker image (nvcr.io/nvidia/tlt-streamanalytics:v1.0.1_py2).

I want to train a single-channel faster_rcnn detector.
Following the spec description in the documentation (https://docs.nvidia.com/metropolis/TLT/pdf/Transfer-Learning-Toolkit-Getting-Started-Guide-IVA.pdf, p. 49-50), the spec file is configured as follows:

random_seed: 42
enc_key: <lesecretkey>
verbose: True
network_config {
  input_image_config {
    image_type: GRAYSCALE
    image_channel_order: 'I'
    size_height_width {
      height: 448
      width: 544
    }
    image_channel_mean {
      key: 'I'
      value: 116.779
    }
    image_scaling_factor: 1.0
  }
  ...

Running “tlt-train faster_rcnn -e $SPECS_DIR/frcnn_train.txt” crashes with the following error:

Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 30, in main
  File "./faster_rcnn/scripts/train.py", line 49, in main
  File "./faster_rcnn/spec_loader/spec_loader.py", line 55, in load_experiment_spec
  File "./faster_rcnn/spec_loader/spec_loader.py", line 31, in _load_proto
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 693, in Merge
    allow_unknown_field=allow_unknown_field)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 760, in MergeLines
    return parser.MergeLines(lines, message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 785, in MergeLines
    self._ParseOrMerge(lines, message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 807, in _ParseOrMerge
    self._MergeField(tokenizer, message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 932, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1006, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 932, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1006, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 932, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1061, in _MergeScalarField
    value = tokenizer.ConsumeEnum(field)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1478, in ConsumeEnum
    raise self.ParseError(str(e))
google.protobuf.text_format.ParseError: 6:13 : 'image_type: GRAYSCALE': Enum type "Experiment.InputImageConfig.ImageType" has no value named GRAYSCALE.

My guess is that GRAYSCALE isn’t the right keyword.

Any help would be appreciated.

Regards,
Gabor

From the TLT user guide:

image_channel_order: ‘rgb’ or ‘bgr’ if image_type is RGB, ‘l’ if image_type is GRAYSCALE

But your spec is

image_channel_order: 'I'

Please modify it to 'l'.

Indeed, I overlooked that one.

random_seed: 42
enc_key: <..>
verbose: True
network_config {
  input_image_config {
    image_type: GRAYSCALE
    image_channel_order: 'l'
    size_height_width {
      height: 448
      width: 544
    }
    image_channel_mean {
      key: 'l'
      value: 116.779
    }
    image_scaling_factor: 1.0
  }
  ...

It does not, however, fix the problem:

...
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 30, in main
  File "./faster_rcnn/scripts/train.py", line 49, in main
  File "./faster_rcnn/spec_loader/spec_loader.py", line 55, in load_experiment_spec
  File "./faster_rcnn/spec_loader/spec_loader.py", line 31, in _load_proto
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 693, in Merge
    allow_unknown_field=allow_unknown_field)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 760, in MergeLines
    return parser.MergeLines(lines, message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 785, in MergeLines
    self._ParseOrMerge(lines, message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 807, in _ParseOrMerge
    self._MergeField(tokenizer, message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 932, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1006, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 932, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1006, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 932, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1061, in _MergeScalarField
    value = tokenizer.ConsumeEnum(field)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 1478, in ConsumeEnum
    raise self.ParseError(str(e))
google.protobuf.text_format.ParseError: 6:13 : 'image_type: GRAYSCALE': Enum type "Experiment.InputImageConfig.ImageType" has no value named GRAYSCALE.

Cheers,
Gabor

Sorry for that, there is a typo in the user guide. Please use GRAY_SCALE instead of GRAYSCALE.

Thanks Morganh, that did the trick!

I have made another horrifying realization: none of the pretrained faster_rcnn models listed by
ngc registry model list nvidia/iva/tlt_*_faster_rcnn
are grayscale!

Solutions I can think of:

  1. maybe Nvidia also provides grayscale pretrained .h5 models?
  2. maybe Graph Surgeon or a similar tool could be used to turn a color model into a tlt-train compatible single-channel input model file (see the sketch after this list)
  3. just train faster_rcnn on RGB; too much hassle to get it working with GRAY_SCALE
  4. pick another network (SSD, DetectNet_v2) that has pretrained grayscale models
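
For option 2, rather than Graph Surgeon, the simplest thing I can imagine is editing the .h5 directly. A rough, untested sketch; the dataset path "conv1/conv1/kernel:0" and the channels-first (64, 3, 7, 7) storage are assumptions on my part, not verified against the actual NGC file layout:

import shutil

import h5py

SRC = "/workspace/data/faster_rcnn/resnet18.h5"
DST = "/workspace/data/faster_rcnn/resnet18_gray.h5"

# Work on a copy so the original NGC backbone stays untouched.
shutil.copyfile(SRC, DST)

with h5py.File(DST, "r+") as f:
    # Guessed dataset path for the first conv kernel, assumed to be stored
    # channels-first as (64, 3, 7, 7).
    path = "conv1/conv1/kernel:0"
    kernel = f[path][...]
    # Collapse the 3 input channels into 1 by summing over the channel axis,
    # so a similarly scaled grayscale image produces a comparable activation.
    kernel_gray = kernel.sum(axis=1, keepdims=True)  # (64, 3, 7, 7) -> (64, 1, 7, 7)
    del f[path]
    f.create_dataset(path, data=kernel_gray)

No idea whether tlt-train would accept such a modified file as pretrained_weights, so treat this strictly as a sketch of the idea.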

Basically, if Nvidia has pretrained grayscale models, I will give those a shot. If not, I will train in color.

Much appreciated!
Gabor

Hi gabor,
Sorry for the late reply. For your grayscale data, you should be able to run faster-rcnn fine with the NGC pretrained models.

I think there is an issue with that.

Given the grayscale config file and using the resnet18 pretrained backbone:
pretrained_weights: "/workspace/data/faster_rcnn/resnet18.h5"

…training fails to start with:

    2020-03-30 10:53:10,203 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loading pretrained weights from /workspace/data/faster_rcnn/resnet18.h5
    Traceback (most recent call last):
      File "/usr/local/bin/tlt-train-g1", line 8, in <module>
        sys.exit(main())
      File "./common/magnet_train.py", line 30, in main
      File "./faster_rcnn/scripts/train.py", line 232, in main
      File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 1163, in load_weights
        reshape=reshape)
      File "/usr/local/lib/python2.7/dist-packages/keras/engine/saving.py", line 1149, in load_weights_from_hdf5_group_by_name
        str(weight_values[i].shape) + '.')
    ValueError: Layer #1 (named "conv1"), weight <tf.Variable 'conv1/kernel:0' shape=(7, 7, 1, 64) dtype=float32_ref> has shape (7, 7, 1, 64), but the saved weight has shape (64, 3, 7, 7).

So the pretrained backbone weights were made for a 3-channel input, while the network being built for grayscale input has a single-channel first conv, so the weight shapes do not match.

I have not had the chance to brute-force all backbones in combination with faster_rcnn training (https://authn.nvidia.com/token is down at the moment), but it seems to me that none of the pretrained backbones work for grayscale input.
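
For reference, this is the kind of quick check I had in mind once the download works again, to see whether any backbone ships with a single-channel first conv. It assumes the backbones are plain Keras-style HDF5 files and that kernel datasets end in "kernel:0", which is a guess:

import sys

import h5py


def dump_kernel_shapes(path):
    """Print every kernel dataset in a .h5 backbone together with its shape,
    so a 3-channel vs 1-channel first conv is visible at a glance."""
    with h5py.File(path, "r") as f:
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset) and name.endswith("kernel:0"):
                print("{}: {}".format(name, obj.shape))
        f.visititems(visit)


if __name__ == "__main__":
    # e.g. python dump_kernel_shapes.py /workspace/data/faster_rcnn/resnet18.h5
    dump_kernel_shapes(sys.argv[1])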

Is there maybe a workaround? I suspect GRAY_SCALE training with faster_rcnn was never really tested, and there are no NGC-distributed backbones compatible with it.

Gabor

Let me check your error log and try to reproduce it.

Hi gabor,
Could you please paste your training spec here? Thanks

Thanks for taking the time to look into this!

File upload is not authorized for .txt files, so here is the spec file:

random_seed: 42
enc_key: 'fill_in_something'
verbose: True
network_config {
  input_image_config {
    image_type: GRAY_SCALE
    image_channel_order: 'l'
    size_height_width {
      height: 448
      width: 544
    }
    image_channel_mean {
      key: 'l'
      value: 116.779
    }
    image_scaling_factor: 1.0
  }
  feature_extractor: "resnet:18"
  anchor_box_config {
    scale: 64.0
    scale: 128.0
    scale: 256.0
    ratio: 1.0
    ratio: 0.5
    ratio: 2.0
  }
  freeze_bn: True
  freeze_blocks: 0
  freeze_blocks: 1
  roi_mini_batch: 256
  rpn_stride: 16
  conv_bn_share_bias: False
  roi_pooling_config {
    pool_size: 7
    pool_size_2x: False
  }
  all_projections: True
  use_pooling: False
}
training_config {
  kitti_data_config {
    images_dir: '/workspace/data2/training/images'
    labels_dir: '/workspace/data2/training/labels'
  }
  training_data_parser: 'raw_kitti'
  data_augmentation {
    use_augmentation: True
    spatial_augmentation {
      hflip_probability: 0.5
      vflip_probability: 0.0
      zoom_min: 1.0
      zoom_max: 1.0
      translate_max_x: 0
      translate_max_y: 0
    }
    color_augmentation {
      color_shift_stddev: 0.0
      hue_rotation_max: 0.0
      saturation_shift_max: 0.0
      contrast_scale_max: 0.0
      contrast_center: 0.5
    }
  }
  num_epochs: 12
  class_mapping {
    key: "CAR"
    value: 0
  }
  class_mapping {
    key: "VAN"
    value: 1
  }
  class_mapping {
    key: "LGT"
    value: 2
  }
  class_mapping {
    key: "HVT"
    value: 3
  }
  class_mapping {
    key: "TRUCK"
    value: 3
  }
  class_mapping {
    key: "TRAIN"
    value: 3
  }
  class_mapping {
    key: "BUS"
    value: 4
  }
  class_mapping {
    key: "MTB"
    value: 5
  }
  class_mapping {
    key: "MOTORBIKE"
    value: 5
  }
  class_mapping {
    key: "background"
    value: 6
  }
  class_mapping {
    key: "T01"
    value: -1
  }
  pretrained_weights: "/workspace/data/faster_rcnn/resnet18.h5"
  pretrained_model: ""
  output_weights: "/workspace/data/faster_rcnn/frcnn_ff.tltw"
  output_model: "/workspace/data/faster_rcnn/frcnn_ff.tlt"
  rpn_min_overlap: 0.3
  rpn_max_overlap: 0.7
  classifier_min_overlap: 0.0
  classifier_max_overlap: 0.5
  gt_as_roi: False
  std_scaling: 1.0
  classifier_regr_std {
    key: 'x'
    value: 10.0
  }
  classifier_regr_std {
    key: 'y'
    value: 10.0
  }
  classifier_regr_std {
    key: 'w'
    value: 5.0
  }
  classifier_regr_std {
    key: 'h'
    value: 5.0
  }
  rpn_mini_batch: 256
  rpn_pre_nms_top_N: 12000
  rpn_nms_max_boxes: 2000
  rpn_nms_overlap_threshold: 0.7
  reg_config {
    reg_type: 'L2'
    weight_decay: 1e-4
  }
  optimizer {
    adam {
      lr: 0.00001
      beta_1: 0.9
      beta_2: 0.999
      decay: 0.0
    }
  }
  lr_scheduler {
    step {
      base_lr: 0.00001
      gamma: 1.0
      step_size: 30
    }
  }
  lambda_rpn_regr: 1.0
  lambda_rpn_class: 1.0
  lambda_cls_regr: 1.0
  lambda_cls_class: 1.0
  inference_config {
    images_dir: '/workspace/data2/testing/images'
    model: '/workspace/data/faster_rcnn/frcnn_ff.epoch12.tlt'
    detection_image_output_dir: '/workspace/data2/inference/images'
    labels_dump_dir: '/workspace/data2/inference/labels'
    rpn_pre_nms_top_N: 6000
    rpn_nms_max_boxes: 300
    rpn_nms_overlap_threshold: 0.7
    bbox_visualize_threshold: 0.6
    classifier_nms_max_boxes: 300
    classifier_nms_overlap_threshold: 0.3
  }
  evaluation_config {
    dataset {
      images_dir: '/workspace/data2/training/images'
      labels_dir: '/workspace/data2/training/labels'
    }
    data_parser: 'raw_kitti'
    model: '/workspace/data/faster_rcnn/frcnn_ff.epoch12.tlt'
    labels_dump_dir: '/workspace/data2/inference/test_labels'
    rpn_pre_nms_top_N: 6000
    rpn_nms_max_boxes: 300
    rpn_nms_overlap_threshold: 0.7
    classifier_nms_max_boxes: 300
    classifier_nms_overlap_threshold: 0.3
    object_confidence_thres: 0.0001
    use_voc07_11point_metric: False
  }
}

This is pretty much the faster_rcnn KITTI template spec file, adapted for grayscale.
If you launch it with
tlt-train faster_rcnn -e frcnn_fullframe_train.txt
I would expect it to reproduce the error:

2020-03-31 08:24:15,770 [INFO] /usr/local/lib/python2.7/dist-packages/iva/faster_rcnn/scripts/train.pyc: Loading pretrained weights from /workspace/data/faster_rcnn/resnet18.h5
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 30, in main
  File "./faster_rcnn/scripts/train.py", line 232, in main
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 1163, in load_weights
    reshape=reshape)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/saving.py", line 1149, in load_weights_from_hdf5_group_by_name
    str(weight_values[i].shape) + '.')
ValueError: Layer #1 (named "conv1"), weight <tf.Variable 'conv1/kernel:0' shape=(7, 7, 1, 64) dtype=float32_ref> has shape (7, 7, 1, 64), but the saved weight has shape (64, 3, 7, 7).

You don’t need any actual images/labels; the spec crashes before any images are read from disk.

Let me know if it gives you the same error.

Thanks for the info. Our internal team will check and will update this thread if there are any findings.

Hi gabor,
The pre-trained models in NGC are all 3-channel. That is why faster-rcnn hits this error with a gray_scale dataset.
We will add 1-channel pre-trained models to NGC. Please expect this to take some months, since it is a new feature request for the development team.

Thanks for the info!