google.protobuf.text_format.ParseError: 1:1 : Message type "Experiment" has no field named "Skip"

Hi, I'm getting the following error message when trying to train a faster_rcnn model with TLT. It's almost impossible for me to debug, so I was hoping someone could help me understand the error.

Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 30, in main
  File "./faster_rcnn/scripts/train.py", line 49, in main
  File "./faster_rcnn/spec_loader/spec_loader.py", line 55, in load_experiment_spec
  File "./faster_rcnn/spec_loader/spec_loader.py", line 31, in _load_proto
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 690, in Merge
    allow_unknown_field=allow_unknown_field)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 757, in MergeLines
    return parser.MergeLines(lines, message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 782, in MergeLines
    self._ParseOrMerge(lines, message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 804, in _ParseOrMerge
    self._MergeField(tokenizer, message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 896, in _MergeField
    (message_descriptor.full_name, name))
google.protobuf.text_format.ParseError: 1:1 : Message type "Experiment" has no field named "Skip".
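
A note on reading this message: protobuf's text_format reports parse errors as `line:column : message`, so `1:1` points at the very first token of the file passed with `-e`. Below is a minimal sketch (Python 3, standard library only) that pulls the location out of the error text and returns the line the parser stopped on. The helper name `locate_parse_error` and the sample spec text are made up for illustration and are not part of TLT or protobuf.

```python
import re

# Matches the "line:column : message" part of a text_format ParseError string.
_LOCATION = re.compile(r"(\d+):(\d+)\s*:\s*(.*)")


def locate_parse_error(error_text, spec_text):
    """Return (line_no, column_no, offending_line) for a text_format error message.

    Hypothetical helper: it only parses the error string, it does not call protobuf.
    """
    match = _LOCATION.search(error_text)
    if match is None:
        raise ValueError("no line:column location found in error text")
    line_no, column_no = int(match.group(1)), int(match.group(2))
    lines = spec_text.splitlines()
    # Line numbers in the message are 1-based.
    offending = lines[line_no - 1] if 0 < line_no <= len(lines) else ""
    return line_no, column_no, offending


if __name__ == "__main__":
    error = ('google.protobuf.text_format.ParseError: 1:1 : '
             'Message type "Experiment" has no field named "Skip".')
    # Sample text only; in practice read the file that was given to -e.
    spec = "model_config {\n  num_layers: 50\n}\n"
    print(locate_parse_error(error, spec))
```

Running this against the contents of the actual file given to `-e` shows what the parser saw on line 1, which is the first thing to compare against the field name quoted in the error.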

Are you using the training spec file included in the examples directory, or have you modified it? Can you paste it here?

Hi nrsw5c06,
Could you please attach your training spec file along with your training command and full log? Thanks.

Thanks. I have attached everything below.

My training command:

tlt-train faster_rcnn -e /workspace/data/models/tlt_resnet50_faster_rcnn_v1/train.txt -r /workspace/data/models/tlt_resnet50_faster_rcnn/trained/2020-25-03 --gpus 1 -k KEY

My training spec (which I forked from the detectnet example and modified for paths and dataset config):

model_config {
  pretrained_model_file: "/workspace/data/models/tlt_resnet50_faster_rcnn_v1/resnet50.hdf5"
  num_layers: 50
  use_batch_norm: true
  activation {
    activation_type: "relu"
  }
  dropout_rate: 0.1
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}

bbox_rasterizer_config {
  target_class_config {
    key: "vehicle"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.4
      cov_radius_y: 0.4
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }

  deadzone_radius: 0.67
}

postprocessing_config {
  target_class_config {
    key: "vehicle"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 20
      }
    }
  }
  target_class_config {
    key: "person"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 20
      }
    }
  }
}
cost_function_config {
  target_classes {
    name: "vehicle"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.9999
  min_objective_weight: 0.0001
}

training_config {
  batch_size_per_gpu: 16
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-6
      max_learning_rate: 5e-4
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-9
  }
  optimizer {
    adam {
      epsilon: 1e-08
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    enabled: False
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 5
}

augmentation_config {
  preprocessing {
    output_image_width: 768
    output_image_height: 768
    min_bbox_width: 2.0
    min_bbox_height: 2.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}

evaluation_config {
  validation_period_during_training: 5
  first_validation_epoch: 45
  minimum_detection_ground_truth_overlap {
    key: "vehicle"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.5
  }

  evaluation_box_config {
    key: "vehicle"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}

dataset_config {
  data_sources {
    tfrecords_path: "/workspace/data/records/kitti2*"
    image_directory_path: "/data/stanford/kitti2"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "bus"
    value: "vehicle"
  }
  target_class_mapping {
    key: "cart"
    value: "vehicle"
  }
  target_class_mapping {
    key: "pedestrian"
    value: "person"
  }
  target_class_mapping {
    key: "biker"
    value: "person"
  }
  target_class_mapping {
    key: "skater"
    value: "person"
  }
  target_class_mapping {
    key: "car"
    value: "vehicle"
  }
  validation_fold: 0
}

The full log:

Using TensorFlow backend.
2020-03-26 08:22:43.480895: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-03-26 08:22:44.349956: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-26 08:22:44.365751: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x666adc0 executing computations on platform CUDA. Devices:
2020-03-26 08:22:44.365792: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2020-03-26 08:22:44.770783: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2020-03-26 08:22:44.771152: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6781d30 executing computations on platform Host. Devices:
2020-03-26 08:22:44.771201: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2020-03-26 08:22:44.771414: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:04.0
totalMemory: 15.90GiB freeMemory: 15.64GiB
2020-03-26 08:22:44.771442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-26 08:22:44.781802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-26 08:22:44.781838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-26 08:22:44.781865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-26 08:22:44.781955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15216 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 30, in main
  File "./faster_rcnn/scripts/train.py", line 49, in main
  File "./faster_rcnn/spec_loader/spec_loader.py", line 55, in load_experiment_spec
  File "./faster_rcnn/spec_loader/spec_loader.py", line 31, in _load_proto
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 690, in Merge
    allow_unknown_field=allow_unknown_field)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 757, in MergeLines
    return parser.MergeLines(lines, message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 782, in MergeLines
    self._ParseOrMerge(lines, message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 804, in _ParseOrMerge
    self._MergeField(tokenizer, message)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/text_format.py", line 896, in _MergeField
    (message_descriptor.full_name, name))
google.protobuf.text_format.ParseError: 1:1 : Message type "Experiment" has no field named "Skip".

Yes. I used the one from the examples directory and modified the paths and dataset config. I have pasted it in my answer to Morganh.

Hi nrsw5c06,
Your training spec is written for detectnet_v2, not faster_rcnn.
For faster_rcnn, please read the TLT user guide or refer to the default spec inside the TLT 1.0.1 container.
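
One quick way to spot this kind of mismatch is to compare the top-level field names of your spec with those of the default spec shipped for the network you are training. The following is a rough sketch in Python 3 using only the standard library. The function names are made up for illustration, the reference text in the example is a placeholder and not the real faster_rcnn spec, and the brace counting is naive (it assumes no `{`, `}` or `#` inside quoted strings).

```python
import re

# A field name at the start of a line, followed by "{" (message) or ":" (scalar).
_FIELD = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*[:{]")


def top_level_fields(spec_text):
    """Collect names of fields that appear at brace depth 0 in a text-format spec."""
    names = []
    depth = 0
    for raw_line in spec_text.splitlines():
        line = raw_line.split("#", 1)[0]  # drop comments (naive)
        if depth == 0:
            match = _FIELD.match(line)
            if match and match.group(1) not in names:
                names.append(match.group(1))
        # Track nesting so that only depth-0 names are recorded.
        depth += line.count("{") - line.count("}")
    return names


def unexpected_fields(user_spec_text, reference_spec_text):
    """Top-level fields in the user spec that the reference spec never uses."""
    reference = set(top_level_fields(reference_spec_text))
    return [n for n in top_level_fields(user_spec_text) if n not in reference]


if __name__ == "__main__":
    user = 'model_config {\n  arch: "resnet"\n}\ntraining_config {\n  num_epochs: 120\n}\n'
    # Placeholder reference text; in practice read the default spec from the container.
    reference = "placeholder_config {\n}\ntraining_config {\n}\n"
    print(unexpected_fields(user, reference))
```

Any name this reports is a block the reference spec does not use at the top level, which is a strong hint that the file was written for a different network. It is only a heuristic: a field can share a name across networks and still have different contents.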

Okay. I figured I had the wrong spec file when you asked for it.

Thank you for helping.