TLT - retraining TrafficCamNet with custom data fails

Description

First of all, my dataset contains only cars (no people, bicycles, etc.). Will such a dataset work?
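One quick way to confirm what the dataset actually contains before writing the spec is to count the class names across the KITTI label files. The sketch below assumes standard KITTI label files (`*.txt`, one object per line, class name in the first column); `count_kitti_classes` is a hypothetical helper, not part of TLT.

```python
from collections import Counter
from pathlib import Path


def count_kitti_classes(label_dir):
    """Count object classes across KITTI-format label files.

    Assumes one object per line with the class name as the first
    whitespace-separated field (the standard KITTI label layout).
    """
    counts = Counter()
    for label_file in sorted(Path(label_dir).glob("*.txt")):
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if fields:  # skip blank lines
                counts[fields[0]] += 1
    return counts
```

For example, `count_kitti_classes("/workspace/tlt-experiments/data/training/label_2")`, where the `label_2` subfolder is an assumption based on the usual KITTI layout. If the result contains only `car`, the dataset matches a single-class spec.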

I copied the spec from another topic and adjusted it accordingly.

The error is as follows:

2021-06-14 09:17:29,871 [INFO] tensorflow: Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpndu68y5w/model.ckpt-193080
2021-06-14 09:17:30,369 [INFO] tensorflow: Restoring parameters from /tmp/tmpndu68y5w/model.ckpt-193080
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: Key block_1b_bn_shortcut/beta/Adam not found in checkpoint
     [[{{node save/RestoreV2}}]]
  (1) Not found: Key block_1b_bn_shortcut/beta/Adam not found in checkpoint
     [[{{node save/RestoreV2}}]]
     [[save/RestoreV2/_167]]
0 successful operations.
0 derived errors ignored.

Can someone point me to where I should look?

Thanks,
Kai
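The error names a specific missing variable. As a rough diagnostic, the sketch below pulls every `Key ... not found in checkpoint` entry out of a log and separates keys ending in `/Adam` or `/Adam_<n>` (the names TensorFlow 1.x's Adam optimizer gives its slot variables) from the rest. Here every missing key is an optimizer slot, which hints that the failing restore was trying to resume training state from an existing checkpoint rather than only loading pretrained weights. `summarize_missing_keys` is a hypothetical helper, and the regular expressions assume the message format shown in the log above.

```python
import re

# Matches TensorFlow restore errors such as:
#   (0) Not found: Key block_1b_bn_shortcut/beta/Adam not found in checkpoint
MISSING_KEY_RE = re.compile(r"Key (\S+) not found in checkpoint")

# TF1's Adam optimizer names its two slot variables "<var>/Adam" and "<var>/Adam_1".
ADAM_SLOT_RE = re.compile(r"/Adam(_\d+)?$")


def summarize_missing_keys(log_text):
    """Return (optimizer_slot_keys, other_keys), de-duplicated, in order of appearance."""
    keys = dict.fromkeys(MISSING_KEY_RE.findall(log_text))
    slots = [k for k in keys if ADAM_SLOT_RE.search(k)]
    others = [k for k in keys if not ADAM_SLOT_RE.search(k)]
    return slots, others
```

Applied to the traceback in this post, it returns `(['block_1b_bn_shortcut/beta/Adam'], [])`.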

Can you share the training spec file and the full training command line?

Thanks for looking into this!

Config file:

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/training"
  }
  image_extension: "PNG"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  validation_fold: 0
}
model_config {
  pretrained_model_file: "/workspace/tlt-experiments/detectnet_v2/pretrained_trafficcamnet/tlt_trafficcamnet_vunpruned_v1.0/resnet18_trafficcamnet.tlt"
  num_layers: 18
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
  all_projections: True
}
augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    crop_right: 960
    crop_bottom: 544
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1
    zoom_max: 1
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "car"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 15
      }
    }
  }
}
evaluation_config {
  validation_period_during_training: 5
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "car"
    value: 0.699999988079
  }
  minimum_detection_ground_truth_overlap {
    key: "bicycle"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "road_sign"
    value: 0.5
  }
  evaluation_box_config {
    key: "car"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "car"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 24
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 9e-08
      max_learning_rate: 0.0000045
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: "car"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}

Training Command:

!tlt detectnet_v2 train -e $SPECS_DIR/detectnet_v2_retrain_trafficcamnet_car_kitti.txt \
                        -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                        -k $KEY \
                        -n trafficcamnet_detector \
                        --gpus $NUM_GPUS

Since you mention that your dataset contains only cars, please remove the other classes' entries from your training spec.
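To double-check that nothing class-specific is left over after editing, a small script can list every `key: "..."` entry in the spec that is not one of the dataset's classes. This is a sketch that only looks at `key:` fields (class entries under `cost_function_config` use `name:` instead and are not checked), and it expects straight ASCII quotes, which the protobuf text format requires anyway; `unexpected_class_keys` is a hypothetical helper.

```python
import re

# Matches class-keyed entries in a protobuf text-format spec, e.g.  key: "car"
KEY_RE = re.compile(r'\bkey:\s*"([^"]+)"')


def unexpected_class_keys(spec_text, dataset_classes):
    """List class keys in the spec that are not in dataset_classes (first-seen order, no duplicates)."""
    found = dict.fromkeys(KEY_RE.findall(spec_text))
    return [k for k in found if k not in dataset_classes]
```

Run on the spec posted above (with straight quotes) and `{"car"}`, it reports `bicycle`, `person` and `road_sign`, all from `evaluation_config`.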

Yes, that was my first guess at the root cause too. I removed the other classes, but the problem is still there.

Can you use a new results folder? For example,
-r $USER_EXPERIMENT_DIR/experiment_dir_unpruned_new

Reference: Troubleshooting Guide — Transfer Learning Toolkit 3.0 documentation

Yes, the problem is gone. I had suspected something like this, but I was looking for a tlt command option to start fresh without being affected by previous training runs; I never thought of simply switching to a new output directory. Many thanks for your help!
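If you would rather not pick a new directory name by hand for each run, a small helper can choose one automatically. The sketch below treats any non-empty path as belonging to an earlier run and appends `_1`, `_2`, ... until it finds a path that is missing or an empty directory; `fresh_results_dir` and the suffix scheme are assumptions for illustration, not a TLT feature.

```python
from pathlib import Path


def fresh_results_dir(base):
    """Return `base`, or `base_1`, `base_2`, ..., whichever is missing or an empty directory.

    Any existing file, or any directory with contents, is treated as left over
    from an earlier run, so a new training run cannot pick up stale checkpoints.
    """
    base = Path(base)
    candidate, n = base, 0
    while candidate.exists() and (not candidate.is_dir() or any(candidate.iterdir())):
        n += 1
        candidate = base.with_name(f"{base.name}_{n}")
    return candidate
```

Pass the returned path to `-r` in the training command. Deleting the old results directory should work too, as long as its checkpoints are no longer needed.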
