Retraining yolov4_tiny

Traceback (most recent call last):
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/train.py”, line 110, in
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py”, line 528, in return_func
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py”, line 516, in return_func
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/train.py”, line 106, in main
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/train.py”, line 58, in run_experiment
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/models/utils.py”, line 37, in build_training_pipeline
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/models/yolov4_model.py”, line 470, in build_training_model
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/models/yolov4_model.py”, line 305, in load_pretrained_model
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/builders/model_builder.py”, line 52, in _load_pretrain_weights
ValueError: conv_0 has incorrect shape in pretrained model.
/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was not compiled. Compile it manually.
warnings.warn('No training configuration found in save file: ’
Traceback (most recent call last):
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/builders/model_builder.py”, line 49, in _load_pretrain_weights
File “/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py”, line 1057, in set_weights
'provided weight shape ’ + str(w.shape))
ValueError: Layer weight shape (3, 3, 3, 32) not compatible with provided weight shape (3, 3, 3, 24)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/train.py”, line 110, in
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py”, line 528, in return_func
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py”, line 516, in return_func
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/train.py”, line 106, in main
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/train.py”, line 58, in run_experiment
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/models/utils.py”, line 37, in build_training_pipeline
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/models/yolov4_model.py”, line 470, in build_training_model
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/models/yolov4_model.py”, line 305, in load_pretrained_model
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/builders/model_builder.py”, line 52, in _load_pretrain_weights
ValueError: conv_0 has incorrect shape in pretrained model.

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun.real detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[37961,1],1]
Exit code: 1

2022-03-28 12:54:50,501 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Please share command and spec file. Thanks.

tao yolo_v4_tiny train --gpus 2 --gpu_index 0 1
-e $SPECS_DIR/yolo_v4_tiny_retrain_kitti.txt
-r $USER_EXPERIMENT_DIR/experiment_dir_retrain
-k $KEY

***** Retrain Config File ******
random_seed: 42
yolov4_config {
big_anchor_shape: “[(24.00, 175.00),(38.00, 111.00),(24.00, 206.00)]”
mid_anchor_shape: “[(19.00, 126.00),(19.00, 174.00),(23.00, 146.00)]”
small_anchor_shape: “[(17.00, 79.00),(13.00, 151.00),(29.00, 73.00)]”

box_matching_iou: 0.25
matching_neutral_box_iou: 0.5
arch: “cspdarknet_tiny_3l”
loss_loc_weight: 1.0
loss_neg_obj_weights: 1.0
loss_class_weights: 1.0
label_smoothing: 0.0
big_grid_xy_extend: 0.05
mid_grid_xy_extend: 0.05
freeze_bn: false
#freeze_blocks: 0
force_relu: false
}
training_config {
batch_size_per_gpu: 32
num_epochs: 350
enable_qat: true
checkpoint_interval: 30
learning_rate {
soft_start_cosine_annealing_schedule {
min_learning_rate: 1e-7
max_learning_rate: 8e-3
soft_start: 0.3
}
}
regularizer {
type: L1
weight: 3e-5
}
optimizer {
adam {
epsilon: 1e-7
beta1: 0.9
beta2: 0.999
amsgrad: false
}
}
pretrain_model_path: “/workspace/tao-experiments/yolo_v4_tiny/experiment_dir_pruned/yolov4_cspdarknet_tiny_3l_pruned.tlt”
}
eval_config {
average_precision_mode: SAMPLE
batch_size: 1
matching_iou_threshold: 0.6
}
nms_config {
confidence_threshold: 0.001
clustering_iou_threshold: 0.5
force_on_cpu: true
top_k: 200
}
augmentation_config {
hue: 0.1
saturation: 1.5
exposure:1.5
vertical_flip:0
horizontal_flip: 0.5
jitter: 0.3
output_width: 288
output_height: 288
output_channel: 3
randomize_input_shape_period: 10
mosaic_prob: 0.5
mosaic_min_ratio:0.2
}
dataset_config {
data_sources: {
tfrecords_path: “/workspace/tao-experiments/data/training/tfrecords/*”
image_directory_path: “/workspace/tao-experiments/data/training/”
}
include_difficult_in_training: true
image_extension: “jpg”
target_class_mapping {
key: “car”
value: “car”
}
}

*************** The below config is the train config *****************
random_seed: 42
yolov4_config {
big_anchor_shape: “[(24.00, 175.00),(38.00, 111.00),(24.00, 206.00)]”
mid_anchor_shape: “[(19.00, 126.00),(19.00, 174.00),(23.00, 146.00)]”
small_anchor_shape: “[(17.00, 79.00),(13.00, 151.00),(29.00, 73.00)]”

box_matching_iou: 0.25
matching_neutral_box_iou: 0.5
arch: “cspdarknet_tiny_3l”
loss_loc_weight: 1.0
loss_neg_obj_weights: 1.0
loss_class_weights: 1.0
label_smoothing: 0.0
big_grid_xy_extend: 0.05
mid_grid_xy_extend: 0.05
freeze_bn: false
#freeze_blocks: 0
force_relu: false
}
training_config {
batch_size_per_gpu: 32
num_epochs: 350
enable_qat: true
checkpoint_interval: 30
learning_rate {
soft_start_cosine_annealing_schedule {
min_learning_rate: 1e-7
max_learning_rate: 8e-3
soft_start: 0.3
}
}
regularizer {
type: L1
weight: 3e-5
}
optimizer {
adam {
epsilon: 1e-7
beta1: 0.9
beta2: 0.999
amsgrad: false
}
}
pretrain_model_path: “/workspace/tao-experiments/yolo_v4_tiny/pretrained_cspdarknet_tiny/pretrained_object_detection_vcspdarknet_tiny/cspdarknet_tiny.hdf5”
}
eval_config {
average_precision_mode: SAMPLE
batch_size: 1
matching_iou_threshold: 0.6
}
nms_config {
confidence_threshold: 0.001
clustering_iou_threshold: 0.5
force_on_cpu: true
top_k: 200
}
augmentation_config {
hue: 0.1
saturation: 1.5
exposure:1.5
vertical_flip:0
horizontal_flip: 0.5
jitter: 0.3
output_width: 288
output_height: 288
output_channel: 3
randomize_input_shape_period: 10
mosaic_prob: 0.5
mosaic_min_ratio:0.2
}
dataset_config {
data_sources: {
tfrecords_path: “/workspace/tao-experiments/data/training/tfrecords/*”
image_directory_path: “/workspace/tao-experiments/data/training/”
}
include_difficult_in_training: true
image_extension: “jpg”
target_class_mapping {
key: “car”
value: “car”
}
}

Is the error because I used a arch: “cspdarknet_tiny_3l” for a cspdarknet_tiny.hdf5” pretrained model?

Can someone help me quickly!!

Why you set pretrain_model_path during retraining?

It is expected to set pruned_model_path.
See YOLOv4-tiny — TAO Toolkit 3.22.05 documentation

Thank you for identifying the mistake…
While making changes I forgot to change the variable’s name from “pretrain_model_path” to “pruned_model_path” and only changed the model path…

Thank you!!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.