YOLOv4-tiny training using cspdarknet_tiny.hdf5 as the pre-trained model

• Hardware : Nvidia GeForce GTX 1060
• Network Type: Yolov4-tiny
• TLT Version: TAO 3.21.11
• Training spec file (if you have one, please share it here)

• How to reproduce the issue?
Running "tao yolo_v4_tiny train {params}" gives the following error. It says that the wrong key was used, although the key is the one provided with the Jupyter sample:
Invalid decryption. Unable to open file (unable to open file: name = 'EXPERIMENT_DIR/pretrained_cspdarknet_tiny/pretrained_object_detection_vcspdarknet_tiny/cspdarknet_tiny.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0). The key used to load the model is incorrect.
2021-12-02 17:28:11,437 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
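
The message above mixes two statements: an HDF5 open failure with errno = 2 and a generic note about the key. A minimal Python sketch, assuming the message format shown here, can tell the two apart by reading the errno; the helper name diagnose is hypothetical and not part of TAO.

```python
import errno
import re

# Error text copied from the log above (quotes normalized to ASCII).
MESSAGE = (
    "Invalid decryption. Unable to open file (unable to open file: "
    "name = 'EXPERIMENT_DIR/pretrained_cspdarknet_tiny/"
    "pretrained_object_detection_vcspdarknet_tiny/cspdarknet_tiny.hdf5', "
    "errno = 2, error message = 'No such file or directory', "
    "flags = 0, o_flags = 0). The key used to load the model is incorrect."
)


def diagnose(message):
    """Return 'missing file' if the message carries errno ENOENT, else 'unknown'.

    Hypothetical helper: it only inspects the 'errno = N' fragment.
    """
    match = re.search(r"errno = (\d+)", message)
    if match and int(match.group(1)) == errno.ENOENT:
        return "missing file"
    return "unknown"


print(diagnose(MESSAGE))
```

An errno of 2 (ENOENT) points at a path that cannot be opened, so the key is probably not the real cause here.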

Can you share the full command you run for "tao yolo_v4_tiny train {parms}"?
BTW, what is the key?

The key is set by this environment variable: %env KEY=nvidia_tlt

The full command is:
!tao yolo_v4_tiny train -e $SPECS_DIR/yolo_v4_tiny_train_kitti_seq.txt \
    -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
    -k $KEY \
    --gpus 1

And here is my spec file:
random_seed: 42
yolov4_config {
big_anchor_shape: "[(153.33, 122.31), (168.00, 213.46), (306.08, 346.00)]"
mid_anchor_shape: "[(27.27, 50.00), (48.00, 93.00), (73.33, 184.62)]"
box_matching_iou: 0.5
matching_neutral_box_iou: 0.5
arch: "cspdarknet_tiny"
loss_loc_weight: 1.0
loss_neg_obj_weights: 1.0
loss_class_weights: 1.0
label_smoothing: 0.0
big_grid_xy_extend: 0.05
mid_grid_xy_extend: 0.05
freeze_bn: false
#freeze_blocks: 0
force_relu: false
}
training_config {
batch_size_per_gpu: 8
num_epochs: 100
enable_qat: false
checkpoint_interval: 10
learning_rate {
soft_start_cosine_annealing_schedule {
min_learning_rate: 1e-7
max_learning_rate: 1e-4
soft_start: 0.3
}
}
regularizer {
type: L1
weight: 3e-5
}
optimizer {
adam {
epsilon: 1e-7
beta1: 0.9
beta2: 0.999
amsgrad: false
}
}
pretrain_model_path: "EXPERIMENT_DIR/pretrained_cspdarknet_tiny/pretrained_object_detection_vcspdarknet_tiny/cspdarknet_tiny.hdf5"
}
eval_config {
average_precision_mode: SAMPLE
batch_size: 8
matching_iou_threshold: 0.5
}
nms_config {
confidence_threshold: 0.001
clustering_iou_threshold: 0.5
top_k: 200
}
augmentation_config {
hue: 0.1
saturation: 1.5
exposure: 1.5
vertical_flip: 0
horizontal_flip: 0.5
jitter: 0.3
output_width: 640
output_height: 480
output_channel: 3
randomize_input_shape_period: 10
mosaic_prob: 0.5
mosaic_min_ratio: 0.2
}
dataset_config {
data_sources: {
label_directory_path: "/workspace/tao-experiments/data/training/label"
image_directory_path: "/workspace/tao-experiments/data/training/image"
}
include_difficult_in_training: true
target_class_mapping {
key: "vehicle"
value: "vehicle"
}
target_class_mapping {
key: "person"
value: "person"
}
target_class_mapping {
key: "animal"
value: "animal"
}
validation_data_sources: {
label_directory_path: "/workspace/tao-experiments/data/val/label"
image_directory_path: "/workspace/tao-experiments/data/val/image"
}
}

I am afraid the path above is not available.
You can double-check with the command below.
$ tao yolov4 run ls EXPERIMENT_DIR/pretrained_cspdarknet_tiny/pretrained_object_detection_vcspdarknet_tiny/cspdarknet_tiny.hdf5

Thanks Morgan,

It looks like the file is in the correct folder on my local machine, but it is not being found inside the container. I will double-check the mount mapping.
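
A file can exist on the host and still be invisible in the container, because the TAO launcher only exposes directories listed in its mounts file (~/.tao_mounts.json, as source/destination pairs). The Python sketch below illustrates that translation under stated assumptions: the helper name host_to_container and the host directory /home/user/tao-experiments are hypothetical and do not come from this thread.

```python
import json
from pathlib import PurePosixPath


def host_to_container(host_path, mounts):
    """Map a host path to its container path, or return None if unmounted.

    `mounts` is a list of {"source": ..., "destination": ...} dicts,
    assumed to follow the layout of the "Mounts" list in ~/.tao_mounts.json.
    """
    host = PurePosixPath(host_path)
    for mount in mounts:
        source = PurePosixPath(mount["source"])
        try:
            relative = host.relative_to(source)
        except ValueError:
            continue  # host_path is not under this mount's source
        return str(PurePosixPath(mount["destination"]) / relative)
    return None


# Hypothetical mounts file content; the source directory is an assumption.
mounts_json = """
{"Mounts": [{"source": "/home/user/tao-experiments",
             "destination": "/workspace/tao-experiments"}]}
"""
mounts = json.loads(mounts_json)["Mounts"]

print(host_to_container("/home/user/tao-experiments/data/training/image", mounts))
print(host_to_container("/tmp/cspdarknet_tiny.hdf5", mounts))
```

A result of None means the host path is outside every mounted directory, so nothing inside the container can open it.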

Thanks again, Morgan, for your assistance. It's working now.

My problem was that I tried to use the environment variable EXPERIMENT_DIR in my spec file, which does not work. Changing it to the full container path solved the problem.
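
A quick way to catch this kind of mistake before training is to scan the spec for path fields whose value is not an absolute path. The Python sketch below is only an illustration: the helper name find_non_absolute_paths is hypothetical, it assumes path fields have names ending in "path" with double-quoted values, and the sample spec is a shortened excerpt of the one posted in this thread.

```python
import re

# Matches lines such as: pretrain_model_path: "some/value"
PATH_FIELD = re.compile(r'^\s*(\w*path)\s*:\s*"([^"]*)"', re.MULTILINE)


def find_non_absolute_paths(spec_text):
    """Return (field, value) pairs whose value does not start with '/'."""
    return [
        (field, value)
        for field, value in PATH_FIELD.findall(spec_text)
        if not value.startswith("/")
    ]


# Shortened excerpt of the spec from this thread.
spec = '''
training_config {
  pretrain_model_path: "EXPERIMENT_DIR/pretrained_cspdarknet_tiny/pretrained_object_detection_vcspdarknet_tiny/cspdarknet_tiny.hdf5"
}
dataset_config {
  data_sources: {
    label_directory_path: "/workspace/tao-experiments/data/training/label"
    image_directory_path: "/workspace/tao-experiments/data/training/image"
  }
}
'''

for field, value in find_non_absolute_paths(spec):
    print(f"{field} is not an absolute path: {value}")
```

On this excerpt only pretrain_model_path is reported, which matches the field that caused the error.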

You can also change it to $EXPERIMENT_DIR.
