Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc) : T4
• Network Type : Yolo_v4
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
Configuration of the TLT Instance
dockers: ['nvidia/tlt-streamanalytics', 'nvidia/tlt-pytorch']
format_version: 1.0
tlt_version: 3.0
published_date: 04/16/2021
docker_tag: v3.0-py3
• Training spec file(If have, please share here)
random_seed: 42
yolov4_config {
big_anchor_shape: "[(247.00, 178.00), (193.00, 235.00), (267.00, 299.00), (385.00, 406.00), (642.00, 614.00)]"
mid_anchor_shape: "[(108.00, 85.00), (99.00, 118.00), (146.00, 120.00), (118.00, 151.00), (158.00, 175.00)]"
small_anchor_shape: "[(45.00, 49.00), (54.00, 64.00), (76.00, 66.00), (64.00, 81.00), (80.00, 97.00)]"
box_matching_iou: 0.3
arch: "resnet"
nlayers: 50
arch_conv_blocks: 2
loss_loc_weight: 5.0
loss_neg_obj_weights: 50.0
loss_class_weights: 0.9
label_smoothing: 0.1
big_grid_xy_extend: 0.05
mid_grid_xy_extend: 0.1
small_grid_xy_extend: 0.2
freeze_bn: false
#freeze_blocks: 0
force_relu: false
}
training_config {
batch_size_per_gpu: 1
num_epochs: 10
enable_qat: true
checkpoint_interval: 1
learning_rate {
soft_start_cosine_annealing_schedule {
min_learning_rate: 1e-7
max_learning_rate: 1e-4
soft_start: 0.3
}
}
regularizer {
type: L1
weight: 3e-5
}
optimizer {
adam {
epsilon: 1e-7
beta1: 0.9
beta2: 0.999
amsgrad: false
}
}
pretrain_model_path: "/workspace/tlt-experiments/yolo_v4/pretrained_resnet50/tlt_pretrained_object_detection_vresnet50/resnet_50.hdf5"
}
eval_config {
average_precision_mode: SAMPLE
batch_size: 1
matching_iou_threshold: 0.4
}
nms_config {
confidence_threshold: 0.001
clustering_iou_threshold: 0.4
top_k: 200
}
augmentation_config {
hue: 0.1
saturation: 1.5
exposure: 1.5
vertical_flip: 0
horizontal_flip: 0.5
jitter: 0.3
output_width: 1920
output_height: 1024
output_channel: 3
randomize_input_shape_period: 0
mosaic_prob: 0.5
mosaic_min_ratio: 0.2
}
dataset_config {
data_sources: {
label_directory_path: "/workspace/tlt-experiments/data/training/label_2"
image_directory_path: "/workspace/tlt-experiments/data/training/image_2"
}
include_difficult_in_training: true
target_class_mapping {
key: "안전벨트 착용"
value: "Belt on"
}
target_class_mapping {
key: "안전벨트 미착용"
value: "Belt off"
}
target_class_mapping {
key: "안전화 착용"
value: "Shoes on"
}
target_class_mapping {
key: "안전모 착용"
value: "Helmet on"
}
validation_data_sources: {
label_directory_path: "/workspace/tlt-experiments/data/val/label"
image_directory_path: "/workspace/tlt-experiments/data/val/image"
}
}
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
Training results:
Producing predictions: 100%|████████████████| 1665/1665 [06:25<00:00, 4.32it/s]
Start to calculate AP for each class
*******************************
belt off AP 0.0
belt on AP 0.0
helmet on AP 0.0
shoes on AP 0.0
mAP 0.0
*******************************
Validation loss: 0.00021867455646127195
==========
During training, the loss drops close to 0, but the evaluation results always show 0 mAP for every class.
Is there anything I need to change in the spec file?
Also, is it correct to set nlayers in the spec file according to the backbone, i.e. 18 for resnet18 and 50 for resnet50?
We trained on 4 classes, and all classes have a similar number of instances.
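
One check on the dataset side is whether the class names in the label files actually match the target_class_mapping keys in the spec. Below is a minimal sketch of such a check. It assumes the labels are KITTI-format text files with 15 space-separated fields per line (class name first, followed by 14 numeric fields). The function name `summarize_labels` is hypothetical and not part of TLT. Because the keys in the spec contain spaces, a line using them would split into 16 fields instead of 15. I am not sure whether the TLT data loader handles that, so the sketch simply reports such lines.

```python
from collections import Counter
from pathlib import Path

# Keys copied from target_class_mapping in the spec above.
EXPECTED_KEYS = {"안전벨트 착용", "안전벨트 미착용", "안전화 착용", "안전모 착용"}

# Assumption: standard KITTI label line = class name + 14 numeric fields.
KITTI_FIELD_COUNT = 15


def summarize_labels(label_dir, expected_keys=EXPECTED_KEYS):
    """Count class names in a label directory and flag suspicious lines.

    Returns (class_counts, odd_lines, unknown_names) where odd_lines lists
    (file name, line number, field count) for lines that do not have
    exactly KITTI_FIELD_COUNT whitespace-separated fields.
    """
    class_counts = Counter()
    odd_lines = []
    for path in sorted(Path(label_dir).glob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for line_no, line in enumerate(lines, 1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != KITTI_FIELD_COUNT:
                odd_lines.append((path.name, line_no, len(fields)))
            # Treat everything before the last 14 fields as the class name,
            # so names containing spaces are still recovered for counting.
            if len(fields) > 14:
                name = " ".join(fields[:-14])
            else:
                name = fields[0]
            class_counts[name] += 1
    unknown_names = set(class_counts) - set(expected_keys)
    return class_counts, odd_lines, unknown_names


if __name__ == "__main__":
    counts, odd, unknown = summarize_labels(
        "/workspace/tlt-experiments/data/training/label_2"
    )
    print("class counts:", dict(counts))
    print("lines without 15 fields:", len(odd))
    print("names not in target_class_mapping:", unknown)
```

If this reports many lines without 15 fields, or names that are not in target_class_mapping, the labels may not be parsed into the four classes as intended, which could be related to the 0 AP for every class.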