Performance of TAO 3.22.05 and TAO 4.0.1 is lower than that of TAO 3.21.08

Hello,

I am training the same model (YOLOv4) on the exact same dataset and machine. With TAO 3.21.08 I get an mAP of 0.15, but with TAO 3.22.05 or TAO 4.0.1 the mAP doesn’t go beyond 0.09 (see attached screenshots).


This is the config I am using (I cut out the dataset_config part here):
random_seed: 42
yolov4_config {
big_anchor_shape: "[(87.07, 119.20), (119.47, 87.33), (124.67, 123.07)]"
mid_anchor_shape: "[(78.13, 78.13), (59.73, 105.20), (106.93, 60.80)]"
small_anchor_shape: "[(36.67, 35.87), (48.00, 66.27), (68.13, 48.53)]"
box_matching_iou: 0.25
matching_neutral_box_iou: 0.5
arch: "cspdarknet"
nlayers: 19
arch_conv_blocks: 2
loss_loc_weight: 1.0
loss_neg_obj_weights: 1.0
loss_class_weights: 1.0
label_smoothing: 0.0
big_grid_xy_extend: 0.05
mid_grid_xy_extend: 0.1
small_grid_xy_extend: 0.2
freeze_bn: false
force_relu: false
}
training_config {
batch_size_per_gpu: 16
num_epochs: 500
enable_qat: false
checkpoint_interval: 5
learning_rate {
soft_start_cosine_annealing_schedule {
min_learning_rate: 2e-10
max_learning_rate: 0.0001
soft_start: 0.1
}
}
regularizer {
type: L1
weight: 3e-05
}
optimizer {
adam {
epsilon: 1e-07
beta1: 0.9
beta2: 0.999
amsgrad: false
}
}
use_multiprocessing: true
visualizer {
enabled: false
num_images: 3
}
}
eval_config {
average_precision_mode: SAMPLE
batch_size: 16
matching_iou_threshold: 0.5
}
nms_config {
confidence_threshold: 0.001
clustering_iou_threshold: 0.5
top_k: 200
infer_nms_score_bits: 0
force_on_cpu: false
}
augmentation_config {
hue: 0.1
saturation: 1.5
exposure: 1.5
vertical_flip: 0.5
horizontal_flip: 0.5
jitter: 0.3
output_width: 512
output_height: 288
output_channel: 3
randomize_input_shape_period: 0
mosaic_prob: 0.5
mosaic_min_ratio: 0.2
}

What do you think could be causing this difference in training results?

Thanks for your help

When you run these two experiments:

  1. For 3.22.05 or 4.0.1, could you share the training spec file? Is it the one above?
  2. For 3.21.08, could you share the training spec file as well?

Additionally, I suggest you run the corresponding notebook to train again with the same KITTI dataset.
By the way, NVIDIA TAO - NVIDIA Docs contains the documentation for all versions of TAO.

exp1:
For 3.21.08, please use the notebook below and the spec file in it.

wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/tao/cv_samples/versions/v1.2.0/zip -O cv_samples_v1.2.0.zip
unzip -u cv_samples_v1.2.0.zip  -d ./cv_samples_v1.2.0 && rm -rf cv_samples_v1.2.0.zip && cd ./cv_samples_v1.2.0

exp2:
For 4.0.1, please use the notebook below and the spec file in it.

wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/tao/tao-getting-started/versions/4.0.1/zip -O getting_started_v4.0.1.zip
unzip -u getting_started_v4.0.1.zip  -d ./getting_started_v4.0.1 && rm -rf getting_started_v4.0.1.zip && cd ./getting_started_v4.0.1

Hello,

I will try the notebooks you suggested and get back to you.

The spec file for 3.22.05 or 4.0.1 is the one I shared above.

The spec file for 3.21.08 is here:

random_seed: 42
yolov4_config {
big_anchor_shape: "[(63.24, 286.30), (86.96, 211.07), (104.55, 358.39)]"
mid_anchor_shape: "[(43.07, 137.33), (41.56, 218.45), (60.92, 164.30)]"
small_anchor_shape: "[(32.73, 103.78), (25.97, 153.50), (51.14, 85.11)]"
box_matching_iou: 0.25
matching_neutral_box_iou: 0.5
arch: "cspdarknet"
nlayers: 19
arch_conv_blocks: 2
loss_loc_weight: 0.8
loss_neg_obj_weights: 100.0
loss_class_weights: 0.5
label_smoothing: 0.0
big_grid_xy_extend: 0.05
mid_grid_xy_extend: 0.1
small_grid_xy_extend: 0.2
freeze_bn: false
force_relu: false
}
training_config {
batch_size_per_gpu: 16
checkpoint_interval: 3
num_epochs: 150
enable_qat: false
learning_rate {
soft_start_cosine_annealing_schedule {
min_learning_rate: 1e-07
max_learning_rate: 0.0001
soft_start: 0.3
}
}
regularizer {
type: L1
weight: 3e-05
}
optimizer {
adam {
epsilon: 1e-07
beta1: 0.9
beta2: 0.999
amsgrad: false
}
}
use_multiprocessing: true
}
eval_config {
average_precision_mode: SAMPLE
batch_size: 16
matching_iou_threshold: 0.5
}
nms_config {
confidence_threshold: 0.001
clustering_iou_threshold: 0.5
top_k: 200
infer_nms_score_bits: 0
force_on_cpu: false
}
augmentation_config {
hue: 0.1
saturation: 1.5
exposure: 1.5
vertical_flip: 0.5
horizontal_flip: 0.5
jitter: 0.3
output_width: 960
output_height: 544
output_channel: 3
randomize_input_shape_period: 0
mosaic_prob: 0.5
mosaic_min_ratio: 0.2
}

Thanks!

OK, so one training will be run in 3.21.08 and another in 3.22.05 or 4.0.1, both against the KITTI dataset mentioned in the notebook. I appreciate your work. Thanks for your time.

Hi,
I noticed that the two spec files use different anchor_shapes and output_width/output_height.

For an apples-to-apples comparison on the same dataset, the output_width/output_height should be the same in both, for example 960x544, and the anchor_shapes are also expected to be the same.
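Concretely, an apples-to-apples run would keep these augmentation_config lines identical in both spec files (960x544 is just the example size mentioned above):

output_width: 960
output_height: 544
output_channel: 3

and the big/mid/small_anchor_shape values in yolov4_config should match as well.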

Sorry, I attached the wrong TAO 3.21.08 spec file. Here is the correct one:

random_seed: 42
yolov4_config {
big_anchor_shape: "[(87.07, 119.20), (119.47, 87.33), (124.67, 123.07)]"
mid_anchor_shape: "[(78.13, 78.13), (59.73, 105.20), (106.93, 60.80)]"
small_anchor_shape: "[(36.67, 35.87), (48.00, 66.27), (68.13, 48.53)]"
box_matching_iou: 0.25
matching_neutral_box_iou: 0.5
arch: "cspdarknet"
nlayers: 19
arch_conv_blocks: 2
loss_loc_weight: 0.8
loss_neg_obj_weights: 100.0
loss_class_weights: 0.5
label_smoothing: 0.0
big_grid_xy_extend: 0.05
mid_grid_xy_extend: 0.1
small_grid_xy_extend: 0.2
freeze_bn: false
force_relu: false
}
training_config {
batch_size_per_gpu: 16
num_epochs: 200
enable_qat: false
checkpoint_interval: 5
learning_rate {
soft_start_cosine_annealing_schedule {
min_learning_rate: 1e-07
max_learning_rate: 0.0001
soft_start: 0.3
}
}
regularizer {
type: L1
weight: 3e-05
}
optimizer {
adam {
epsilon: 1e-07
beta1: 0.9
beta2: 0.999
amsgrad: false
}
}
use_multiprocessing: true
}
eval_config {
average_precision_mode: SAMPLE
batch_size: 16
matching_iou_threshold: 0.5
}
nms_config {
confidence_threshold: 0.001
clustering_iou_threshold: 0.5
top_k: 200
infer_nms_score_bits: 0
force_on_cpu: false
}
augmentation_config {
hue: 0.1
saturation: 1.5
exposure: 1.5
vertical_flip: 0.5
horizontal_flip: 0.5
jitter: 0.3
output_width: 512
output_height: 288
output_channel: 3
randomize_input_shape_period: 0
mosaic_prob: 0.5
mosaic_min_ratio: 0.2
}

As mentioned in another topic, you can try running against the same KITTI dataset with these two versions of the TAO container, each with its corresponding spec file.

Hello,

I finished running all the experiments; please find all the results in the attached PDF file.
Tao_experiments.pdf (44.4 KB)

I am not sure that the results show anything conclusive.

  1. For the standard KITTI dataset, the mAP goes up considerably between TAO 3 and TAO 4 even though everything else stayed the same. Is this in line with other experimental results you’ve seen? As you can see, the trend is exactly the opposite on our own data.
  2. There also seems to be a difference between training with the KITTI format vs. TFRecords - is this expected? In the interest of time I only trained for 20 epochs; maybe the difference would have evened out with longer training.

Please let me know what you think of the results and whether anything differs from what you’d expect.

Thanks!
Benedetta

Thank you for your time and the detailed experimental results.
For TAO 4, yes, we can see an improvement over the previous TAO 3, which is expected. I can also see something similar in your own data, since exp6 gets a +17% improvement over exp5.
TAO 4.0.1 improves the YOLOv4 network compared to TAO 3, so I suggest you use TAO 4.0.1.
Additionally, you should run the kmeans command (tao yolo_v4 kmeans) to determine the best anchor shapes for your dataset and put those anchor shapes in the spec file.
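For reference, here is a minimal sketch of that command, following the pattern in the TAO notebooks (the dataset paths are placeholders; -n 9 requests nine clusters, i.e. three anchors per scale, and -x/-y should match the output_width/output_height in your spec file):

tao yolo_v4 kmeans -l /workspace/tao-experiments/data/training/label_2 \
                   -i /workspace/tao-experiments/data/training/image_2 \
                   -n 9 -x 960 -y 544

The printed cluster centers can then be split, smallest to largest, across small_anchor_shape, mid_anchor_shape, and big_anchor_shape.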
From YOLOv4 - NVIDIA Docs
YOLOv4 supports two data formats: the sequence format (images folder and raw labels folder with KITTI format) and the tfrecords format (images folder and TFRecords). From our experience, if mosaic augmentation is disabled (mosaic_prob=0), training with TFRecords format is faster. If mosaic augmentation is enabled (mosaic_prob>0), training with sequence format is faster.
For both the sequence format and the TFRecords format, exp1 ~ exp4 show that TAO 4 improves over TAO 3.
For your own dataset, according to your results, please run with the TFRecords format.
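Since dataset_config was cut from the specs above, here is a minimal sketch of a TFRecords-format dataset_config for YOLOv4, adapted from the TAO docs (the paths, image extension, and class mapping are placeholders for your own dataset):

dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/train*"
    image_directory_path: "/workspace/tao-experiments/data/training"
  }
  include_difficult_in_training: true
  image_extension: "png"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  validation_data_sources: {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/val*"
    image_directory_path: "/workspace/tao-experiments/data/val"
  }
}

Note also the interaction quoted above: with mosaic_prob > 0 the sequence format is the faster path, while TFRecords is faster with mosaic_prob: 0.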
