Hello,
I previously trained a model in this Docker container: nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3. I used the following command and spec file:
tlt-train detectnet_v2 -e tlt_experiment_spec.txt -r <output_dir> -k <key_to_load_the_model>
dataset_config {
data_sources {
tfrecords_path: "/workspace/data/tf_records/*"
image_directory_path: "/workspace/linked_data/"
}
image_extension: "jpg"
target_class_mapping {
key: "motorcycle"
value: "motorcycle"
}
target_class_mapping {
key: "vehicle"
value: "vehicle"
}
validation_fold: 0
}
augmentation_config {
preprocessing {
output_image_width: 960
output_image_height: 544
min_bbox_width: 1.0
min_bbox_height: 25.0
output_image_channel: 3
}
spatial_augmentation {
zoom_min: 1.0
zoom_max: 1.0
}
color_augmentation {
hue_rotation_max: 25.0
saturation_shift_max: 0.2199999988079071
contrast_scale_max: 0.1599999964237213
contrast_center: 0.5
}
}
postprocessing_config {
target_class_config {
key: "motorcycle"
value {
clustering_config {
coverage_threshold: 0.20000000298023224
minimum_bounding_box_height: 20
dbscan_eps: 0.5
dbscan_min_samples: 1
}
}
}
target_class_config {
key: "vehicle"
value {
clustering_config {
coverage_threshold: 0.20000000298023224
minimum_bounding_box_height: 20
dbscan_eps: 0.5
dbscan_min_samples: 1
}
}
}
}
model_config {
pretrained_model_file: "path_to_resnet34_peoplenet.tlt"
num_layers: 34
use_batch_norm: true
objective_set {
bbox {
scale: 35.0
offset: 0.5
}
cov {
}
}
training_precision {
backend_floatx: FLOAT32
}
freeze_blocks: 0.0
arch: "resnet"
all_projections: true
}
evaluation_config {
validation_period_during_training: 10
first_validation_epoch: 10
minimum_detection_ground_truth_overlap {
key: "motorcycle"
value: 0.800000011920929
}
minimum_detection_ground_truth_overlap {
key: "vehicle"
value: 0.800000011920929
}
evaluation_box_config {
key: "motorcycle"
value {
minimum_height: 4
maximum_height: 9999
minimum_width: 4
maximum_width: 9999
}
}
evaluation_box_config {
key: "vehicle"
value {
minimum_height: 4
maximum_height: 9999
minimum_width: 4
maximum_width: 9999
}
}
average_precision_mode: INTEGRATE
}
cost_function_config {
target_classes {
name: "vehicle"
class_weight: 1.0
coverage_foreground_weight: 0.05000000074505806
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 10.0
}
}
target_classes {
name: "motorcycle"
class_weight: 1.0
coverage_foreground_weight: 0.05000000074505806
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 10.0
}
}
enable_autoweighting: true
max_objective_weight: 0.9998999834060669
min_objective_weight: 9.999999747378752e-05
}
training_config {
batch_size_per_gpu: 16
num_epochs: 1100
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 4.999999873689376e-06
max_learning_rate: 0.0005000000237487257
soft_start: 0.20000000298023224
annealing: 0.800000011920929
}
}
regularizer {
weight: 3.000000026176508e-09
}
optimizer {
adam {
epsilon: 9.99999993922529e-09
beta1: 0.8999999761581421
beta2: 0.9990000128746033
}
}
cost_scaling {
initial_exponent: 20.0
increment: 0.005
decrement: 1.0
}
checkpoint_interval: 5
}
bbox_rasterizer_config {
target_class_config {
key: "motorcycle"
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 0.4000000059604645
cov_radius_y: 0.4000000059604645
bbox_min_radius: 1.0
}
}
target_class_config {
key: "vehicle"
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 0.4000000059604645
cov_radius_y: 0.4000000059604645
bbox_min_radius: 1.0
}
}
deadzone_radius: 0.6700000166893005
}
This spec file and training command resulted in a great model with Vehicle AP ~90%. I was hoping to move to a newer version of TLT (TAO), so I took the data, spec file, and base model and tried to repeat the same experiment using the newest Docker image.
The image I used for my second experiment is nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5. I regenerated the TFRecords and then ran the experiment. I used the exact same spec file shown above; the command I used to start training is below.
detectnet_v2 train -e tlt_experiment_spec.txt -r <output_dir> -k <key_to_load_the_model>
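To rule out an accidental difference between the spec used in the two containers, the two copies can be compared mechanically. Below is a minimal sketch, assuming both specs are available as plain text; the function names and the `tlt_v2.0` / `tao_5.0` labels are hypothetical and only used for illustration. It ignores indentation and blank lines and treats curly quotes as straight quotes before diffing.

```python
import difflib


def normalize(spec_text):
    """Strip whitespace, drop blank lines, and map curly quotes to straight ones."""
    quote_table = str.maketrans({"\u201c": '"', "\u201d": '"'})
    return [
        line.strip().translate(quote_table)
        for line in spec_text.splitlines()
        if line.strip()
    ]


def spec_diff(old_text, new_text):
    """Return unified-diff lines between two normalized spec texts."""
    return list(
        difflib.unified_diff(
            normalize(old_text),
            normalize(new_text),
            fromfile="tlt_v2.0",   # hypothetical label for the old spec
            tofile="tao_5.0",      # hypothetical label for the new spec
            lineterm="",
        )
    )


if __name__ == "__main__":
    old = 'training_config {\n  batch_size_per_gpu: 16\n}\n'
    new = 'training_config {\n  batch_size_per_gpu: 8\n}\n'
    for line in spec_diff(old, new):
        print(line)
```

An empty result means the two specs are identical after normalization, which would point the investigation toward the data conversion or the toolkit version rather than the spec itself.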
The results of this experiment were not nearly as good. Vehicle AP only got to ~15%. I tried adjusting some of the parameters, such as the learning rate and cost function. The best Vehicle AP I have gotten so far is 40%.
The TLT experiment was run on a machine with a Quadro RTX 5000 GPU. I have run the TAO experiments on that machine and on another with an NVIDIA GeForce RTX 4080.
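For reference when tuning the learning rate, here is a rough sketch of how I understand the soft_start_annealing_schedule values in the spec to shape the learning rate over training. The log-linear ramp between min_learning_rate and max_learning_rate is my assumption, not something confirmed from the toolkit's source, and the function name is hypothetical. `progress` is the fraction of training completed, from 0.0 to 1.0.

```python
import math


def soft_start_annealing_lr(progress, min_lr, max_lr, soft_start, annealing):
    """Learning rate at a given training progress (0.0 to 1.0).

    Assumed shape: ramp up from min_lr to max_lr while progress < soft_start,
    hold max_lr until progress reaches annealing, then decay back to min_lr.
    The interpolation is linear in log space (an assumption).
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    if progress < soft_start:
        t = progress / soft_start                   # warm-up phase
    elif progress > annealing:
        t = (1.0 - progress) / (1.0 - annealing)    # annealing phase
    else:
        t = 1.0                                     # plateau at max_lr
    return math.exp(math.log(min_lr) + t * (math.log(max_lr) - math.log(min_lr)))


if __name__ == "__main__":
    # Values rounded from the spec above.
    for p in (0.0, 0.1, 0.2, 0.5, 0.8, 0.9, 1.0):
        lr = soft_start_annealing_lr(p, 5e-6, 5e-4, 0.2, 0.8)
        print(f"progress={p:.1f} lr={lr:.2e}")
```

Under this assumption, with num_epochs: 1100 the rate would reach max_learning_rate around epoch 220 and start decaying around epoch 880, so comparing AP at the same epoch in both runs only makes sense if both runs use the same num_epochs.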
I am happy to continue tuning the model to raise the performance, but I would expect to get the same results when switching to the newer framework. I have read a lot of documentation to see whether there is a configuration I am missing or a step I have skipped for the new framework, but I haven't found anything.
My main goal is to replicate the results that I got using the old version of TLT. Any information or documentation references on why the results would be so different would be appreciated!