Customer training with own dataset is not incremental

Not sure if I can explain my question clearly, but I will give it a try -

I tried to train TrafficCamNet on my own dataset, which is traffic in a specific environment, e.g. a tunnel with a higher camera position. The expectation is that this gives the model more ability to detect objects in such an environment, while it still detects traffic in a normal road environment as shown in the sample video files.

The training indeed improved the detection in that environment, but it completely corrupted the detection in the normal road environment.

I thought TLT works in an incremental way, but it seems it does not. If it does not, does it mean that every time I train I should provide a complete dataset covering the various environments?

Thanks,
Kai

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here)
• Training spec file (if you have one, please share it here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

See NVIDIA NGC:

TrafficCamNet v1.0 model was trained on a proprietary dataset with more than 3 million objects for the car class. Most of the training dataset was collected and labeled in-house from several traffic cameras in a city in the US. The dataset also contains 40,000 images from a variety of dashcams to help with generalization and discrimination across classes. This content was chosen to improve accuracy of the object detection for images from a traffic cam at a traffic intersection.

TLT helps end users apply transfer learning with their own datasets.

For your case, detecting cars in a tunnel with a higher camera position, the data distribution is different from TrafficCamNet's. It is recommended to re-train the unpruned model with TLT on your own dataset. You can start by training on part of the complete dataset, try fine-tuning the parameters, and then increase the training data if the mAP is not as expected.

Yes, we did as you advised. The mAP is 90%, which is quite good; however, when I tried to use this model to run inference, I found the detection only works in the tunnel env but is completely screwed with the original env.

May I know more about "completely screwed with original env"? Are there more FPs (false positives) and FNs (false negatives)?

Please look at the pictures…

The data distribution in the tunnel is quite different from the dataset used to train TrafficCamNet, which brings a regression in detecting the original environment. So it is necessary to add some of the original traffic data if you also target the original environment.
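As a concrete sketch: the DetectNet_v2 spec format used elsewhere in this thread allows multiple data_sources blocks inside dataset_config, so tunnel tfrecords and original-environment tfrecords can be mixed in a single training run. The paths below are hypothetical placeholders.

```
dataset_config {
  # Tunnel data converted to tfrecords (hypothetical path)
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/data/tfrecords/tunnel_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/tunnel_training"
  }
  # Original-environment traffic data, added to limit regression (hypothetical path)
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/data/tfrecords/road_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/road_training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  validation_fold: 0
}
```

The relative size of the two sources then controls the trade-off between tunnel accuracy and original-environment accuracy.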

Ok, I will give it a go!

I don't see huge differences between these two road environments, so I used the ~200 pictures of the right environment as the customer dataset. The result still shows huge degradation in the original environment.

These two environments are still not similar. Anyway, may I know in which environment you will run inference?
The closer the training dataset is to the actual inference environment, the better the result will be. You can ignore the original environment if your inference environment is not similar to it.
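One way to catch this kind of regression during training, rather than at deployment, is to validate against the environment you actually care about. Assuming the DetectNet_v2 spec format from this thread, dataset_config accepts an explicit validation_data_source in place of validation_fold; the paths below are hypothetical placeholders.

```
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/data/tfrecords/tunnel_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/tunnel_training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  # Held-out images from the intended inference environment (hypothetical path)
  validation_data_source {
    tfrecords_path: "/workspace/tlt-experiments/data/tfrecords/tunnel_val/*"
    image_directory_path: "/workspace/tlt-experiments/data/tunnel_val_images"
  }
}
```

The periodic mAP reported during training then reflects the target environment directly, instead of a random fold of the training data.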

I used detectnet_v2.ipynb to do the training and evaluation. I wanted to leverage the pre-trained model; I thought this is one of the powerful benefits I get from TLT. Based on your explanation, in this case there is no difference from training without TLT, i.e. a fresh start: collecting my own dataset, building the model, and then training from scratch.

Is my understanding correct?

No, the pre-trained model can play an important role in transfer learning. It has more advantage than training from random weights. Internally we have run similar experiments with the PeopleNet pretrained model. We used it to train on a public dataset (FLIR Dataset FREE - FLIR Thermal Dataset for Algorithm Training | Teledyne FLIR) and found that it can achieve the same mAP with 2.5x less data. It also shows that with the PeopleNet pretrained model, the expected mAP can be reached in fewer epochs, which means lower training cost.

For your case, could you please share your training spec first?

Thanks for your reply; this is good if that is the case. I pasted the spec below -

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 1248
    output_image_height: 384
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
    enable_auto_resize: true
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "car"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
  target_class_config {
    key: "cyclist"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.15000000596
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
  target_class_config {
    key: "pedestrian"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00749999983236
        dbscan_eps: 0.230000004172
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/tlt-experiments/detectnet_v2/trafficcamnet/tlt_trafficcamnet_vunpruned_v1.0/resnet18_trafficcamnet.tlt"
  num_layers: 18
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 30
  minimum_detection_ground_truth_overlap {
    key: "car"
    value: 0.699999988079
  }
  minimum_detection_ground_truth_overlap {
    key: "cyclist"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "pedestrian"
    value: 0.5
  }
  evaluation_box_config {
    key: "car"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "cyclist"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "pedestrian"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "car"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "cyclist"
    class_weight: 8.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 1.0
    }
  }
  target_classes {
    name: "pedestrian"
    class_weight: 4.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: "car"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "cyclist"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "pedestrian"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}

Several comments below.

  • Since you only train one class (car), please remove all the content about the other two classes.
  • Please modify
output_image_width: 1248
output_image_height: 384

to

output_image_width: 960
output_image_height: 544
  • Set
    minimum_height: 4
    minimum_width: 4

  • Set
    minimum_bounding_box_height: 4
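
Applied to the spec above, the suggested changes would look roughly like the fragments below (only the affected sections are shown; everything else stays as posted):

```
augmentation_config {
  preprocessing {
    # Resize to TrafficCamNet's input resolution
    output_image_width: 960
    output_image_height: 544
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
    enable_auto_resize: true
  }
  # spatial_augmentation and color_augmentation unchanged
}

postprocessing_config {
  # Single-class postprocessing: keep only "car",
  # drop the cyclist and pedestrian target_class_config blocks
  target_class_config {
    key: "car"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 4
      }
    }
  }
}

evaluation_config {
  # Lower thresholds so small/distant cars are not filtered out
  evaluation_box_config {
    key: "car"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
}
```

The cyclist and pedestrian entries in cost_function_config, evaluation_config, and bbox_rasterizer_config would be removed in the same way.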