Too many false positives in custom training (DetectNet_v2 + ResNet-18)?

@Morganh I have a dataset containing 22,00 training images and 700 testing images of a single class, "person".
All my images are of dimension 1280×720.

I used the resnet18_kitti_train_file below for training.

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  validation_fold: 0
}

augmentation_config {
  preprocessing {
    output_image_width: 1280
    output_image_height: 720
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}

postprocessing_config {
  target_class_config {
    key: "person"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.005
        dbscan_eps: 0.20
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 6
      }
    }
  }
}

model_config {
  pretrained_model_file: "/workspace/tlt-experiments/detectnet_v2/pretrained_resnet18/tlt_pretrained_detectnet_v2_vresnet18/resnet18.hdf5"
  num_layers: 18
  use_batch_norm: True
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}

evaluation_config {
  validation_period_during_training: 20
  first_validation_epoch: 10
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.4
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}

cost_function_config {
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}

training_config {
  batch_size_per_gpu: 16
  num_epochs: 80
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.10
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-9
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 5
}

bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
}
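A note on the postprocessing_config above: DetectNet_v2 thresholds its raw candidate boxes by confidence before DBSCAN clustering, so dbscan_confidence_threshold is one of the first knobs to try against false positives. A minimal Python sketch of the filtering idea (my own illustration, not the actual TLT code; the tuple layout is assumed):

```python
# Illustrative sketch: how a confidence threshold prunes candidate
# detections before clustering. Not the actual TLT/DetectNet_v2 code.

def filter_by_confidence(detections, threshold):
    """Keep only candidate boxes whose confidence meets the threshold.

    detections: list of (x1, y1, x2, y2, confidence) tuples (assumed layout).
    """
    return [d for d in detections if d[4] >= threshold]

candidates = [
    (100, 50, 180, 300, 0.95),  # strong "person" candidate
    (400, 60, 470, 310, 0.91),  # strong "person" candidate
    (10, 10, 40, 40, 0.35),     # weak candidate -> likely false positive
]

# With the spec's dbscan_confidence_threshold of 0.9, only the two
# strong candidates survive; the weak one is dropped.
kept = filter_by_confidence(candidates, 0.9)
print(len(kept))  # 2
```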

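For reference on the training_config values, my understanding of soft_start_annealing_schedule is: the learning rate ramps exponentially from min_learning_rate to max_learning_rate over the first soft_start fraction of training, holds at the maximum, then anneals back down after the annealing point. A hedged sketch of that shape (the exact TLT curve may differ):

```python
import math

def soft_start_annealing_lr(progress, min_lr=5e-6, max_lr=5e-4,
                            soft_start=0.10, annealing=0.7):
    """Sketch of a soft-start annealing schedule as I understand the
    spec above: exponential ramp from min_lr to max_lr during the
    soft_start fraction of training, flat until the annealing point,
    then exponential decay back toward min_lr. progress is in [0, 1].
    """
    ratio = math.log(max_lr / min_lr)
    if progress < soft_start:
        return min_lr * math.exp(ratio * progress / soft_start)
    if progress < annealing:
        return max_lr
    return max_lr * math.exp(-ratio * (progress - annealing) / (1.0 - annealing))

print(soft_start_annealing_lr(0.0))   # starts at min_learning_rate
print(soft_start_annealing_lr(0.10))  # ramp complete: max_learning_rate
print(soft_start_annealing_lr(1.0))   # annealed back down
```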
During training, when I tested with the 1st checkpoint (after 10 epochs) and with the 2nd checkpoint (after 20 epochs), detection was good (almost 70%), but it also produced a lot of false positives. I am really confused why this happened. Thanks.

When you said, “but it has given a lot of false positives as well”, what did you run to measure the false positives?

Thanks for the reply, @Morganh. When the first checkpoint was saved, its file was in .tlt format. I converted that .tlt file to an .etlt file, then tested it with the deepstream-test3 app. I trained DetectNet_v2 on some videos, but when I tested on those same videos, it showed many false positives.

When you said “When first checkpoint was saved”, is that the .tlt file from the 10th epoch?

Also, is the “person” small? If yes, please refer to Frequently Asked Questions — Transfer Learning Toolkit 3.0 documentation, in order to improve the mAP.

The following parameters can help you improve AP on smaller objects:

  • Increase num_layers of resnet
  • class_weight for small objects
  • Increase the coverage_radius_x and coverage_radius_y parameters of the bbox_rasterizer_config section for the small objects class
  • Decrease minimum_detection_ground_truth_overlap
  • Lower minimum_height to cover more small objects for evaluation.
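On the minimum_detection_ground_truth_overlap suggestion above: a prediction only counts as a true positive when its IoU with a ground-truth box reaches that threshold (0.4 in the spec above), so lowering it makes evaluation more forgiving. A self-contained IoU check (my own sketch, not TLT code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

gt = (100, 100, 200, 300)    # ground-truth person box
pred = (110, 120, 210, 310)  # slightly shifted prediction
print(iou(gt, pred) >= 0.4)  # counted as a match at threshold 0.4 -> True
```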

Then, please try to run “tlt detectnet_v2 inference” with the last epoch’s tlt file, to check whether it meets the expected mAP.

Yes, the tlt file was saved after the 10th epoch. The person is not a small object.

The 10th epoch’s tlt file should not have a good mAP, since you set 80 epochs for the training. Please use the last tlt file for evaluation.

After full training, when I tested on videos via the DeepStream app, my model showed good results, but there was no improvement in false positives (mostly when no object is present in the image). Some samples are attached below.


How about running tlt detectnet_v2 inference against the same images?

Same results (too many false positives).

Did you save the training log? If yes, please share it with us.
I am afraid you need to improve the mAP further.

Also, all your training images are similar to the two pictures above, right?

No, I have not saved the training log file.
And no, the dataset is large and covers many different angles; the images above are only a small part of it.

The data distribution is similar, right?

What do you mean by “data distribution is similar”?

I mean that among your training images, the persons are inside the elevator, similar to your attached images above, right?

Yes, in the elevator portion of the data, all persons are inside the elevator, while in the other data the persons are outside the elevator.

How about the final mAP result after training is done?

The mean average precision is 77%.
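For context on that number: with average_precision_mode: INTEGRATE in the evaluation_config, AP is the area under the precision-recall curve. A toy illustration of such an integration (trapezoidal rule; illustrative only, not the exact TLT computation):

```python
def average_precision(recalls, precisions):
    """Approximate area under a precision-recall curve by trapezoidal
    integration. recalls must be sorted in ascending order."""
    ap = 0.0
    for i in range(1, len(recalls)):
        ap += (recalls[i] - recalls[i - 1]) * \
              (precisions[i] + precisions[i - 1]) / 2.0
    return ap

# Toy curve: precision falls as recall rises.
recalls = [0.0, 0.5, 1.0]
precisions = [1.0, 0.8, 0.4]
print(average_precision(recalls, precisions))  # 0.75
```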

I suggest you run more experiments to improve the mAP.
For example:

  • Set a smaller batch-size
  • Try resnet34 or larger backbone
  • More epochs

Before training, make sure the labels are correct.
