Too many false positives in custom training (DetectNet_v2 + ResNet-18)?

muhammadrizwanmunawar · June 9, 2021, 4:45pm

@Morganh I have a dataset containing 22,00 training images and 700 testing images of a single class, "person".
All my images are 1280 x 720.

I used the following spec file (resnet18_kitti_train_file) for training:

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  validation_fold : 0
}

augmentation_config {
  preprocessing {
    output_image_width: 1280
    output_image_height: 720
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}


postprocessing_config {
  target_class_config {
    key: "person"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.005
        dbscan_eps: 0.20
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 6
      }
    }
  }
}

model_config {
  pretrained_model_file: "/workspace/tlt-experiments/detectnet_v2/pretrained_resnet18/tlt_pretrained_detectnet_v2_vresnet18/resnet18.hdf5"
  num_layers: 18
  use_batch_norm: True
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}

evaluation_config {
  validation_period_during_training: 20
  first_validation_epoch: 10
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.4
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 16
  num_epochs: 80
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.10
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-9
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 5
}
bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.4
}
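For context on the preprocessing size: as far as I know, DetectNet_v2 predicts its coverage and bbox outputs on a grid with a stride of 16 pixels, and the input width and height are expected to be multiples of 16. The helper below is a hypothetical sanity check (not part of TLT; `output_grid` and `STRIDE` are illustrative names) that computes the grid size for the 1280 x 720 setting used in this spec.

```python
from typing import Tuple

# Assumption: DetectNet_v2 produces its coverage/bbox outputs at a stride
# of 16 pixels, so the network input must be divisible by 16.
STRIDE = 16


def output_grid(width: int, height: int, stride: int = STRIDE) -> Tuple[int, int]:
    """Return (grid_w, grid_h) for a network input of width x height pixels."""
    if width % stride or height % stride:
        raise ValueError(
            f"{width}x{height} is not divisible by {stride}; "
            "resize or pad the images before training"
        )
    return width // stride, height // stride


if __name__ == "__main__":
    # output_image_width / output_image_height from augmentation_config
    grid_w, grid_h = output_grid(1280, 720)
    print(f"coverage grid: {grid_w} x {grid_h} cells")
```

Under that assumption the grid here is 80 x 45 cells, each covering a 16 x 16 pixel patch of the frame, so 1280 x 720 needs no resizing.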

During training, when I tested the first checkpoint (after 10 epochs) and the second checkpoint (after 20 epochs), detection was good (almost 70%), but the model also produced a lot of false positives. I am really confused about why this happens. Thanks.

Morganh · June 11, 2021, 12:34am

When you said, “but it has given a lot of false positives as well”, what did you run to determine the FPs?

muhammadrizwanmunawar · June 11, 2021, 7:25am

Thanks for the reply, @Morganh. When the first checkpoint was saved, it was a .tlt file; I converted it to an .etlt file and then tested it with the deepstream-test3 app. I trained DetectNet_v2 on some videos, but when I tested on the same videos, it showed many false positives.

Morganh · June 11, 2021, 7:38am

When you said “When first checkpoint was saved”, do you mean the .tlt file from the 10th epoch?

Also, is the “person” object small? If so, please refer to the Frequently Asked Questions page of the Transfer Learning Toolkit 3.0 documentation to improve the mAP.

The following parameters can help improve AP on smaller objects:

- Increase num_layers of the ResNet backbone
- Increase class_weight for the small-object class
- Increase the cov_radius_x and cov_radius_y parameters of the bbox_rasterizer_config section for the small-object class
- Decrease minimum_detection_ground_truth_overlap
- Lower minimum_height (in evaluation_box_config) to include more small objects in evaluation
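To make the overlap-related suggestions concrete: a detection only counts as a true positive if its IoU with a ground-truth box reaches minimum_detection_ground_truth_overlap, and evaluation_box_config restricts which box sizes are scored at all. The sketch below is a simplified illustration with hypothetical boxes; `iou` and `is_evaluated` are illustrative helpers, not TLT's evaluator.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def is_evaluated(box: Box, min_h=20, min_w=10, max_h=9999, max_w=9999) -> bool:
    """Simplified stand-in for evaluation_box_config: only boxes in range are scored."""
    w, h = box[2] - box[0], box[3] - box[1]
    return min_w <= w <= max_w and min_h <= h <= max_h


if __name__ == "__main__":
    gt = (100, 100, 200, 300)   # hypothetical ground-truth person
    det = (140, 140, 240, 340)  # hypothetical detection, shifted by 40 px
    overlap = iou(gt, det)      # about 0.32
    for threshold in (0.4, 0.3):
        verdict = "TP" if overlap >= threshold else "FP plus a missed ground truth"
        print(f"IoU {overlap:.2f} vs overlap {threshold}: {verdict}")
    small = (0, 0, 12, 15)      # a 15 px tall box
    print("scored with minimum_height 20:", is_evaluated(small))
    print("scored with minimum_height 10:", is_evaluated(small, min_h=10))
```

The same slightly misaligned detection is counted as a false positive at an overlap of 0.4 but as a true positive at 0.3, which is why this parameter changes the reported AP.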

Then, please run “tlt detectnet_v2 inference” with the .tlt file from the last epoch to check whether it reaches the expected mAP.

muhammadrizwanmunawar · June 11, 2021, 7:43am

Yes, it is the .tlt file saved after the 10th epoch. The person is not a small object.

Morganh · June 11, 2021, 7:48am

The 10th epoch’s .tlt file is not expected to have a good mAP, since you set 80 epochs for training. Please use the last .tlt file for evaluation.
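One way to see why an early checkpoint is a poor indicator is to look at the learning-rate schedule in the spec. The sketch below assumes soft_start_annealing_schedule ramps the rate up exponentially from min_learning_rate to max_learning_rate over the first soft_start fraction of training, holds it at the peak, and decays it exponentially from the annealing fraction onward; the exact curve TLT uses may differ, and `soft_start_annealing_lr` is a hypothetical helper.

```python
def soft_start_annealing_lr(progress: float, min_lr: float = 5e-6,
                            max_lr: float = 5e-4, soft_start: float = 0.10,
                            annealing: float = 0.7) -> float:
    """Assumed schedule shape; `progress` is the fraction of training done (0..1)."""
    if progress < soft_start:
        # Exponential warm-up from min_lr to max_lr.
        t = progress / soft_start
        return min_lr * (max_lr / min_lr) ** t
    if progress < annealing:
        # Plateau at the peak learning rate.
        return max_lr
    # Exponential decay from max_lr back to min_lr.
    t = (progress - annealing) / (1.0 - annealing)
    return max_lr * (min_lr / max_lr) ** t


if __name__ == "__main__":
    num_epochs = 80  # training_config.num_epochs
    for epoch in (5, 10, 20, 56, 70, 80):
        lr = soft_start_annealing_lr(epoch / num_epochs)
        print(f"epoch {epoch:2d}: lr = {lr:.2e}")
```

Under this assumption, with 80 epochs the warm-up ends at epoch 8 and the decay only starts around epoch 56 (0.7 x 80), so the 10th-epoch checkpoint was taken at the peak learning rate, before any annealing.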

muhammadrizwanmunawar · June 21, 2021, 7:28am

After full training, when I tested on videos with the DeepStream app, my model showed good results, but there was no improvement in false positives (mostly when no object is present in the image). I attached some samples.
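Since most of the false positives appear on frames with no person, one simple measurement is the number of detections per empty frame as the confidence threshold varies. The sketch below assumes you can export a confidence score for every detection on such frames; the frame names and scores are hypothetical placeholders.

```python
from typing import Dict, List

# Hypothetical detection confidences on frames that contain no person.
# Replace with scores exported from your own inference run.
empty_frame_scores: Dict[str, List[float]] = {
    "frame_0001": [0.12, 0.35],
    "frame_0002": [],
    "frame_0003": [0.55, 0.81, 0.20],
    "frame_0004": [0.93],
}


def fp_per_frame(scores_by_frame: Dict[str, List[float]], threshold: float) -> float:
    """Average number of detections at or above `threshold` per empty frame.

    On a frame without any ground truth, every kept detection is a false positive.
    """
    total = sum(sum(1 for s in scores if s >= threshold)
                for scores in scores_by_frame.values())
    return total / len(scores_by_frame)


if __name__ == "__main__":
    for thr in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"threshold {thr:.1f}: {fp_per_frame(empty_frame_scores, thr):.2f} FP/frame")
```

If the false positives disappear at a moderately higher threshold, they are low-confidence detections; if many survive even at high thresholds, the model is confidently wrong on those scenes, which points back to the training data and labels.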

Thanks.

Morganh · June 21, 2021, 7:44am

How about running tlt detectnet_v2 inference against the same images?

muhammadrizwanmunawar · June 21, 2021, 7:45am

Same results (too many false positives).

Morganh · June 21, 2021, 7:58am

Did you save the training log? If so, please share it with us.
I am afraid you need to improve the mAP further.

Also, all of your training images are similar to the two pictures above, right?

muhammadrizwanmunawar · June 21, 2021, 8:00am

No, I did not save the training log file.
And no, the dataset is large and covers different angles; the images above are only a small part of it.

Morganh · June 21, 2021, 8:02am

The data distribution is similar, right?

muhammadrizwanmunawar · June 21, 2021, 8:03am

What do you mean by a similar data distribution?

Morganh · June 21, 2021, 8:05am

I mean: in your training images, are the persons inside the elevator, similar to the images you attached above?

muhammadrizwanmunawar · June 21, 2021, 8:06am

Yes. In the elevator portion of the data, all persons are inside the elevator, while in the rest of the data the persons are outside the elevator.

Morganh · June 21, 2021, 8:11am

What was the final mAP after training finished?

muhammadrizwanmunawar · June 21, 2021, 8:13am

The mean average precision (mAP) is 77%.
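With a single class, the mAP is simply the AP of "person". The sketch below computes an all-point (VOC-style) area under the precision-recall curve, assumed here to be close to what average_precision_mode: INTEGRATE reports; TLT's evaluator may differ in details. The toy inputs are hypothetical and show how a single high-confidence false positive pulls AP down.

```python
from typing import List, Tuple


def average_precision(dets: List[Tuple[float, bool]], num_gt: int) -> float:
    """All-point interpolated AP for one class.

    dets: (confidence, is_true_positive) for every detection of the class.
    num_gt: number of ground-truth boxes of the class.
    """
    dets = sorted(dets, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recall, precision = [0.0], [0.0]
    for _, is_tp in dets:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recall.append(tp / num_gt)
        precision.append(tp / (tp + fp))
    recall.append(1.0)
    precision.append(0.0)
    # Precision envelope: make precision non-increasing from right to left.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    # Sum the rectangles wherever recall increases.
    return sum((recall[i + 1] - recall[i]) * precision[i + 1]
               for i in range(len(recall) - 1))


if __name__ == "__main__":
    clean = [(0.9, True), (0.8, True)]
    with_fp = [(0.95, False)] + clean  # one confident false positive
    print(f"AP without FP: {average_precision(clean, num_gt=2):.2f}")
    print(f"AP with one confident FP: {average_precision(with_fp, num_gt=2):.2f}")
```

In this toy case the AP drops from 1.00 to 0.67 because of one confident false positive, so a moderate mAP and many confident false positives on empty frames are consistent with each other.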

Morganh · June 21, 2021, 8:16am

I suggest running more experiments to improve the mAP, for example:

- Use a smaller batch size
- Try ResNet-34 or a larger backbone
- Train for more epochs

Before training, make sure the labels are correct.
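A quick label check can catch obvious problems before training. The sketch below assumes the labels are in KITTI format (class name first, then truncation, occlusion, alpha, and the 2D box as left, top, right, bottom in fields 5 to 8) and that images are 1280 x 720; `check_kitti_line` is a hypothetical helper, and the class names are compared case-insensitively only to be lenient.

```python
from typing import List

IMAGE_W, IMAGE_H = 1280, 720     # image size from this thread
EXPECTED_CLASSES = {"person"}    # keys of target_class_mapping


def check_kitti_line(line: str, img_w: int = IMAGE_W, img_h: int = IMAGE_H) -> List[str]:
    """Return the problems found in one KITTI label line (empty list if none)."""
    fields = line.split()
    if len(fields) < 15:  # KITTI labels have 15 fields (16 with a score)
        return [f"expected 15 fields, got {len(fields)}"]
    problems = []
    if fields[0].lower() not in EXPECTED_CLASSES:
        problems.append(f"unexpected class {fields[0]!r}")
    try:
        x1, y1, x2, y2 = map(float, fields[4:8])
    except ValueError:
        return problems + ["non-numeric bbox"]
    if not (0 <= x1 < x2 <= img_w and 0 <= y1 < y2 <= img_h):
        problems.append(f"box ({x1}, {y1}, {x2}, {y2}) is empty or outside {img_w}x{img_h}")
    return problems


if __name__ == "__main__":
    sample = "person 0.00 0 0.00 100.0 200.0 180.0 500.0 0 0 0 0 0 0 0"
    print(check_kitti_line(sample) or "OK")
```

Running this over every label file and printing the non-empty results gives a list of files worth inspecting by hand.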
