Too many false positives in custom training (DetectNet_v2 + ResNet-18)?

@Morganh I have a dataset containing 22,00 training images and 700 testing images of a single class, "person".
All my images are 1280x720.

I used the resnet18_kitti_train_file below for training.

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  validation_fold : 0
}

augmentation_config {
  preprocessing {
    output_image_width: 1280
    output_image_height: 720
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}


postprocessing_config {
  target_class_config {
    key: "person"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.005
        dbscan_eps: 0.20
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 6
      }
    }
  }
}

model_config {
  pretrained_model_file: "/workspace/tlt-experiments/detectnet_v2/pretrained_resnet18/tlt_pretrained_detectnet_v2_vresnet18/resnet18.hdf5"
  num_layers: 18
  use_batch_norm: True
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}

evaluation_config {
  validation_period_during_training: 20
  first_validation_epoch: 10
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.4
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 16
  num_epochs: 80
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.10
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-9
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 5
}
bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.4
}

During training, I tested the 1st checkpoint (after 10 epochs) and the 2nd checkpoint (after 20 epochs). Detection was good (almost 70%), but both checkpoints also produced a lot of false positives. I am really confused about why this happened. Thanks.

You said “but it has given a lot of false positives as well”. What did you run to measure the false positives?

Thanks for the reply, @Morganh. When the first checkpoint was saved, it was in .tlt format. I converted that .tlt file to an .etlt file and tested it with deepstream-test3-app. I trained DetectNet_v2 on frames from some videos, but when I tested on the same videos, it showed many false positives.
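To put a number on "many false positives" rather than judging from the DeepStream display, detections can be matched against the ground-truth boxes. Below is a minimal sketch, not part of TLT or DeepStream. The function names, the (x1, y1, x2, y2) box layout, and the greedy matching rule are assumptions made for illustration. The 0.4 overlap threshold mirrors minimum_detection_ground_truth_overlap in the spec above.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_detections(detections, ground_truth, min_overlap=0.4):
    """Greedily match detections to ground truth for one image.

    detections: list of (confidence, box); ground_truth: list of boxes.
    Returns (true_positives, false_positives, false_negatives).
    """
    matched = set()  # indices of ground-truth boxes already claimed
    tp = fp = 0
    # Highest-confidence detections get first pick of ground-truth boxes.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= min_overlap:
            matched.add(best_idx)
            tp += 1
        else:
            fp += 1  # no unclaimed ground-truth box overlaps enough
    fn = len(ground_truth) - len(matched)
    return tp, fp, fn


# Hypothetical example: one labelled person, two detections.
gt_boxes = [(100, 100, 200, 300)]
dets = [(0.95, (105, 100, 205, 300)), (0.91, (600, 50, 700, 250))]
tp, fp, fn = count_detections(dets, gt_boxes)
print(f"TP={tp} FP={fp} FN={fn} precision={tp / (tp + fp):.2f}")
```

Summing these counts over all test images gives a precision figure that can be compared between checkpoints.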

By “When first checkpoint was saved”, do you mean the .tlt file from the 10th epoch?

Also, are the “person” objects small? If so, please refer to the Frequently Asked Questions page of the Transfer Learning Toolkit 3.0 documentation for ways to improve the mAP.

The following changes can help improve AP on smaller objects:

  • Increase num_layers of resnet
  • Adjust class_weight for small objects
  • Increase the coverage_radius_x and coverage_radius_y parameters (cov_radius_x and cov_radius_y in the spec above) of the bbox_rasterizer_config section for the small objects class
  • Decrease minimum_detection_ground_truth_overlap
  • Lower minimum_height to cover more small objects for evaluation.
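Whether the "person" objects count as small can be checked directly from the KITTI label files. The sketch below is an illustration only. The helper names are hypothetical, and it assumes the standard KITTI field order, where fields 4 to 7 are the box left, top, right, and bottom in pixels. It reports what fraction of boxes fall below the minimum_height of 20 used in evaluation_box_config above.

```python
def person_box_heights(label_lines, class_name="person"):
    """Return pixel heights of all boxes of class_name in KITTI label lines."""
    heights = []
    for line in label_lines:
        fields = line.split()
        # KITTI layout: type, truncated, occluded, alpha, left, top, right, bottom, ...
        if len(fields) < 8 or fields[0] != class_name:
            continue
        top, bottom = float(fields[5]), float(fields[7])
        heights.append(bottom - top)
    return heights


def fraction_below(heights, minimum_height=20.0):
    """Fraction of boxes shorter than minimum_height (0.0 if there are none)."""
    if not heights:
        return 0.0
    return sum(h < minimum_height for h in heights) / len(heights)


# Hypothetical label lines; real ones would be read from the label .txt files.
sample_labels = [
    "person 0.00 0 0.00 100.00 200.00 180.00 420.00 0 0 0 0 0 0 0",
    "person 0.00 0 0.00 640.00 300.00 646.00 315.00 0 0 0 0 0 0 0",
]
heights = person_box_heights(sample_labels)
print(f"{fraction_below(heights):.0%} of boxes are under 20 px tall")
```

If only a small fraction of boxes is below the threshold, the small-object tuning above is unlikely to be the main lever.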

Then, please try running “tlt detectnet_v2 inference” with the last epoch’s .tlt file to check whether it meets the expected mAP.

Yes, it is the .tlt file saved after the 10th epoch. The person is not a small object.

The 10th epoch’s .tlt file should not be expected to have a good mAP, since you set the training to run for 80 epochs. Please use the last .tlt file for evaluation.
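One way to see why early checkpoints are immature is to look at where epochs 10 and 20 fall in the soft_start_annealing_schedule from the spec above. The sketch below is an approximation and not TLT's own code. It assumes the learning rate is interpolated log-linearly between min_learning_rate and max_learning_rate during the soft start and annealing phases, and the function name is hypothetical.

```python
import math


def soft_start_annealing_lr(progress, min_lr=5e-6, max_lr=5e-4,
                            soft_start=0.10, annealing=0.7):
    """Approximate learning rate at a training progress in [0, 1].

    Assumption: log-linear ramp up until soft_start, constant max_lr until
    annealing, then log-linear decay back to min_lr at progress 1.0.
    """
    log_min, log_max = math.log(min_lr), math.log(max_lr)
    if progress < soft_start:
        t = progress / soft_start
        return math.exp(log_min + t * (log_max - log_min))
    if progress <= annealing:
        return max_lr
    t = (progress - annealing) / (1.0 - annealing)
    return math.exp(log_max - t * (log_max - log_min))


num_epochs = 80
for epoch in (10, 20, 56, 80):
    lr = soft_start_annealing_lr(epoch / num_epochs)
    print(f"epoch {epoch:2d}: lr ~ {lr:.2e}")
```

Under these assumptions, epochs 10 and 20 are both still at the maximum learning rate, and annealing only begins at epoch 56 (0.7 of 80), so the weights at those checkpoints have not yet settled.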