Too many false positives.

I am training a 5-class (person, car, bicycle, bus, motorbike) detector using the detectnetv2_resnet10 pretrained model. I am getting too many false positives across all the classes, especially the person class. Any pointers on how to reduce them?

Could you please paste your training spec?

Hi, this is the spec file I am using:

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "car"
    value: "car"
  }
  target_class_mapping {
    key: "bus"
    value: "bus"
  }
  target_class_mapping {
    key: "bicycle"
    value: "bicycle"
  }
  target_class_mapping {
    key: "motorbike"
    value: "motorbike"
  }
  target_class_mapping {
    key: "person"
    value: "person"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 512
    output_image_height: 512
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 0.0
    saturation_shift_max: 0.0
    contrast_scale_max: 0.0
    contrast_center: 0.0
  }
}
postprocessing_config {
  target_class_config {
    key: "car"
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 10
      }
    }
  }
  target_class_config {
    key: "bus"
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 10
      }
    }
  }
  target_class_config {
    key: "bicycle"
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.15000000596
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 5
      }
    }
  }
  target_class_config {
    key: "motorbike"
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.15000000596
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 5
      }
    }
  }
  target_class_config {
    key: "person"
    value {
      clustering_config {
        coverage_threshold: 0.00749999983236
        dbscan_eps: 0.230000004172
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 5
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/tlt-experiments/pretrained_resnet10/tlt_resnet10_detectnet_v2_v1/resnet10_detector.tlt"
  num_layers: 10
  use_batch_norm: true
  activation {
    activation_type: "relu"
  }
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 5
  first_validation_epoch: 5
  minimum_detection_ground_truth_overlap {
    key: "car"
    value: 0.699999988079
  }
  minimum_detection_ground_truth_overlap {
    key: "bus"
    value: 0.699999988079
  }
  minimum_detection_ground_truth_overlap {
    key: "bicycle"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "motorbike"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.5
  }
  evaluation_box_config {
    key: "car"
    value {
      minimum_height: 10
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "bus"
    value {
      minimum_height: 10
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "bicycle"
    value {
      minimum_height: 5
      maximum_height: 9999
      minimum_width: 5
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "motorbike"
    value {
      minimum_height: 5
      maximum_height: 9999
      minimum_width: 5
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 5
      maximum_height: 9999
      minimum_width: 5
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "car"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "bus"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "bicycle"
    class_weight: 2.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 1.0
    }
  }
  target_classes {
    name: "motorbike"
    class_weight: 5.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 1.0
    }
  }
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 35
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 5
}
bbox_rasterizer_config {
  target_class_config {
    key: "car"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "bus"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "bicycle"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "motorbike"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}

For training I am using the MS COCO dataset.

Have you resized the images to 512x512 and changed the bbox values in the corresponding labels?

Yes
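For readers following along, the resize step asked about above can be sketched as below. This is only an illustration, assuming KITTI-style label lines where fields 4 to 7 hold xmin, ymin, xmax, ymax in pixels; the helper names `scale_kitti_bbox` and `scale_kitti_line` are hypothetical and are not part of TLT.

```python
def scale_kitti_bbox(bbox, src_size, dst_size=(512, 512)):
    """Scale an (xmin, ymin, xmax, ymax) box from src_size to dst_size.

    Sizes are (width, height) in pixels. The image is assumed to be
    resized without preserving aspect ratio, so x and y scale separately.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    sx = dst_w / src_w
    sy = dst_h / src_h
    xmin, ymin, xmax, ymax = bbox
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)


def scale_kitti_line(line, src_size, dst_size=(512, 512)):
    """Rewrite the bbox fields (columns 4..7) of one KITTI label line."""
    fields = line.split()
    box = tuple(float(v) for v in fields[4:8])
    scaled = scale_kitti_bbox(box, src_size, dst_size)
    fields[4:8] = [f"{v:.2f}" for v in scaled]
    return " ".join(fields)


# Hypothetical label from a 640x480 source image.
label = "person 0.00 0 0.00 64.00 48.00 320.00 240.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
print(scale_kitti_line(label, src_size=(640, 480)))
```

If the labels are not rescaled together with the images, the boxes no longer line up with the objects, which by itself can produce poor detections.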

I am using resnet10 model.

Also, the number of false positives is especially high for the person class. For the other classes there are far fewer.

Can you check how many images there are for each class?

So, this is the number of instances per class which I get while creating tfrecords -

Wrote the following numbers of objects:
person: 385446
bicycle: 10484
motorbike: 12940
bus: 23822
car: 602408

Currently, I am just concerned with getting good detections on car and person, hence the other class instances are quite low.

These are the class_weight settings I have tried in TLT training, with no significant success:

car = 1.0, person = 1.0
car = 1.0, person = 2.0
car = 2.0, person = 1.0

Could you please paste your mAP and each class’ AP result during the training?

More suggestions for tuning the parameters:

  1. Set all minimum_detection_ground_truth_overlap values to 0.5.
  2. Use the same objectives settings, shown below, for every class:
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  3. Set minimum_bounding_box_height to 3.
  4. Set bs=16, epoch=360.
  5. Set class_weight; try the following:
    car: bus: bicycle: motorbike: person = 1: 16: 40: 30: 1.6
  6. Since your dataset is too unbalanced, please consider training 3 classes or 2 classes.
    2 classes: car and person
  7. You can try to make the dataset more balanced.
    Reduce some images of the “car” class.
  8. Change to the resnet18 backbone.
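As a rough starting point for the class_weight suggestion above, the object counts reported when the tfrecords were written can be turned into inverse-frequency weights. This is a sketch only: normalising to the car class is an assumption, `inverse_frequency_weights` is a hypothetical helper, and nothing in the thread says the suggested ratio was derived this way.

```python
# Object counts as reported when the tfrecords were written.
counts = {
    "person": 385446,
    "bicycle": 10484,
    "motorbike": 12940,
    "bus": 23822,
    "car": 602408,
}


def inverse_frequency_weights(counts, reference="car"):
    """Weight each class by (reference count / class count).

    The reference class gets weight 1.0; rarer classes get larger weights.
    """
    ref = counts[reference]
    return {name: ref / n for name, n in counts.items()}


weights = inverse_frequency_weights(counts)
for name, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {w:6.2f}")
```

Pure inverse frequency gives roughly 1.6 for person, 25 for bus, 47 for motorbike and 57 for bicycle. The person value is close to the suggested 1.6, while the rare classes come out higher than the suggested 16, 40 and 30, so the suggested ratio is the gentler of the two.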

Hi,
Thanks for the reply.

I am getting the following AP and mAP.

Mean average_precision (in %): 22.5203

class name    average precision (in %)
bicycle       1.00966
bus           1.86996
car           78.0495
motorbike     9.60343
person        22.0692
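The reported mean appears to be the plain, unweighted average of the five per-class APs, which can be checked directly from the numbers above. A small sketch:

```python
# Per-class AP values (in %) copied from the evaluation output above.
ap = {
    "bicycle": 1.00966,
    "bus": 1.86996,
    "car": 78.0495,
    "motorbike": 9.60343,
    "person": 22.0692,
}

# Unweighted mean over classes; every class counts equally,
# regardless of how many instances it has.
mean_ap = sum(ap.values()) / len(ap)
print(f"mean AP: {mean_ap:.4f}")
```

Because each class counts equally, the three rare classes pull the mAP down hard even though car alone is at about 78%.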

  1. Will do that, but I fail to see how it will affect the training.
  2. Should I set this for all the classes?
  3. Currently I am using a batch size of 4. From what I understand, the smaller the batch size, generally the higher the accuracy. Changing to 16 might speed up the training, but what about the accuracy?
  4. Also, I have already trained and retrained this model in sets of 120, 40, 40 and 35 epochs (using the previously trained tlt model as initial weights) for a total of 235 epochs.
  5. Currently I am primarily concerned with person and car detections and with reducing the false positives. From what I have seen, increasing the weights led to more false positives.
  6. I don’t wish to change to the resnet18 backbone, as I have to deploy the final model on a Jetson Nano and my fps will drop.

Also, while training I get these messages before the training starts.

target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.

But the training seems to work fine.

Do these messages mean something?

Those messages are not harmful.

  1. Will do that, but I fail to see how it will affect the training.
    [Morgan] It sets a lower IoU threshold.

  2. Should I set this for all the classes?
    [Morgan] Yes.

  3. Currently I am using a batch size of 4. From what I understand, the smaller the batch size, generally the higher the accuracy. Changing to 16 might speed up the training, but what about the accuracy?
    [Morgan] How many GPUs are you using?

  4. Also, I have already trained and retrained this model in sets of 120, 40, 40 and 35 epochs (using the previously trained tlt model as initial weights) for a total of 235 epochs.
    [Morgan] Please do not use the previously trained tlt model as initial weights, because it gave a low mAP.
    In my experience, increasing the number of epochs can have a positive effect on mAP.

  5. Currently I am primarily concerned with person and car detections and with reducing the false positives. From what I have seen, increasing the weights led to more false positives.
    [Morgan] As mentioned above, since your dataset is too unbalanced, please consider training 3 classes or 2 classes
    (car and person). Or try to make the dataset more balanced by reducing some images of the “car” class.
    Also, tune the class_weight.

  6. I don’t wish to change to the resnet18 backbone, as I have to deploy the final model on a Jetson Nano and my fps will drop.
    [Morgan] For fps, you can tune the pruning ratio to improve it. For mAP, resnet50 and resnet18 will be better than resnet10.
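On point 1: as far as the spec describes it, minimum_detection_ground_truth_overlap is an IoU threshold used when scoring detections during evaluation, so lowering it changes which detections count as correct rather than changing the training loss. The sketch below uses two made-up boxes (they are not from the dataset) to show a detection that passes at 0.5 but fails at 0.7.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap; zero when the boxes do not intersect.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


ground_truth = (100, 100, 200, 200)  # hypothetical label box
detection = (110, 110, 210, 210)     # hypothetical prediction, shifted by 10 px

overlap = iou(ground_truth, detection)
for threshold in (0.5, 0.7):
    verdict = "match" if overlap >= threshold else "counted as a false positive"
    print(f"IoU {overlap:.3f} at threshold {threshold}: {verdict}")
```

With the car and bus thresholds at 0.7, a slightly misplaced but otherwise reasonable box like this one lowers the reported AP for those classes.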

  1. I am using a single GPU for training.
  2. I still wish to go for 5-class training, although I will reduce the number of car instances.
  3. I previously trained a resnet18 model for 140 epochs on a more or less similar dataset, although in that case the number of car instances was comparable to that of person. I was still getting false positives on persons (although far fewer). The mAP was higher than with this model, as expected.
    Lastly, I will be integrating the model with DeepStream, and the default DeepStream model is based on resnet10. What kind of performance (fps) drop should I expect if I go ahead with resnet18?

For mAP, you can try larger backbones to meet your requirement. For fps, you can prune the tlt model at different ratios and then retrain; run more experiments and select the one that meets the requirement.
For tweaking class_weight, see this related topic: https://devtalk.nvidia.com/default/topic/1069397/transfer-learning-toolkit/detectnet-v2-18-layers-for-character-recognition-35-classes-/post/5419070/#5419070

Hi Morganh,
Thanks for the reply.
Are there any other pointers you can give?

Please try more experiments as mentioned above. If possible, please make each class’ data more balanced.
Also, use only part of the tfrecords for the first runs to speed up training and tuning.

Hi Morganh,
I made the required changes, i.e. I reduced the number of car instances to 250k, with person at 300k instances. I have kept the other class instances the same for the time being.
I retrained the resnet10 model for 180 epochs continuously. I am still seeing false positives for person. The issue is that the false positives are quite big in size and have high confidence. I also trained a resnet18 model. As expected, the false positives reduced (but are still there).
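The thread does not say how the car instances were reduced. One possible way, sketched here purely as an assumption, is to drop images that contain only cars, so that the person count is left untouched. The per-image count dictionary and the helper name `drop_car_only_images` are hypothetical.

```python
import random


def drop_car_only_images(image_counts, target_cars, seed=42):
    """Remove car-only images until the car total is at or below target_cars.

    image_counts maps an image id to a dict of {class name: object count}.
    Returns the kept images and the remaining number of car objects.
    """
    rng = random.Random(seed)
    total_cars = sum(c.get("car", 0) for c in image_counts.values())
    # Candidates: images with at least one car and no objects of other classes.
    car_only = [
        img for img, c in image_counts.items()
        if c.get("car", 0) > 0
        and all(n == 0 for name, n in c.items() if name != "car")
    ]
    rng.shuffle(car_only)
    kept = dict(image_counts)
    for img in car_only:
        if total_cars <= target_cars:
            break
        total_cars -= kept.pop(img)["car"]
    return kept, total_cars


# Tiny made-up example; real counts would come from the label files.
images = {
    "a": {"car": 5},
    "b": {"car": 3, "person": 2},
    "c": {"car": 4},
    "d": {"person": 1},
    "e": {"car": 2},
}
kept, remaining_cars = drop_car_only_images(images, target_cars=8)
print(sorted(kept), remaining_cars)
```

Dropping whole images also removes background that the network would otherwise learn from, so it may be worth checking the false-positive rate again after any such reduction.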

So, can you provide me with more suggestions?

Also, even though the AP of the other classes (motorbike, bicycle) is low and their instance counts are also low, I am seeing good results for them during inference. So my question is whether having little data for the other classes affects the detections of the person class. Also, I have adjusted the class cost weights according to the number of instances.