Mean average precision too low at 640×480 with DetectNet_v2 + ResNet18?

I trained models with DetectNet_v2 using two backbones (ResNet10 and ResNet18). The mean average precision (mAP) with ResNet10 is 37, while with ResNet18 it is 44.05.

I have a single training class (person); the dataset details are below:
Training images: 103,851
Testing images: 24,414

Following the official DetectNet_v2 documentation, I resized all dataset images to the same size (640, 480) and resized the bounding boxes accordingly, but the results are not what I expected.
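For reference, here is a minimal sketch of the offline resize step I mean (assuming the standard KITTI label layout and OpenCV; directory names are placeholders):

import os
import cv2

# Minimal sketch: resize every image to 640x480 and scale the KITTI
# bounding boxes by the same factors. Directory names are placeholders.
SRC_IMG, SRC_LBL = "images_raw", "labels_raw"
DST_IMG, DST_LBL = "image_2", "label_2"
TARGET_W, TARGET_H = 640, 480

for fname in os.listdir(SRC_IMG):
    img = cv2.imread(os.path.join(SRC_IMG, fname))
    h, w = img.shape[:2]
    sx, sy = TARGET_W / w, TARGET_H / h
    cv2.imwrite(os.path.join(DST_IMG, fname),
                cv2.resize(img, (TARGET_W, TARGET_H)))
    # KITTI labels: the bbox occupies columns 4..7 (xmin, ymin, xmax, ymax)
    label_name = os.path.splitext(fname)[0] + ".txt"
    scaled = []
    with open(os.path.join(SRC_LBL, label_name)) as f:
        for line in f:
            parts = line.split()
            for i, s in zip((4, 5, 6, 7), (sx, sy, sx, sy)):
                parts[i] = "%.2f" % (float(parts[i]) * s)
            scaled.append(" ".join(parts))
    with open(os.path.join(DST_LBL, label_name), "w") as f:
        f.write("\n".join(scaled) + "\n")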

Why is this happening? Deep learning models usually need more data for the best performance, but here the situation seems to be the opposite: I have a large dataset, yet the mAP is still low.

Here is my KITTI train/val conversion spec file:

kitti_config {
  root_directory_path: "/workspace/tlt-experiments/data/training"
  image_dir_name: "image_2"
  label_dir_name: "label_2"
  image_extension: ".jpg"
  partition_mode: "random"
  num_partitions: 2
  val_split: 20
  num_shards: 20
}
image_directory_path: "/workspace/tlt-experiments/data/training"
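For context, val_split: 20 sends roughly 20% of the frames to partition 0, which the training specs below then select with validation_fold: 0. A back-of-the-envelope check of the expected partition sizes (plain arithmetic, not the converter's exact logic):

# Rough expected partition sizes for val_split: 20
total_images = 103_851
num_val = int(total_images * 20 / 100)   # ~20,770 images in fold 0
num_train = total_images - num_val       # ~83,081 images for training
print(num_train, num_val)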

Below is my training spec file for ResNet10:

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 640
    output_image_height: 480
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    color_shift_stddev: 0.0
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "person"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 20
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/tlt-experiments/detectnet_v2/pretrained_resnet10/tlt_pretrained_detectnet_v2_vresnet10/resnet10.hdf5"
  num_layers: 10
  freeze_blocks: 0
  freeze_blocks: 1
  all_projections: True
  use_pooling: False
  use_batch_norm: true
  dropout_rate: 0.0
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 5
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.5
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 16
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-9
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 5
}
bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.4
      cov_radius_y: 0.4
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.67
}
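As an aside, my reading of the soft_start_annealing_schedule above (an approximation, not TLT's exact implementation) is: the learning rate ramps up exponentially over the first 10% of training, holds at the maximum, then anneals back down from the 70% mark:

import math

MIN_LR, MAX_LR = 5e-06, 5e-04
SOFT_START, ANNEALING = 0.1, 0.7

def learning_rate(progress):
    """Approximate LR at a given training progress in [0, 1]."""
    if progress < SOFT_START:
        t = progress / SOFT_START               # exponential warm-up
    elif progress < ANNEALING:
        return MAX_LR                           # hold at the peak
    else:
        t = 1.0 - (progress - ANNEALING) / (1.0 - ANNEALING)  # decay
    return MIN_LR * math.exp(t * math.log(MAX_LR / MIN_LR))

for epoch in (0, 12, 60, 84, 120):
    print(epoch, learning_rate(epoch / 120))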

Below is my training spec file for ResNet18:

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 640
    output_image_height: 480
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    color_shift_stddev: 0.0
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "person"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 20
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/tlt-experiments/detectnet_v2/pretrained_resnet18/tlt_pretrained_detectnet_v2_vresnet18/resnet18.hdf5"
  num_layers: 18
  freeze_blocks: 0
  freeze_blocks: 1
  all_projections: True
  use_pooling: False
  use_batch_norm: true
  dropout_rate: 0.0
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 5
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.5
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 16
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-9
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 5
}
bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.4
      cov_radius_y: 0.4
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.67
}
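Both specs use average_precision_mode: INTEGRATE, i.e. the AP is the area under the interpolated precision-recall curve rather than an 11-point sample. A generic VOC-style sketch of the two modes (illustrative only, not TLT's exact code):

import numpy as np

def average_precision(recall, precision, mode="INTEGRATE"):
    # Interpolate: make precision non-increasing along recall.
    prec = np.maximum.accumulate(precision[::-1])[::-1]
    if mode == "SAMPLE":
        # 11-point sampling (PASCAL VOC 2007 style)
        points = [prec[recall >= r].max() if (recall >= r).any() else 0.0
                  for r in np.linspace(0, 1, 11)]
        return float(np.mean(points))
    # INTEGRATE: area under the interpolated curve
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([prec[0]], prec, [0.0]))
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))

recall = np.array([0.1, 0.4, 0.7, 0.9])
precision = np.array([0.95, 0.85, 0.6, 0.3])
print(average_precision(recall, precision))             # ~0.59
print(average_precision(recall, precision, "SAMPLE"))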

Below are the results for ResNet18.

Thanks.

What is the average resolution of your dataset? You can set the input size to the average resolution of the dataset.
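Something like this quick sketch can estimate it (using Pillow; the directory path is taken from your spec, adjust as needed):

import os
from PIL import Image

img_dir = "/workspace/tlt-experiments/data/training/image_2"
widths, heights = [], []
for fname in os.listdir(img_dir):
    with Image.open(os.path.join(img_dir, fname)) as im:
        w, h = im.size
        widths.append(w)
        heights.append(h)
print(sum(widths) / len(widths), sum(heights) / len(heights))

Then round the averages to the nearest valid size (multiples of 16, see the constraint below).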

Indeed, in the current TLT 3.0_dp version, the detectnet_v2 network requires resizing images/labels offline. But it is not a must to set them to (640, 480). See https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/open_model_architectures.html#detectnet-v2

  • Input size: C * W * H (where C = 1 or 3, W >= 480, H >= 272, and W, H are multiples of 16)
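A tiny helper (my own, not part of TLT) to check a candidate size against those constraints:

def valid_input_size(width, height):
    """W >= 480, H >= 272, both multiples of 16."""
    return (width >= 480 and height >= 272
            and width % 16 == 0 and height % 16 == 0)

print(valid_input_size(640, 480))   # True
print(valid_input_size(960, 544))   # True
print(valid_input_size(640, 360))   # False: 360 is not a multiple of 16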

Also, could you fine-tune the batch size? You can run different experiments with it.

And is “person” a small object in your images? If yes, see Frequently Asked Questions — Transfer Learning Toolkit 3.0 documentation.

The following parameters can help you improve AP on smaller objects (an illustrative spec change is sketched after the list):

  • Increase num_layers of resnet
  • class_weight for small objects
  • Increase the coverage_radius_x and coverage_radius_y parameters of the bbox_rasterizer_config section for the small objects class
  • Decrease minimum_detection_ground_truth_overlap
  • Lower minimum_height to cover more small objects for evaluation.
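For example, the corresponding changes in your specs might look like this (illustrative values only, other fields unchanged; tune them on your data):

bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0   # increased from 0.4
      cov_radius_y: 1.0   # increased from 0.4
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.67
}
evaluation_config {
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.4            # decreased from 0.5
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 2   # lowered from 4
      maximum_height: 9999
      minimum_width: 2
      maximum_width: 9999
    }
  }
}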