Is there any way to get accuracy equal to YOLOv4 with DetectNet_v2 (ResNet-18) using the Transfer Learning Toolkit (TLT)?

I trained DetectNet_v2 with ResNet-18 on a single class with 160,000 training images.

The mean average precision (mAP) I got is almost 73%, while on the same data YOLOv4 gave me 91% mAP.

I know that adding layers generally improves mAP and accuracy, but I am deploying on DeepStream and need both high FPS and good detection. If I increase the number of layers, my FPS will drop.

Is the person class a small object?

No, the images are large; all of them are resized to 640×480.

I mean the size of each bbox for the person. Is it small?

The bbox size varies from image to image. (My dataset includes COCO data and Open Images for the person class.)

You can calculate the average bbox resolution from the coordinates in each label file.
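For example, something like this quick script (a rough sketch assuming KITTI-format labels; the labels path is only an example, adjust it to your dataset):

import glob

# Rough sketch: average person bbox size from KITTI-format label files.
# KITTI columns: class truncated occluded alpha xmin ymin xmax ymax ...
widths, heights = [], []
for label_file in glob.glob("/workspace/tlt-experiments/data/training/labels/*.txt"):
    with open(label_file) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0].lower() == "person":
                xmin, ymin, xmax, ymax = map(float, fields[4:8])
                widths.append(xmax - xmin)
                heights.append(ymax - ymin)

if widths:
    print("average person bbox: %.1f x %.1f px over %d boxes"
          % (sum(widths) / len(widths), sum(heights) / len(heights), len(widths)))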

This is your training dataset, right? How many images are from the COCO dataset, and how many from Open Images?
And what is your test dataset?

I have 60,000 COCO images, 20,000 Open Images, and more than 80,000 other images, while my testing data includes some COCO and Open Images.

Testing data: 40,000
Training data: 164,000

Can you share your training spec?

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 640
    output_image_height: 480
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    color_shift_stddev: 0.0
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "person"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 20
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/tlt-experiments/detectnet_v2/pretrained_resnet18/tlt_pretrained_detectnet_v2_vresnet18/resnet18.hdf5"
  num_layers: 18
  freeze_blocks: 0
  freeze_blocks: 1
  all_projections: true
  use_pooling: false
  use_batch_norm: true
  dropout_rate: 0.1
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 5
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.55
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 16
  num_epochs: 160
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-05
      max_learning_rate: 2e-04
      soft_start: 0.15
      annealing: 0.8
    }
  }
  regularizer {
    type: L1
    weight: 3e-9
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 5
}
bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.5
      cov_radius_y: 0.5
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.67
}

Please try

  • Increase the cov_radius_x and cov_radius_y parameters in the bbox_rasterizer_config section for the person class; this helps with small objects (see the example spec below)
  • Decrease minimum_detection_ground_truth_overlap
  • Lower minimum_height in evaluation_box_config to cover more small objects during evaluation

Also, please run experiments with different batch_size_per_gpu values.
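For example, the changed sections could look like below (the values are only illustrative, not tuned for your data):

bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0   # increased from 0.5
      cov_radius_y: 1.0   # increased from 0.5
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.67
}
evaluation_config {
  validation_period_during_training: 5
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.5            # decreased from 0.55
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 10  # lowered from 20 to keep smaller persons in evaluation
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}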
