Is there any way to get accuracy equal to YOLOv4 with DetectNet_v2 (ResNet-18) using the Transfer Learning Toolkit (TLT)?

I trained DetectNet_v2 with ResNet-18 on a single class with 160,000 training images.

The mean average precision (mAP) I got is almost 73%, while on the same data YOLOv4 gave me 91% mAP.

I know that adding layers generally improves mAP and accuracy, but I am deploying on DeepStream and need both high FPS and good detection. If I increase the number of layers, my FPS will drop.

Is the person class a small object?

No, the images are large; all of them are resized to 640×480.

I mean the size of each bbox for the person. Is it small?

The bbox size varies from image to image. (My dataset includes COCO data and Open Images for the person class.)

You can calculate the average bbox resolution from the coordinates in each label file.
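For example, something like this quick script (a rough sketch assuming KITTI-format labels; the labels path is only an example, adjust it to your dataset):

import glob

# Rough sketch: average person bbox size from KITTI-format label files.
# KITTI columns: class truncated occluded alpha xmin ymin xmax ymax ...
widths, heights = [], []
for label_file in glob.glob("/workspace/tlt-experiments/data/training/labels/*.txt"):
    with open(label_file) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0].lower() == "person":
                xmin, ymin, xmax, ymax = map(float, fields[4:8])
                widths.append(xmax - xmin)
                heights.append(ymax - ymin)

if widths:
    print("average person bbox: %.1f x %.1f px over %d boxes"
          % (sum(widths) / len(widths), sum(heights) / len(heights), len(widths)))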

This is your training dataset, right? How many images are from the COCO dataset, and how many from Open Images?
And what is your test dataset?

I have 60,000 COCO images, 20,000 Open Images, and more than 80,000 other images, while my testing data includes some COCO and Open Images.

Testing data: 40,000
Training data: 164,000

Can you share your training spec?

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/data/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "person"
    value: "person"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 640
    output_image_height: 480
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    color_shift_stddev: 0.0
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "person"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 20
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/tlt-experiments/detectnet_v2/pretrained_resnet18/tlt_pretrained_detectnet_v2_vresnet18/resnet18.hdf5"
  num_layers: 18
  freeze_blocks: 0
  freeze_blocks: 1
  all_projections: true
  use_pooling: false
  use_batch_norm: true
  dropout_rate: 0.1
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 5
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.55
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "person"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 16
  num_epochs: 160
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-05
      max_learning_rate: 2e-04
      soft_start: 0.15
      annealing: 0.8
    }
  }
  regularizer {
    type: L1
    weight: 3e-9
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 5
}
bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.5
      cov_radius_y: 0.5
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.67
}

Please try

  • Increase the cov_radius_x and cov_radius_y parameters in the bbox_rasterizer_config section for the person class; this helps with small objects (see the example spec below)
  • Decrease minimum_detection_ground_truth_overlap
  • Lower minimum_height in evaluation_box_config to cover more small objects during evaluation

Also, please run experiments with different batch_size_per_gpu values.
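For example, the changed sections could look like below (the values are only illustrative, not tuned for your data):

bbox_rasterizer_config {
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0   # increased from 0.5
      cov_radius_y: 1.0   # increased from 0.5
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.67
}
evaluation_config {
  validation_period_during_training: 5
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.5            # decreased from 0.55
  }
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 10  # lowered from 20 to keep smaller persons in evaluation
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}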
