Using detectnet_v2 pretrained models in TLT v3.0

I have started training two detectnet_v2 models, one initialized from a pretrained model and the other without, but they appear to perform identically, as described in this old post. Apart from the pretrained model location and the blocks to freeze, the configuration files are identical, and both runs use the same dataset. Below are details on the hardware I’m using, the TLT version (3.0; upgrading to TAO is not an option right now), and the configuration files for the two experiments.

Hardware: 2 identical machines with 8 A100s, 256 cores, 1TB of RAM
Network type: Detectnet_v2
TLT Version:

Configuration of the TLT Instance

dockers: 		
	nvcr.io/nvidia/tlt-streamanalytics: 			
		docker_tag: v3.0-dp-py3
		tasks: 
			1. augment
			2. classification
			3. detectnet_v2
			4. dssd
			5. emotionnet
			6. faster_rcnn
			7. fpenet
			8. gazenet
			9. gesturenet
			10. heartratenet
			11. lprnet
			12. mask_rcnn
			13. retinanet
			14. ssd
			15. unet
			16. yolo_v3
			17. yolo_v4
			18. tlt-converter
	nvcr.io/nvidia/tlt-pytorch: 			
		docker_tag: v3.0-dp-py3
		tasks: 
			1. speech_to_text
			2. text_classification
			3. question_answering
			4. token_classification
			5. intent_slot_classification
			6. punctuation_and_capitalization
format_version: 1.0
tlt_version: 3.0
published_date: 02/02/2021

Below are the two training specification files. Some details have been modified for privacy.

Training spec file 1:

# Model config
model_config {
  arch: "resnet"
  pretrained_model_file: ""
  all_projections: True
  num_layers: 18
  use_pooling: False
  use_batch_norm: True
  dropout_rate: 0
  training_precision: {
    backend_floatx: FLOAT32
  }
  objective_set: {
    cov {}
    bbox {
      scale: 35.0
      offset: 0.5
    }
  }
}

# Bbox rasterizer
bbox_rasterizer_config {
  target_class_config {
    key: "weed"
    value: {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.4
      cov_radius_y: 0.4
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "carrot"
    value: {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.4
      cov_radius_y: 0.4
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.67
}

postprocessing_config {
  target_class_config {
    key: "weed"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.20    
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 20
      }
    }
  }
  target_class_config {
    key: "carrot"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.20
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 20
      }
    }
  }
}

cost_function_config {
  target_classes {
    name: "weed"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 1.0
      weight_target: 1.0
    }
  }
  target_classes {
    name: "carrot"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 1.0
      weight_target: 1.0
    }
  }
  enable_autoweighting: True
  max_objective_weight: 0.9999
  min_objective_weight: 0.0001
}

training_config {
  batch_size_per_gpu: 96 
  num_epochs: 10000
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-9
      max_learning_rate: 5e-4
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-9
  }
  optimizer {
    adam {
      epsilon: 1e-08
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    enabled: False
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
}

augmentation_config {
  preprocessing {
    output_image_width: 768
    output_image_height: 768
    output_image_channel: 3
    min_bbox_width: 5.0
    min_bbox_height: 5.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.5
    zoom_min: 0.9
    zoom_max: 1.0
    translate_max_x: 100.0
    translate_max_y: 100.0
    rotate_rad_max: 0.69
  }
  color_augmentation {
    color_shift_stddev: 0.0
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}

evaluation_config {
  average_precision_mode: SAMPLE
  validation_period_during_training: 10
  first_validation_epoch: 30
  minimum_detection_ground_truth_overlap {
    key: "weed"
    value: 0.3
  }
  minimum_detection_ground_truth_overlap {
    key: "carrot"
    value: 0.3
  }
  evaluation_box_config {
    key: "weed"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "carrot"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
}

dataset_config {
  data_sources: {
    tfrecords_path: "/data/tfrecords/train-*"
    image_directory_path: "/data"
  }
  image_extension: "jpg"
  target_class_mapping {
      key: "weed"
      value: "weed"
  }
  target_class_mapping {
      key: "carrot"
      value: "carrot"
  }
  validation_data_source: {
    tfrecords_path: "/data/tfrecords/validation-*"
    image_directory_path: "/data"
  }
}
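For reference, the soft_start_annealing_schedule in training_config can be evaluated per epoch with a short Python sketch. It assumes the schedule interpolates in log space between min_learning_rate and max_learning_rate (ramping up until soft_start, holding at the maximum, then decaying after annealing) and that training progress is epoch / num_epochs. Both are assumptions rather than a copy of TLT’s implementation, and the function name soft_start_annealing_lr is hypothetical.

```python
import math


def soft_start_annealing_lr(progress: float,
                            min_lr: float = 5e-9,
                            max_lr: float = 5e-4,
                            soft_start: float = 0.1,
                            annealing: float = 0.7) -> float:
    """Approximate learning rate at a fraction of training (0.0 to 1.0).

    Assumed shape: log-space ramp from min_lr to max_lr until soft_start,
    plateau at max_lr, then log-space decay back to min_lr after annealing.
    Defaults mirror the training_config in the spec files.
    """
    if soft_start > 0.0 and progress < soft_start:
        t = progress / soft_start                   # warm-up phase
    elif annealing < 1.0 and progress > annealing:
        t = (1.0 - progress) / (1.0 - annealing)    # annealing phase
    else:
        t = 1.0                                     # plateau at max_lr
    log_lr = math.log(min_lr) + t * (math.log(max_lr) - math.log(min_lr))
    return math.exp(log_lr)


if __name__ == "__main__":
    num_epochs = 10000  # num_epochs from training_config
    for epoch in (30, 100, 500, 1000, 7000, 10000):
        lr = soft_start_annealing_lr(epoch / num_epochs)
        print(f"epoch {epoch:>5}: lr ~ {lr:.3e}")
```

Under these assumptions, with num_epochs: 10000 the warm-up alone spans the first 1,000 epochs, so at the early validation epochs (first_validation_epoch: 30) the learning rate is still within an order of magnitude of min_learning_rate.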

Training spec file 2:

# Model config
model_config {
  arch: "resnet"
  pretrained_model_file: "/models/detectnet_v2/2-class/model.tlt"
  freeze_blocks: 0
  freeze_blocks: 1
  freeze_blocks: 2
  all_projections: True
  num_layers: 18
  use_pooling: False
  use_batch_norm: True
  dropout_rate: 0
  training_precision: {
    backend_floatx: FLOAT32
  }
  objective_set: {
    cov {}
    bbox {
      scale: 35.0
      offset: 0.5
    }
  }
}

# Bbox rasterizer
bbox_rasterizer_config {
  target_class_config {
    key: "weed"
    value: {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.4
      cov_radius_y: 0.4
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "carrot"
    value: {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.4
      cov_radius_y: 0.4
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.67
}

postprocessing_config {
  target_class_config {
    key: "weed"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.20    
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 20
      }
    }
  }
  target_class_config {
    key: "carrot"
    value: {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.20
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 20
      }
    }
  }
}

cost_function_config {
  target_classes {
    name: "weed"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 1.0
      weight_target: 1.0
    }
  }
  target_classes {
    name: "carrot"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 1.0
      weight_target: 1.0
    }
  }
  enable_autoweighting: True
  max_objective_weight: 0.9999
  min_objective_weight: 0.0001
}

training_config {
  batch_size_per_gpu: 96 
  num_epochs: 10000
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-9
      max_learning_rate: 5e-4
      soft_start: 0.1
      annealing: 0.7
    }
  }
  regularizer {
    type: L1
    weight: 3e-9
  }
  optimizer {
    adam {
      epsilon: 1e-08
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    enabled: False
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
}

augmentation_config {
  preprocessing {
    output_image_width: 768
    output_image_height: 768
    output_image_channel: 3
    min_bbox_width: 5.0
    min_bbox_height: 5.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.5
    zoom_min: 0.9
    zoom_max: 1.0
    translate_max_x: 100.0
    translate_max_y: 100.0
    rotate_rad_max: 0.69
  }
  color_augmentation {
    color_shift_stddev: 0.0
    hue_rotation_max: 25.0
    saturation_shift_max: 0.2
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}

evaluation_config {
  average_precision_mode: SAMPLE
  validation_period_during_training: 10
  first_validation_epoch: 30
  minimum_detection_ground_truth_overlap {
    key: "weed"
    value: 0.3
  }
  minimum_detection_ground_truth_overlap {
    key: "carrot"
    value: 0.3
  }
  evaluation_box_config {
    key: "weed"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "carrot"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
}

dataset_config {
  data_sources: {
    tfrecords_path: "/data/tfrecords/train-*"
    image_directory_path: "/data"
  }
  image_extension: "jpg"
  target_class_mapping {
      key: "weed"
      value: "weed"
  }
  target_class_mapping {
      key: "carrot"
      value: "carrot"
  }
  validation_data_source: {
    tfrecords_path: "/data/tfrecords/validation-*"
    image_directory_path: "/data"
  }
}
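To confirm that the two specs really differ only in pretrained_model_file and freeze_blocks, a quick diff is enough. This is a minimal sketch using Python’s standard difflib; the script name and command-line usage are hypothetical.

```python
import difflib
import sys


def spec_differences(spec_a: str, spec_b: str) -> list[str]:
    """Return added/removed lines between two spec files.

    Blank lines and trailing whitespace are ignored so that cosmetic
    differences do not show up.
    """
    def normalize(text: str) -> list[str]:
        return [line.rstrip() for line in text.splitlines() if line.strip()]

    # ndiff prefixes lines with "- ", "+ ", "  " or "? "; keep only real changes.
    return [line for line in difflib.ndiff(normalize(spec_a), normalize(spec_b))
            if line.startswith(("- ", "+ "))]


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        for line in spec_differences(fa.read(), fb.read()):
            print(line)
```

For example, python spec_diff.py spec1.txt spec2.txt (hypothetical file names). If the files match what is posted here, the output should list only the pretrained_model_file line from each file and the three freeze_blocks lines.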

For detectnet_v2, our findings are:

  • Pretrained models achieve the same accuracy with less data.
  • As a result, training cost is lower when using pretrained models.

Your response doesn’t appear to address the problem. Why would two models, one started with a pretrained model and the other not, perform identically?

In our internal experiments, the results are not identical. We ran experiments using PeopleNet as the pretrained model and trained on a public IR dataset. In the early phase of training, the mAP with the pretrained model was higher than the mAP without it.

I understand it should not be the case, but it is what I am experiencing with the configuration outlined in my original post. The question is: why would this happen?

Can you remove the lines above and retry?

BTW, was “/models/detectnet_v2/2-class/model.tlt” trained on your own training images?

Removing those lines results in the same behavior.

Yes, /models/detectnet_v2/2-class/model.tlt is trained with our own images.

Did you plot the mAP curves for your two experiments? If so, please share them with us.

No. The mAP in both cases is nearly identical at each epoch.
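If the two training logs are available, the per-validation mAP values can be extracted and compared side by side, or plotted, with a short Python sketch. It assumes each validation pass logs a line of the form “Mean average_precision (in %): <value>” and that validation runs at first_validation_epoch and then every validation_period_during_training epochs (30, 40, 50, … with the specs above); check both assumptions against the actual logs. All function names are hypothetical.

```python
import re

# Assumed log line format; adjust the pattern if the logs differ.
MAP_PATTERN = re.compile(r"Mean average_precision \(in %\):\s*(\d+(?:\.\d+)?)")


def parse_map_values(log_text: str) -> list[float]:
    """Return mAP values (in %) in the order they appear in the log."""
    return [float(m.group(1)) for m in MAP_PATTERN.finditer(log_text)]


def validation_epochs(count: int, first_epoch: int = 30, period: int = 10) -> list[int]:
    """Epochs at which validation is assumed to run, derived from
    first_validation_epoch and validation_period_during_training."""
    return [first_epoch + i * period for i in range(count)]


def compare_runs(log_a: str, log_b: str) -> list[tuple[int, float, float, float]]:
    """Pair per-validation mAP from two logs as (epoch, mAP_a, mAP_b, mAP_b - mAP_a)."""
    maps_a, maps_b = parse_map_values(log_a), parse_map_values(log_b)
    n = min(len(maps_a), len(maps_b))
    epochs = validation_epochs(n)
    return [(e, a, b, b - a) for e, a, b in zip(epochs, maps_a[:n], maps_b[:n])]


def plot_runs(rows, label_a="no pretrained", label_b="pretrained", out="map_curves.png"):
    """Optional: save both mAP curves to an image (requires matplotlib)."""
    import matplotlib.pyplot as plt
    epochs = [r[0] for r in rows]
    plt.plot(epochs, [r[1] for r in rows], label=label_a)
    plt.plot(epochs, [r[2] for r in rows], label=label_b)
    plt.xlabel("epoch")
    plt.ylabel("mAP (%)")
    plt.legend()
    plt.savefig(out)
```

For example, compare_runs(open("run_no_pretrained.log").read(), open("run_pretrained.log").read()) (hypothetical file names) returns one tuple per validation pass; passing the result to plot_runs writes map_curves.png if matplotlib is installed.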

Can you share the logs?

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one.
Thanks

Also, how many images are in your training and validation datasets?
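The dataset sizes can be read directly from the TFRecords referenced in dataset_config by walking the TFRecord framing (a little-endian uint64 length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload), which needs no TensorFlow install. This is a minimal sketch that assumes each record corresponds to one image and does not verify CRCs; the function names are hypothetical.

```python
import glob
import struct


def count_tfrecord_records(path: str) -> int:
    """Count records in one TFRecord file by skipping over each record.

    CRCs are not verified, and a truncated payload at the end of the
    file is not detected.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError(f"truncated record header in {path}")
            (length,) = struct.unpack("<Q", header)
            # Skip the length CRC (4 bytes), the payload, and the payload CRC (4 bytes).
            f.seek(4 + length + 4, 1)
            count += 1
    return count


def count_split(pattern: str) -> int:
    """Sum record counts over all shards matching a glob pattern."""
    return sum(count_tfrecord_records(p) for p in sorted(glob.glob(pattern)))


if __name__ == "__main__":
    # Patterns taken from dataset_config in the spec files.
    for name, pattern in (("train", "/data/tfrecords/train-*"),
                          ("validation", "/data/tfrecords/validation-*")):
        print(f"{name}: {count_split(pattern)} records")
```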
