Help with Detectnet_V2 train config file (TAO)

I have been using TAO to train custom, single-class detectnet_v2 networks with a resnet18 backbone on 1080p RGB images. This is the object/target that I am training on:

While the networks are not perfect, I have had great success deploying them for our use case. However, there are a few issues/cases I am running into that I would like to fix.

When the object/target is far away/small, the network renders a near-perfect bounding box encapsulating the target:

However, as the target gets closer, the neural network loses detection completely or begins to “split” the target:

Current Improvement:
Over the last couple of days, I have been trying to learn about all the different parameters in the Detectnet_V2 training config, with some success. My training config file now looks like this:

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/data/tfrecords_target/kitti_train/*"
    image_directory_path: "/workspace/tlt-experiments/data/Set_target/training"
  }
  image_extension: "png"
  target_class_mapping {
    key: "target"
    value: "target"
  }
  validation_data_source: {
    tfrecords_path: "/workspace/tlt-experiments/data/tfrecords_target/kitti_val/*"
    image_directory_path: "/workspace/tlt-experiments/data/Set_target/val"
  }
}
augmentation_config {
  preprocessing {
    output_image_width: 1920
    output_image_height: 1088
    min_bbox_width: 8.0
    min_bbox_height: 8.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 32.0
    translate_max_y: 32.0
    rotate_rad_max: 0.69
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.25
    contrast_scale_max: 0.1
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "target"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.5
        coverage_threshold: 0.005
        dbscan_eps: 0.7
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 8
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/tlt-experiments/detectnet_v2/pretrained_resnet18/resnet18.hdf5"
  freeze_blocks: 0
  freeze_blocks: 1
  num_layers: 18
  use_pooling: false
  use_batch_norm: true
  dropout_rate: 0.5
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 5
  first_validation_epoch: 30
  minimum_detection_ground_truth_overlap {
    key: "target"
    value: 0.6
  }
  evaluation_box_config {
    key: "target"
    value {
      minimum_height: 8
      maximum_height: 1088
      minimum_width: 8
      maximum_width: 1920
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "target"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.9999
  min_objective_weight: 0.0001
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 40
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 2e-06
      max_learning_rate: 2e-05
      soft_start: 0.1
      annealing: 0.6
    }
  }
  regularizer {
    type: L1
    weight: 3e-9
  }
  optimizer {
    adam {
      epsilon: 1e-08
      beta1: 0.9
      beta2: 0.999
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 25
}
bbox_rasterizer_config {
  target_class_config {
    key: "target"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.2
}
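
As a sanity check on the training_config above, here is my understanding of the soft_start_annealing_schedule as a small sketch (based on the documented behavior; the exact ramp shape in the real implementation may differ): the learning rate climbs exponentially from min_learning_rate to max_learning_rate over the first soft_start fraction of training, holds at the maximum, then decays exponentially back after the annealing point.

```python
def soft_start_annealing_lr(progress, min_lr=2e-6, max_lr=2e-5,
                            soft_start=0.1, annealing=0.6):
    """Sketch of the LR schedule; progress is the fraction of training done (0..1)."""
    if progress < soft_start:
        # exponential ramp-up from min_lr to max_lr
        return min_lr * (max_lr / min_lr) ** (progress / soft_start)
    elif progress < annealing:
        # hold at the maximum learning rate
        return max_lr
    else:
        # exponential decay back toward min_lr
        frac = (progress - annealing) / (1.0 - annealing)
        return max_lr * (min_lr / max_lr) ** frac
```

With the values in my config, the LR peaks at 2e-05 between 10% and 60% of training and is back at 2e-06 by the final epoch.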

Changes from previous config file to this one:

  • Epochs: 60 → 40
    I was worried about the model overfitting on a dataset of only 60k images.

  • Freeze blocks: 0,1,2 → 0,1

  • dbscan_eps: 0.3 → 0.7
    Since the network seemed to split detections, I suspected that nearby detections were not being clustered together properly, so I increased this per the description here (DetectNet_v2 — TAO Toolkit 3.22.05 documentation).

  • deadzone_radius: 0.6 → 0.2
    Since the target is a circle and the bounding box should ideally circumscribe it, I calculated deadzone_radius as 1 - (circle_area_of_radius_r / square_area_of_side_2r) = 1 - π/4 ≈ 0.2, i.e. the fraction of the bounding box that is not the target.

  • cov_radius_x: 0.5 → 1.0

  • cov_radius_y: 0.5 → 1.0
    Since the bounding box should ideally circumscribe the target, the coverage radius for x and y should be 1.0

  • vflip_probability: 0.0 → 0.5
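
The deadzone_radius arithmetic above can be checked in a couple of lines:

```python
import math

# A circle of radius r inscribed in its bounding square of side 2r covers
# pi*r^2 / (2r)^2 = pi/4 of the box; the rest is background ("deadzone").
circle_fraction = math.pi / 4
deadzone = 1 - circle_fraction  # ~0.215, rounded down to 0.2 in the config
```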

If my reasoning for changing any of these parameters is wrong, please correct me. I have also been trying to look into coverage_foreground_weight, but the explanation (Tlt spec file - cost function - #4 by Morganh) leaves me confused about what coverage_foreground_weight is supposed to represent.

The neural network trained on this config file (using the same dataset as before) was able to track the target when it was larger/closer and fixed some of the “splitting”. Here are some outputs:



Image (1) shows improvement in the “splitting” but still does not encompass the entire target.

Image (2) shows that the new/improved network is able to detect a larger/closer target, but it exhibits the same issues as (1), only worse: the splitting worsens as the target gets closer and larger.

Image (3) is of a sub-class of the target that the network has also been trained to detect, and it re-demonstrates what (1) shows on a different target. The red bounding box is the output from the previous network; the green bounding box is the output from the current network.

Questions and Help:
Could you provide any guidance on, or critique of, the training config file or other parts of the training process to help remedy any of the following issues:

  1. Detections splitting when the target is too close
  2. Wrongly sized detections when the target is too close
  3. No detections at all when the target is too close

Additional Info:
All example images of network output have been cropped from their original 1080p frames for internal reasons. If desired, I can provide the full images in a private context.

Our dataset is roughly 60k 1080p RGB images hand-labeled in the KITTI format, with just the class name and bounding box fields being non-zero. While the dataset does not include many close-up/large images of the target, I would still expect the network to be able to detect them. Here are some statistics on the distribution of target bounding boxes in the dataset:

Width: Mean=103.318 px, Min=14, Max=1006
Height: Mean=75.932 px, Min=6, Max=1076
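
For reference, statistics like these can be gathered from KITTI label files with something along these lines (a hypothetical helper, not the exact script I used; it assumes the standard KITTI layout with the bbox in fields 5-8):

```python
import glob

def bbox_stats(label_dir):
    """Collect bbox width/height stats from KITTI .txt labels in label_dir."""
    widths, heights = [], []
    for path in glob.glob(f"{label_dir}/*.txt"):
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) < 8:
                    continue
                # KITTI: class truncated occluded alpha x1 y1 x2 y2 ...
                x1, y1, x2, y2 = map(float, fields[4:8])
                widths.append(x2 - x1)
                heights.append(y2 - y1)
    summarize = lambda v: {"mean": sum(v) / len(v), "min": min(v), "max": max(v)}
    return summarize(widths), summarize(heights)
```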

Width/Height Distribution:
Bounding Box Width/Height Distribution Histogram
Bounding Box Positional Heatmap Distribution

Does the tendency for the bounding boxes to sit in the middle of the image and/or be small (100-200 px wide) affect training? If so, can this be addressed with the augmentation_config's zoom_min/zoom_max and translate_max_x/translate_max_y properties?
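
If zoom/translate augmentation is the right lever here, I imagine the spatial_augmentation block would change along these lines (the values below are purely illustrative guesses, not tuned):

  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.5
    zoom_min: 0.7          # allow zooming out, producing smaller targets
    zoom_max: 1.5          # allow zooming in, synthesizing closer/larger targets
    translate_max_x: 96.0  # wider translation to move targets off-center
    translate_max_y: 96.0
    rotate_rad_max: 0.69
  }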

(While gathering these statistics, I did discover some wrong bounding boxes (fewer than 25 in a dataset of 60k), so I will be retraining this weekend just to be sure.)