0.0 average precision during a detectnet_v2 training

nasserha · September 22, 2023, 12:05pm

I am getting an average precision of 0.0 during detectnet_v2 training.

Command:

!tao model detectnet_v2 train -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti-1Class.txt \
                        -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                        -n resnet18_detector \
                        --gpus $NUM_GPUS

A sample annotation (one line of a KITTI label file):

rumex 0.0 0 0 1830 1195 1996 1348 0 0 0 0 0 0 0
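For reference, here is a minimal Python sketch that splits a KITTI label line like the one above into its fields and computes the box size. It assumes the standard KITTI object format (class, truncation, occlusion, alpha, then left/top/right/bottom pixel coordinates, followed by 3D fields that are ignored here). `KittiObject` and `parse_kitti_line` are hypothetical helpers, not part of TAO.

```python
from dataclasses import dataclass


@dataclass
class KittiObject:
    """One object from a KITTI label line (only the 2D fields are kept)."""
    cls: str
    truncation: float
    occlusion: int
    alpha: float
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top


def parse_kitti_line(line: str) -> KittiObject:
    """Parse a KITTI label line; fields after the 2D box are ignored."""
    fields = line.split()
    # Need at least: class, truncation, occlusion, alpha, and 4 bbox values.
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 KITTI fields, got {len(fields)}: {line!r}")
    left, top, right, bottom = (float(v) for v in fields[4:8])
    return KittiObject(
        cls=fields[0],
        truncation=float(fields[1]),
        occlusion=int(fields[2]),
        alpha=float(fields[3]),
        left=left, top=top, right=right, bottom=bottom,
    )


obj = parse_kitti_line("rumex 0.0 0 0 1830 1195 1996 1348 0 0 0 0 0 0 0")
print(obj.cls, obj.width, obj.height)  # rumex 166.0 153.0
```

So the sample object is 166 x 153 pixels, which is useful to compare against the evaluation_box_config bounds in the spec below.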

• Hardware: T4
• Network Type: Detectnet_v2
• TAO Version: 5.0.0
• Training spec file

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tao-experiments/data/training"
  }
  image_extension: "png"
  target_class_mapping {
    key: "rumex"
    value: "rumex"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 2048
    output_image_height: 1376
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "rumex"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 1
        minimum_bounding_box_height: 20
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/tao-experiments/detectnet_v2/pretrained_resnet18/pretrained_detectnet_v2_vresnet18/resnet18.hdf5"
  num_layers: 18
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 30
  minimum_detection_ground_truth_overlap {
    key: "rumex"
    value: 0.5
  }
  evaluation_box_config {
    key: "rumex"
    value {
      minimum_height: 20
      maximum_height: 1000
      minimum_width: 10
      maximum_width: 1000
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "rumex"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: false
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 1000
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-07
      max_learning_rate: 5e-05
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  visualizer{
    enabled: true
    num_images: 3
    scalar_logging_frequency: 10
    infrequent_logging_frequency: 5
    target_class_config {
      key: "rumex"
      value: {
        coverage_threshold: 0.005
      }
    }
    clearml_config{
      project: "TAO DetectNet 1 Class"
      task: "detectnet_v2_resnet18_clearml"
      tags: "detectnet_v2"
      tags: "training"
      tags: "resnet18"
      tags: "unpruned"
    }
    wandb_config{
      project: "TAO Toolkit Wandb Demo"
      name: "detectnet_v2_resnet18_wandb"
      tags: "detectnet_v2"
      tags: "training"
      tags: "resnet18"
      tags: "unpruned"
    }
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: "rumex"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}
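To make the soft_start_annealing_schedule block above more concrete, here is a minimal Python sketch of the learning-rate curve it describes: ramp from min_learning_rate up to max_learning_rate over the first soft_start fraction of training, hold at the maximum, then decay back to min_learning_rate after the annealing fraction. The geometric (log-linear) interpolation is an assumption for illustration; check the TAO source for the exact curve. The function name is hypothetical.

```python
import math


def soft_start_annealing_lr(progress, min_lr=5e-7, max_lr=5e-05,
                            soft_start=0.1, annealing=0.7):
    """Learning rate at `progress` (fraction of training done, 0.0 to 1.0).

    Defaults are the spec values, rounded (0.10000000149 -> 0.1, 0.699999988079 -> 0.7).
    Assumption: ramp-up and decay interpolate geometrically (linearly in log space).
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    if progress < soft_start:
        # Soft start: ramp up from min_lr to max_lr.
        start, end, t = min_lr, max_lr, progress / soft_start
    elif progress > annealing:
        # Annealing: decay from max_lr back down to min_lr.
        start, end, t = max_lr, min_lr, (progress - annealing) / (1.0 - annealing)
    else:
        # Plateau between soft_start and annealing.
        return max_lr
    return start * math.exp(math.log(end / start) * t)


# Assuming progress = epoch / num_epochs with num_epochs: 1000.
for epoch in (0, 50, 100, 500, 700, 850, 1000):
    print(epoch, f"{soft_start_annealing_lr(epoch / 1000):.2e}")
```

Under this reading, the spec keeps the learning rate at 5e-05 for roughly epochs 100 to 700 of the 1000.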

Morganh · September 22, 2023, 2:25pm

Please try setting enable_auto_resize to true. More information is on the DetectNet_v2 page of the NVIDIA Docs.

nasserha · September 22, 2023, 5:50pm

I’ve modified the preprocessing config as follows:

augmentation_config {
  preprocessing {
    output_image_width: 2048
    output_image_height: 1376
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
    enable_auto_resize: true
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}

Then I reran the same training cell.
Training halts with exit code 1.

I thought this might be because enable_auto_resize was false in the previous checkpoint, so I deleted all the previous checkpoints so that training starts from scratch with the new configuration. Now the command runs, but the average precision is still 0%.

I also have two other questions:

Why would the enable_auto_resize parameter affect the average precision?
My images are all the same size. Why would enable_auto_resize have any effect at all?

Morganh · September 23, 2023, 9:23am

The enable_auto_resize parameter is to train with multiple resolutions images. Since your training images are all of the same size, it is not needed.
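To confirm that every training image really has the same size (and therefore that enable_auto_resize is not needed), a minimal Python sketch like the following reads the width and height directly from each PNG's IHDR header using only the standard library. The function names and the directory argument are hypothetical; 2048x1376 is the resolution reported later in this thread.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def find_size_mismatches(image_dir, expected=(2048, 1376)):
    """Map file name -> (width, height) for every PNG whose size differs from `expected`."""
    mismatches = {}
    for path in sorted(Path(image_dir).glob("*.png")):
        size = png_size(path)
        if size != expected:
            mismatches[path.name] = size
    return mismatches
```

For example, `find_size_mismatches("/path/to/images")` returning an empty dict means every PNG in that directory is 2048x1376.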
It seems the objects are small. Please refer to the Frequently Asked Questions page in the NVIDIA Docs:

In DetectNet_V2, are there any parameters that can help improve AP (average precision) on training small objects?

Following parameters can help you improve AP on smaller objects:

• Increase num_layers of resnet
• class_weight for small objects
• Increase the coverage_radius_x and coverage_radius_y parameters (cov_radius_x and cov_radius_y in the spec) of the bbox_rasterizer_config section for the small objects class
• Decrease minimum_detection_ground_truth_overlap
• Lower minimum_height to cover more small objects for evaluation.
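As a rough illustration of the last point, the sketch below applies bounds like the evaluation_box_config in the spec above (minimum_height 20, maximum_height 1000, minimum_width 10, maximum_width 1000) to a box's pixel size. Treating both bounds as inclusive is an assumption, and `EvalBoxConfig` and `passes_eval_box` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalBoxConfig:
    """Mirrors the evaluation_box_config values for "rumex" in the spec."""
    minimum_height: float = 20
    maximum_height: float = 1000
    minimum_width: float = 10
    maximum_width: float = 1000


def passes_eval_box(width, height, cfg=EvalBoxConfig()):
    """True if a box of this pixel size would be kept for evaluation.

    Assumption: both the minimum and maximum bounds are inclusive.
    """
    return (cfg.minimum_width <= width <= cfg.maximum_width
            and cfg.minimum_height <= height <= cfg.maximum_height)


# The sample annotation (1830, 1195, 1996, 1348) is 166 x 153 pixels.
print(passes_eval_box(166, 153))                                  # True
print(passes_eval_box(30, 15))                                    # False: too short
print(passes_eval_box(30, 15, EvalBoxConfig(minimum_height=10)))  # True once minimum_height is lowered
```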

nasserha · September 26, 2023, 7:42pm

Hi @Morganh. Thanks for your input.
I did tweak these values. Things improved a bit, but not as much as expected. Is there an official paper on DetectNet_v2 that explains the mathematical/algorithmic meaning of these hyperparameters? Working with them without knowing what they mean is a bit like working in the dark.

Morganh · September 27, 2023, 6:05am

You can refer to the DetectNet_v2 user guide in the NVIDIA Docs and to the source code.

• minimum_detection_ground_truth_overlap: Minimum IoU between a ground-truth box and a predicted box after clustering for the prediction to count as a valid detection. This parameter is a repeatable dictionary, and a separate entry must be defined for every class.
• minimum_height: Minimum height in pixels for a valid ground-truth and prediction bbox.
• cov_radius_x (float): x-radius of the coverage ellipse.
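To make minimum_detection_ground_truth_overlap concrete: it is an IoU threshold, so with the value 0.5 from the spec a clustered prediction only counts as a valid detection if its IoU with a ground-truth box is at least 0.5. A minimal Python sketch of the IoU computation follows, assuming (left, top, right, bottom) pixel boxes in continuous coordinates (no +1 pixel convention); the function name and the shifted prediction box are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    # Overlap extent along each axis; zero if the boxes do not overlap.
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


ground_truth = (1830, 1195, 1996, 1348)   # the sample rumex box from this thread
prediction = (1840, 1205, 2006, 1358)     # hypothetical prediction shifted by 10 px
overlap = iou(ground_truth, prediction)
print(f"IoU = {overlap:.3f}, valid at 0.5: {overlap >= 0.5}")
```

Lowering the threshold (as the FAQ suggests) lets looser predictions on small objects count as true positives.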

Also, for coverage_radius_x, see: https://github.com/NVIDIA/tao_tensorflow1_backend/blob/c7a3926ddddf3911842e057620bceb45bb5303cc/nvidia_tao_tf1/cv/detectnet_v2/evaluation/evaluation_config.py#L79.

Also, please share your latest spec file and training log.
Are your original training images all the same resolution? What is the resolution?

nasserha · September 27, 2023, 11:23am

specs.txt (3.7 KB)
This is the last specs file.

The original 330 images all have a resolution of 2048x1376. I do not change this resolution during training.
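One quick sanity check on this resolution: DetectNet_v2 predicts its coverage and bbox outputs on a grid with a stride of 16 pixels, and the input width and height are expected to be multiples of 16. Treat the stride value as an assumption to verify against the DetectNet_v2 documentation; the helper names below are hypothetical. Under that assumption, 2048x1376 maps to a 128x86 output grid, and the sample 166x153 box spans roughly 10x10 grid cells.

```python
STRIDE = 16  # assumption: DetectNet_v2 output stride in pixels


def output_grid(width, height, stride=STRIDE):
    """Return the (columns, rows) of the output grid for a given input size."""
    if width % stride or height % stride:
        raise ValueError(f"{width}x{height} is not a multiple of {stride}")
    return width // stride, height // stride


def box_span_in_cells(box_width, box_height, stride=STRIDE):
    """Approximate number of grid cells a box covers horizontally and vertically."""
    return box_width / stride, box_height / stride


print(output_grid(2048, 1376))        # (128, 86)
print(box_span_in_cells(166, 153))    # (10.375, 9.5625)
```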

The mAP with the above run looks like this:

nasserha · September 27, 2023, 11:55am

I think my next step is to tune these two parameters further:

class_weight: is this related to the class frequency across the whole dataset? Does it have any particular effect when I have only one class plus background? What does making it bigger or smaller mean in practice?
coverage_foreground_weight: this parameter probably matters in my case because my bounding boxes contain weeds, so each box also contains a lot of background. Roughly, the weed leaves cover 50% of the bounding box. Is it wise to use 0.5 instead of 0.05 (the default)?
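On the class_weight question: a common heuristic for relating class weights to class frequency (not something TAO prescribes) is inverse-frequency weighting, sketched below in Python over KITTI label lines. With a single class such as rumex, the normalized weight is simply 1.0, which matches the class_weight already in the spec. The helper name is hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(label_lines):
    """Inverse-frequency class weights, normalized so a perfectly balanced dataset gets 1.0.

    weight[c] = total_objects / (num_classes * count[c])
    """
    # The class name is the first field of each non-empty KITTI label line.
    counts = Counter(line.split()[0] for line in label_lines if line.strip())
    total = sum(counts.values())
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


rumex_only = ["rumex 0.0 0 0 1830 1195 1996 1348 0 0 0 0 0 0 0"] * 3
print(inverse_frequency_weights(rumex_only))  # {'rumex': 1.0}
```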

Morganh · September 27, 2023, 2:03pm

Please share the training log as well.
Also, could you share several training images and their labels? You can send them to me by private message.

Morganh · September 28, 2023, 3:39am

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

I received the dataset. Some images are missing labels for the objects in them. I suggest labeling more images and improving the label quality.
Also, this detection task seems fairly difficult: in some images it is hard even for the human eye to find the rumex, because it looks very similar to the green background.
I suggest training YOLOv4 with a deeper backbone. D-DETR and DINO can also be considered.
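Since label quality turned out to be part of the problem, a quick audit before relabeling can help. The minimal Python sketch below lists images whose KITTI label file (same stem, `.txt`) is missing or empty. It assumes one label file per image and `.png` images, and it cannot detect individual objects left unlabeled in an image that has other labels. The function name and directory arguments are hypothetical.

```python
from pathlib import Path


def find_unlabeled_images(image_dir, label_dir, image_ext=".png"):
    """Return image file names whose label file is missing or contains no objects."""
    unlabeled = []
    for image_path in sorted(Path(image_dir).glob(f"*{image_ext}")):
        # KITTI convention: image_name.png pairs with image_name.txt.
        label_path = Path(label_dir) / f"{image_path.stem}.txt"
        if not label_path.is_file() or not label_path.read_text().strip():
            unlabeled.append(image_path.name)
    return unlabeled
```

Images flagged this way are either genuine negatives (no rumex present) or candidates for relabeling; checking them by eye is still necessary.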

system · October 17, 2023, 1:52am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
