I am training a custom object detection model (ResNet-10 with DetectNet_v2) for 6 classes using the VOC/COCO datasets. I have converted these datasets to the KITTI data format, created the TFRecords, and edited the spec file for a multi-class detector. However, when I evaluate the trained model after 50 epochs, I do not get reliable average precision figures.
I am getting the following mAP results:
class name average precision (in %)
------------ --------------------------
bicycle 0
bus 2.48739
car 0
motorbike 0.42388
person 6.92905
truck 0.442265
During training and evaluation, I got the following message:
target/truncation is not updated to match the crop area if the dataset contains target/truncation.
During evaluation, I got the following message:
One or more metadata field(s) are missing from ground_truth batch_data, and will be replaced with defaults: ['frame/camera_location']
Following is the statistic for number of data samples:
Number of images in the trainval set: 319492
Number of labels in the trainval set: 319492
Number of images in the test set: 7518
Kindly check a sample of the data format used for TFRecord conversion below:
I have replaced all the remaining KITTI fields with the value -1. From the documentation, I understand that only the class name and bounding box corners (xmin, ymin, xmax, ymax) need to be provided.
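To make the placeholder scheme concrete, here is a small Python sketch (mine, not from the original post) that builds a 15-field KITTI label line in which only the class name and bbox corners carry real values and everything else is -1, matching the approach described above:

```python
# Sketch of a KITTI label line: only the class name and bbox corners carry
# real values; all other fields are -1 placeholders. Field order follows the
# KITTI object detection label spec (15 whitespace-separated fields).
def kitti_label_line(cls, xmin, ymin, xmax, ymax):
    # type truncated occluded alpha xmin ymin xmax ymax h w l x y z rot_y
    head = [-1] * 3           # truncated, occluded, alpha
    tail = [-1] * 7           # dimensions (h, w, l), location (x, y, z), rotation_y
    fields = [cls] + head + [xmin, ymin, xmax, ymax] + tail
    return " ".join(str(f) for f in fields)

print(kitti_label_line("car", 100.0, 120.5, 300.0, 250.0))
```

Each image gets one `.txt` label file, with one such line per object.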
Also, the training and validation data are combined together in the trainval set. Does that mean training itself might have issues because of this?
Kindly help me out if other ground truth data is required.
For DetectNet_v2, the tlt-train tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size, and the corresponding bounding boxes must be scaled accordingly.
Thanks for your help. I checked the same-image-size requirement against the sample dataset used for running the DetectNet_v2 example. The images do not all seem to be of the same size. For example,
How has the resizing been done, if at all? Since we provide these images directly for TFRecord generation, it implies that images of different resolutions are being passed to the training tool.
Hi noephyte1,
The KITTI dataset (1242x375, 1238x374, 1224x370, 1241x376) almost matches the spec (1248, 384), but not exactly. During training, there is a crop step to crop the images to the same size. If the original image is smaller than the model input size, the crop becomes padding.
But you mentioned that your datasets are VOC and COCO (640x480). That is far from (1248, 384).
So for DetectNet_v2, please resize them offline to the final training size.
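A hedged sketch of the bounding-box side of that offline resize (pure Python box math; the images themselves could be resized with, e.g., Pillow's `Image.resize` — my assumption, not something the thread prescribes):

```python
# When resizing images offline to the final training size, the KITTI bbox
# coordinates must be scaled by the same width/height factors, or the labels
# will no longer match the pixels.
def scale_bbox(xmin, ymin, xmax, ymax, orig_w, orig_h, new_w, new_h):
    sx = new_w / orig_w   # horizontal scale factor
    sy = new_h / orig_h   # vertical scale factor
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)

# e.g. a box in a VOC 640x480 image, after resizing the image to 480x480
print(scale_bbox(100, 120, 300, 240, 640, 480, 480, 480))
# → (75.0, 120.0, 225.0, 240.0)
```

Note that a non-uniform resize like 640x480 → 480x480 distorts the aspect ratio, which is exactly the concern raised later in this thread.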
Where do we specify the final training size? Can we change (1248, 384) to (480, 480), for example? As advised, I am resizing all my images to 480x480 and then feeding them in for training.
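For reference, in a DetectNet_v2 training spec the final training size is typically set under augmentation_config. The fragment below is an illustrative sketch for 480x480 (field names per the TLT documentation; exact values and defaults may differ by version):

```
augmentation_config {
  preprocessing {
    output_image_width: 480
    output_image_height: 480
    output_image_channel: 3
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
}
```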
I made the corresponding change in the train config file.
Thanks for your help. After training the model for 50 epochs, I get bizarre results: only for the car class do I get significantly low precision. Please find the evaluation results below:
Validation cost: 0.000277
Mean average_precision (in %): 27.2255
class name average precision (in %)
------------ --------------------------
bicycle 18.8004
bus 45.8717
car 6.98904
motorbike 28.2466
person 44.9052
truck 18.54
Median Inference Time: 0.004990
How should the parameters of the training config file be changed based on the training input image size? Is there any documentation explaining how to customize the parameters? That is probably the reason for the poor performance of the detector on some classes.
Hi neophyte1,
Thanks for the info. Is it possible to narrow down the low mAP for car via more experiments?
1) Could you retrain with batch size 4 and 120 epochs? Your current batch size is 16.
or 2) Train only 3 classes: person/bicycle/car
or 3) Change (480, 480) to another resolution.
Also, could you check the correctness of all the labels, and make sure the data and labels are matched?
I am currently using multiple GPUs — 4 of them with batch size 16 per GPU. I have tried with batch size 6 per GPU as well; however, the results did not improve. Should I try batch size 4 per GPU with 4 GPUs, or an overall batch size of 4?
Yesterday, I clubbed motorbike and bicycle class to cyclist as given in the example config and car, bus and truck to car using class mapping in the config. I used the same config parameters as given in the sample. However, the performance deteriorated. Please note that with sample KITTI dataset, the mean average precision is quite high for all 3 classes. I will try just using bicycle, person and car without clubbing and will let you know the results.
Should I try (640, 480) or (720, 480), since in the sample the size is (1248, 384)? Maybe I should not feed a square input size?
I will recheck correctness of all labels. However, I have visualized multiple times to make sure the data and labels are matched. I can upload some sample images and labels if you wish to cross check. Please let me know.
Hi neophyte1,
More pointers are as below. You can do more experiments via one pointer or several.
Could you check the raw image sizes from your VOC/COCO datasets, and calculate the raw image aspect ratio? If the training spec's width/height differs too much from the raw size and aspect ratio, the results will not be good.
Does the dataset have a lot of small cars and trucks? If the targets are small, expect a small AP.
In your spec, the car class_weight is too small. Try increasing that weight.
Person: class_weight 4, bbox weight 10
Bicycle: class_weight 8, bbox weight 1
Car: class_weight 1, bbox weight 10
Motorbike: class_weight 8, bbox weight 1
Bus: class_weight 8, bbox weight 1
Truck: class_weight 8, bbox weight 1
minimum_bounding_box_height: 20
Could you reduce it to 10? If there are a lot of small targets, this setting filters them out.
minimum_detection_ground_truth_overlap {
  key: "car"
  value: 0.699999988079
}
The car IoU threshold is 0.7 during evaluation, while all other classes use 0.5.
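A minimal Python sketch of the IoU metric behind these thresholds (illustrative, not TLT's actual implementation), showing how the same prediction can count as a hit at 0.5 but a miss at 0.7:

```python
# Intersection-over-union of two axis-aligned boxes, each given as
# (xmin, ymin, xmax, ymax).
def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

gt = (100, 100, 200, 200)
pred = (120, 100, 220, 200)   # same size, shifted 20 px right
print(iou(gt, pred))          # ~0.667: a true positive at 0.5, a miss at 0.7
```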
Also, we have no explicit guidance in the docs about how to tune hyper-parameters; more experiments are expected. The "load_graph" flag is set to false by default in the training config file, but for a pruned model, please remember to set this parameter to true. See the TLT docs for details.
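As a rough illustration of the load_graph point (an assumed sketch; field names follow the DetectNet_v2 retrain spec, and the model path is a placeholder):

```
model_config {
  load_graph: true
  pretrained_model_file: "/workspace/pruned/resnet_pruned.tlt"  # placeholder path
}
```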
Thanks for the pointers. Many of them worked. However, I still have doubts regarding the batch size parameter. Following are some results from experiments I ran on just the VOC dataset, for 3 classes, for 22 epochs, with the default configuration:
No. of GPUs : 4
Batch Size per GPU : 4
Average Precision (%):
bicycle : 3.61833
car : 0
person : 12.6573
No. of GPUs : 1
Batch Size per GPU : 4
Average Precision (%):
bicycle : 30
car : 0.5
person : 28
Please do not focus on the precision of the car class, as I did not tweak the parameters you suggested for "car" in this experiment. I have since fixed the precision of the car class by adding more data from the COCO dataset and tweaking the parameters as you suggested.
Kindly let me know if these observations seem correct. If the results are to be believed, then I have the following queries:
How do I make the training work with multiple GPUs?
How do I make the training work with a greater batch size per GPU?
Please let me know and thanks for the pointers again.
Hi neophyte1,
1) See Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation:
The tlt-train command supports multi-GPU training. You can invoke a multi-GPU training session by using the --gpus N option, where N is the number of GPUs you want to use. N must be less than the number of GPUs available in the given node for training.
2) batch_size_per_gpu can be configured in the spec file.
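For example, a hedged sketch of the relevant training_config fragment (values illustrative, other fields omitted):

```
training_config {
  batch_size_per_gpu: 4
  num_epochs: 120
}
```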
Let me update you on my progress. I am first trying to achieve accuracy, so I opted for the ResNet-18 backbone. After following your guidelines and training, I got really impressive results. However, I do not understand how to prune and retrain the ResNet-18 backbone for my dataset. Pruning and retraining were successful with the ResNet-10 backbone using the prune-threshold parameter set in the example, but when I use the same value for pruning and retraining the ResNet-18 model, I get terrible results. Following are the results of pruning and retraining using pth = 5.2e-6.
Results of training :
Validation cost: 0.002584
Mean average_precision (in %): 30.4075
class name average precision (in %)
------------ --------------------------
bicycle 10.2874
car 32.4107
person 48.5244
Median Inference Time: 0.007108
Results after pruning and retraining:
class name average precision (in %)
------------ --------------------------
bicycle 0
car 1.67
person 25.30