Incorrect bounding box of detectnet_v2-darknet-53 in the inference phase

Hi all,
I trained detectnet_v2-darknet-53 on the helmet dataset with a 1080 Ti GPU and batch size 4. The trained model reaches an mAP of 80%, which suggests the model trained well, but when I run inference on some test images, the output bounding boxes are incorrect and shifted. Note that this problem only occurs with detectnet_v2-darknet-53; I get correct results with yolov3/ssd/darknet-v1.

I used TLT docker version v2-py3.

Why does this problem happen? How can I solve it? Is it related to the config file, or is it a bug in this TLT version?



Did you run with tlt-infer or DeepStream?
Please share the command and config file too.

@Morganh,
I ran tlt-infer.

tlt-infer detectnet_v2 -e /workspace/tmp2/detectnet_v2/specs/detectnet_v2_inference_kitti_etlt.txt \
                        -o /workspace/tmp2/output \
                        -i /workspace/tmp2/trainval/image \
                        -k key


random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tf_records/*"
    image_directory_path: "/workspace/dataset/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "hat"
    value: "hat"
  }
  target_class_mapping {
    key: "head"
    value: "head"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 1248
    output_image_height: 384
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298023224
    contrast_scale_max: 0.10000000149011612
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "hat"
    value {
      clustering_config {
        coverage_threshold: 0.004999999888241291
        dbscan_eps: 0.15000000596046448
        dbscan_min_samples: 0.05000000074505806
        minimum_bounding_box_height: 1
      }
    }
  }
  target_class_config {
    key: "head"
    value {
      clustering_config {
        coverage_threshold: 0.004999999888241291
        dbscan_eps: 0.12999999523162842
        dbscan_min_samples: 0.05000000074505806
        minimum_bounding_box_height: 1
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/pretrained_model/darknet53.hdf5"
  num_layers: 53
  use_batch_norm: true
  dropout_rate: 0.10000000149011612
  activation {
    activation_type: "relu"
  }
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  freeze_blocks: 0.0
  arch: "darknet"
  all_projections: true
}
evaluation_config {
  validation_period_during_training: 1
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "hat"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "head"
    value: 0.699999988079071
  }
  evaluation_box_config {
    key: "hat"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "head"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
}
cost_function_config {
  target_classes {
    name: "head"
    class_weight: 1.0
    coverage_foreground_weight: 0.05000000074505806
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  target_classes {
    name: "hat"
    class_weight: 1.0
    coverage_foreground_weight: 0.05000000074505806
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.9998999834060669
  min_objective_weight: 9.999999747378752e-05
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 90
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 4.999999873689376e-06
      max_learning_rate: 0.0005000000237487257
      soft_start: 0.009999999776482582
      annealing: 0.30000001192092896
    }
  }
  regularizer {
    type: L1
    weight: 3.000000026176508e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993922529e-09
      beta1: 0.8999999761581421
      beta2: 0.9990000128746033
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 1
}
bbox_rasterizer_config {
  target_class_config {
    key: "hat"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.4000000059604645
      cov_radius_y: 0.4000000059604645
      bbox_min_radius: 1.0
    }
  }
  target_class_config {
    key: "head"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.4000000059604645
      cov_radius_y: 0.4000000059604645
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.6700000166893005
}

Could you share the tlt infer config file too? Thanks.

Hi, @Morganh,
experiment_spec.txt (4.1 KB)

Hi @LoveNvidia,
Your attachment is the training spec. I meant the tlt-infer spec file; could you share that here?
For example, the detectnet_v2_inference_kitti_tlt.txt or detectnet_v2_inference_kitti_etlt.txt.

Also, please share your full command and full log when you run tlt-infer. Thanks.
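
When both spec files are at hand, one thing that can be compared is the image size used for training against the one used for inference. The following is only a minimal sketch: it assumes the inference spec stores the size in `image_width` and `image_height` fields (the inference spec in this thread was attached rather than pasted, so those field names are an assumption), and the helper names `read_int_field` and `dims_match` are hypothetical. It is not claimed to be the cause of the shifted boxes.

```python
import re


def read_int_field(spec_text, field):
    """Return the first integer value of `field: <int>` in a spec text, or None.

    The lookbehind keeps `image_width` from matching inside `output_image_width`.
    """
    pattern = r"(?<![A-Za-z0-9_])" + re.escape(field) + r"\s*:\s*(\d+)"
    match = re.search(pattern, spec_text)
    return int(match.group(1)) if match else None


def dims_match(train_spec_text, infer_spec_text):
    """Compare (width, height) between a training spec and an inference spec."""
    train = (
        read_int_field(train_spec_text, "output_image_width"),
        read_int_field(train_spec_text, "output_image_height"),
    )
    # Assumed field names for the inference spec.
    infer = (
        read_int_field(infer_spec_text, "image_width"),
        read_int_field(infer_spec_text, "image_height"),
    )
    return train == infer, train, infer


if __name__ == "__main__":
    train_text = "preprocessing {\n  output_image_width: 1248\n  output_image_height: 384\n}"
    infer_text = "inferencer_config {\n  image_width: 1248\n  image_height: 384\n}"
    same, train_dims, infer_dims = dims_match(train_text, infer_text)
    print("match" if same else "mismatch", train_dims, infer_dims)
```

In practice the two texts would be read from the spec files passed to tlt-train and tlt-infer with `-e`.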

@Morganh,

infer_detectnetv2_z.txt (1.7 KB)

tlt-infer detectnet_v2 -e /workspace/spec_files/infer_detectnetv2_z.txt -o /workspace/inferred_images/test_detectnet2 -i /workspace/dataset/testing/images -k key

infer_detectnetv2_log.txt (7.0 KB)

Q: I trained detectnet_v2/ssd/yolo with the same backbone. The mAP of both ssd and yolo reached 80%, but detectnet_v2 achieved only 63%. I want to know whether this is related to the detector itself or to the training process. In other words, does detectnet_v2 really perform worse than ssd/yolo?

Earlier I wrote: "I trained detectnet-v2-darknet-53 on the helmet dataset with GPU 1080 ti and batch-size=4. The trained map result reach to 80%,"

I’m sorry, I made a mistake there. That model reached 63%.


What is darknet-v1? Could you provide more details?
Can you share the training spec of yolo_v3 and ssd?

Could you please share the helmet dataset? Is it a public dataset? I want to check on my side too.

Regarding the incorrect, shifted bounding boxes: does this reproduce on 100% of the test images?
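
One way to answer that question with numbers instead of by eye is to measure the overlap (IoU) between each predicted box and its ground-truth box on every test image. The following is a minimal sketch, assuming both the ground truth and the tlt-infer output labels are KITTI-format lines with the box in fields 5 to 8 (xmin, ymin, xmax, ymax); the helper names and the sample lines are hypothetical.

```python
def parse_kitti_box(line):
    """Return (class_name, (xmin, ymin, xmax, ymax)) from one KITTI label line."""
    fields = line.split()
    return fields[0], tuple(float(v) for v in fields[4:8])


def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Made-up example lines: the prediction is the ground truth shifted 50 px right.
    gt_line = "hat 0.00 0 0.00 100.00 100.00 200.00 200.00 0 0 0 0 0 0 0"
    pred_line = "hat 0.00 0 0.00 150.00 100.00 250.00 200.00 0 0 0 0 0 0 0"
    _, gt_box = parse_kitti_box(gt_line)
    _, pred_box = parse_kitti_box(pred_line)
    print(round(iou(gt_box, pred_box), 3))
```

A consistently low IoU across all images would point to a systematic offset, while low values on only some images would suggest ordinary detection errors.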

Hi LoveNvidia,

Does this issue still need support? Any update?

@kayccc,
No, thanks. The problem was solved by training again.