DetectNet_v2 TLT (training to detect person)

sylia · June 14, 2021, 10:36am

Hello,

When I try to train a detectnet_v2 model with this command:

tlt detectnet_v2 train -e /workspace/tlt-experiments/Data/Work/resnet18/config/detectnet_v2_train_resnet18_kitti.txt -r /workspace/tlt-experiments/Data/Work/resnet18/train -k tlt_encode -n resnet18_detector --gpus 1

I get the error below. I don't know why, since all the configuration was done.

Traceback (most recent call last):
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 797, in
File “”, line 2, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/utilities/timer.py”, line 46, in wrapped_fn
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 790, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 691, in run_experiment
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 599, in train_gridbox
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 454, in build_training_graph
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/detectnet_model.py”, line 573, in build_training_graph
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/detectnet_model.py”, line 534, in _cost_func
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py”, line 227, in cost_combiner_func
AssertionError
Traceback (most recent call last):
File “/usr/local/bin/detectnet_v2”, line 8, in
sys.exit(main())
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/entrypoint/detectnet_v2.py”, line 12, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py”, line 296, in launch_job
AssertionError: Process run failed.
2021-06-14 12:28:53,051 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

detectnet_v2_train_resnet18_kitti.txt (3.0 KB)

muhammadrizwanmunawar · June 14, 2021, 11:03am

@sylia, if you want to detect only the person class, you need to modify your config file. I have modified it; you can use the file below.

random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tlt-experiments/dataa/tfrecords/*"
    image_directory_path: "/workspace/tlt-experiments/dataa/training"
  }
  image_extension: "png"

  
  target_class_mapping {
    key: "person"
    value: "person"
  }
  
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 1248
    output_image_height: 384
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}

postprocessing_config {
  target_class_config {
    key: "person"
    value {
      clustering_config {
        clustering_algorithm: DBSCAN
        dbscan_confidence_threshold: 0.9
        coverage_threshold: 0.00749999983236
        dbscan_eps: 0.230000004172
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 20
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/tlt-experiments/Data/Work/resnet18/model/resnet18.hdf5"
  num_layers: 18
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 30

  minimum_detection_ground_truth_overlap {
    key: "person"
    value: 0.5
  }
  
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  
  target_classes {
    name: "person"
    class_weight: 4.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
 
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 120
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
 
  target_class_config {
    key: "person"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 1.0
      cov_radius_y: 1.0
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}

Also, what are your image width and height?

sylia · June 14, 2021, 11:56am

output_image_width: 1242
output_image_height: 375

But it still doesn't work!

muhammadrizwanmunawar · June 14, 2021, 12:35pm

If your image width and height are 1242*768, then replace the parameters below in the spec file:
output_image_width: 1242
output_image_height: 375

sylia · June 14, 2021, 12:40pm

muhammadrizwanmunawar · June 14, 2021, 12:50pm

What is your batch size? Can you share your training spec?

sylia · June 14, 2021, 1:07pm

detectnet_v2_train_resnet18_kitti.txt (2.9 KB)

Morganh · June 14, 2021, 3:37pm

Where and how did you download the pretrained_model_file /workspace/tlt-experiments/Data/Work/resnet18/model/resnet18.hdf5?

Morganh · June 14, 2021, 3:41pm

Your cost_function_config is missing the “bbox” objective.
Please refer to the DetectNet_v2 section of the Transfer Learning Toolkit 3.0 documentation.
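
For anyone hitting the same AssertionError in cost_combiner_func: the model_config in the spec posted earlier declares both a bbox and a cov objective, so each target class in cost_function_config presumably needs an objectives entry for both. Below is a sketch of that block with the bbox objective added for the person class. The bbox weights (10.0) are an assumption borrowed from NVIDIA's sample KITTI spec and are not confirmed anywhere in this thread, so treat them as a starting point rather than tuned values.

```
cost_function_config {
  target_classes {
    name: "person"
    class_weight: 4.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    # Added objective. The two weights are assumed values from the
    # sample spec, not values given in this thread.
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
```

Everything except the second objectives entry is copied unchanged from the spec above, so only the bbox entry needs to be merged into an existing file.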

sylia · June 14, 2021, 3:51pm

ngc registry model download-version nvidia/tlt_pretrained_detectnet_v2:resnet18

Morganh · June 14, 2021, 3:52pm

As mentioned above, please modify cost_function_config.

sylia · June 14, 2021, 4:07pm

thanks
