Error in detectnet_v2 train with TAO: dbscan_min_samples: 0.05

erence · October 17, 2023, 8:51am

I am trying to train a custom detectnet_v2 model for one class on an AMDx64 system with an RTX 2070.

I run the nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 container with "#detectnet_v2 train -r /tao/results -e /tao/detectnet_train_cfg.txt"

The detectnet_train_cfg.txt is shown below, after my questions.

Questions:
1- Where in my config file might the problem be?
2- Is it OK to comment out the pretrained model, or do I have to add one? If I do, is my image size of 1920x1200 still OK?
3- I train on 1920x1200 images with the batch size lowered to 2. Is that OK?

#####################################################

Model Config

model_config {
arch: "resnet"
//pretrained_model_file: <path_to_model_file>
freeze_blocks: 0
freeze_blocks: 1
all_projections: True
num_layers: 18
use_pooling: False
use_batch_norm: True
dropout_rate: 0.0
objective_set: {
cov {}
bbox {
scale: 35.0
offset: 0.5
}
}
}

BBox Ground Truth Generator

bbox_rasterizer_config {
target_class_config {
key: "eye"
value: {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 0.4
cov_radius_y: 0.4
bbox_min_radius: 1.0
}
}
deadzone_radius: 0.67
}

Post-Processor

postprocessing_config {
target_class_config {
key: "eye"
value: {
clustering_config {
coverage_threshold: 0.005
dbscan_eps: 0.15
dbscan_min_samples: 0.05
minimum_bounding_box_height: 20
}
}
}
}

Cost Function

cost_function_config {
target_classes {
name: "eye"
class_weight: 1.0
coverage_foreground_weight: 0.05
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 10.0
}
}
enable_autoweighting: True
max_objective_weight: 0.9999
min_objective_weight: 0.0001
}

Trainer

training_config {
batch_size_per_gpu: 2
num_epochs: 80
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 5e-6
max_learning_rate: 5e-4
soft_start: 0.1
annealing: 0.7
}
}
regularizer {
type: L1
weight: 3e-9
}
optimizer {
adam {
epsilon: 1e-08
beta1: 0.9
beta2: 0.999
}
}
cost_scaling {
enabled: False
initial_exponent: 20.0
increment: 0.005
decrement: 1.0
}
visualizer {
enabled: true
num_images: 3
scalar_logging_frequency: 10
infrequent_logging_frequency: 1
target_class_config {
key: "eye"
value: {
coverage_threshold: 0.005
}
}
target_class_config {
key: "pedestrian"
value: {
coverage_threshold: 0.005
}
}
}
}

Augmentation Module

augmentation_config {
preprocessing {
output_image_width: 1200
output_image_height: 1920
output_image_channel: 3
min_bbox_width: 1.0
min_bbox_height: 1.0
}
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.0
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 8.0
translate_max_y: 8.0
}
color_augmentation {
color_shift_stddev: 0.0
hue_rotation_max: 25.0
saturation_shift_max: 0.2
contrast_scale_max: 0.1
contrast_center: 0.5
}
}

Configuring the Evaluator

evaluation_config {
average_precision_mode: INTEGRATE
validation_period_during_training: 10
first_validation_epoch: 1
minimum_detection_ground_truth_overlap {
key: "eye"
value: 0.7
}

evaluation_box_config {
key: "eye"
value {
minimum_height: 4
maximum_height: 9999
minimum_width: 4
maximum_width: 9999
}
}
}

Dataloader

dataset_config {
data_sources: {
tfrecords_path: "tao/tfrecords"
image_directory_path: "tao/data/kitti_face"
}
image_extension: "jpg"
target_class_mapping {
key: "eye"
value: "eye"
}
validation_fold: 0
}

Inferencer

inferencer_config{

target_classes: "eye"

image_width: 1200
image_height: 1920

image_channels: 3
batch_size: 2
gpu_index: 0

tensorrt_config{
parser: ETLT
etlt_model: "/tao/model.etlt"
backend_data_type: INT8
save_engine: true
trt_engine: "/tao"
calibrator_config{
calibration_cache: "/tao"
n_batches: 10
batch_size: 16
}
}
}

Bbox Handler

bbox_handler_config{
kitti_dump: true
disable_overlay: false
overlay_linewidth: 2
classwise_bbox_handler_config{
key: "eye"
value: {
confidence_model: "aggregate_cov"
output_map: "eye"
bbox_color{
R: 0
G: 255
B: 0
}
clustering_config{
coverage_threshold: 0.005
dbscan_eps: 0.3
dbscan_min_samples: 0.05
dbscan_confidence_threshold: 0.9
minimum_bounding_box_height: 4
}
}
}
classwise_bbox_handler_config{
key: "default"
value: {
confidence_model: "aggregate_cov"
bbox_color{
R: 255
G: 0
B: 0
}
clustering_config{
coverage_threshold: 0.005
dbscan_eps: 0.3
dbscan_min_samples: 0.05
dbscan_confidence_threshold: 0.9
minimum_bounding_box_height: 4
}
}
}
}

Morganh · October 17, 2023, 9:43am

Please refer to "How to set the dbscan_min_samples parameter in facenet notebook" (#4 by Morganh).

erence · October 17, 2023, 11:04am

Thanks. Changed the dbscan_min_samples to 1 and that error was solved.
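
The change from 0.05 to 1 suggests that this TAO version expects a whole number for dbscan_min_samples; that is an inference from the fix reported here, not something confirmed in this thread. Under that assumption, a minimal Python sketch for spotting fractional values in a spec file could look like this. The helper name `find_fractional_min_samples` is hypothetical and is not part of TAO.

```python
import re

# Matches a `dbscan_min_samples: <value>` line and captures the raw value.
FIELD = re.compile(r"^\s*dbscan_min_samples\s*:\s*(\S+)")


def find_fractional_min_samples(spec_text):
    """Return (line_number, raw_value) for each value that is not a whole number.

    Hypothetical helper: assumes the field should be an integer, as the fix
    in this thread suggests.
    """
    problems = []
    for number, line in enumerate(spec_text.splitlines(), start=1):
        match = FIELD.match(line)
        if match and not re.fullmatch(r"\d+", match.group(1)):
            problems.append((number, match.group(1)))
    return problems


# Excerpt from the first spec file in this thread.
sample = """clustering_config {
coverage_threshold: 0.005
dbscan_eps: 0.15
dbscan_min_samples: 0.05
minimum_bounding_box_height: 20
}"""

print(find_fractional_min_samples(sample))
```

The check is purely textual and does not validate the rest of the spec, so it only narrows down where to look.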

Now I get a "ValueError: steps_per_epoch must be > 0" error.
The data sources can be listed and look correct.

The spec file is below.
Any suggestions?
######################

random_seed: 42
dataset_config {
data_sources {
tfrecords_path: "/tao/tfrecords/*"
image_directory_path: "/tao/data/kitti_face/"
}
image_extension: "jpg"
target_class_mapping {
key: "eye"
value: "eye"
}
validation_fold: 0
}
augmentation_config {
preprocessing {
output_image_width: 1200
output_image_height: 1920
min_bbox_width: 1.0
min_bbox_height: 1.0
output_image_channel: 3
}
spatial_augmentation {
hflip_probability: 0.5
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 8.0
translate_max_y: 8.0
}
color_augmentation {
hue_rotation_max: 25.0
saturation_shift_max: 0.20000000298
contrast_scale_max: 0.10000000149
contrast_center: 0.5
}
}
postprocessing_config {
target_class_config {
key: "eye"
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00499999988824
dbscan_eps: 0.20000000298
dbscan_min_samples: 1
minimum_bounding_box_height: 20
}
}
}

}
model_config {
#pretrained_model_file: "/workspace/tao-experiments/detectnet_v2/pretrained_resnet18/pretrained_detectnet_v2_vresnet18/resnet18.hdf5"
num_layers: 18
use_batch_norm: true
objective_set {
bbox {
scale: 35.0
offset: 0.5
}
cov {
}
}
arch: "resnet"
}
evaluation_config {
validation_period_during_training: 10
first_validation_epoch: 30
minimum_detection_ground_truth_overlap {
key: "eye"
value: 0.699999988079
}

evaluation_box_config {
key: "car"
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
average_precision_mode: INTEGRATE
}
cost_function_config {
target_classes {
name: "car"
class_weight: 1.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: "cov"
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: "bbox"
initial_weight: 10.0
weight_target: 10.0
}
}

enable_autoweighting: false
max_objective_weight: 0.999899983406
min_objective_weight: 9.99999974738e-05
}
training_config {
batch_size_per_gpu: 4
num_epochs: 120

learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 5e-07
max_learning_rate: 5e-05
soft_start: 0.10000000149
annealing: 0.699999988079
}
}
regularizer {
type: L1
weight: 3.00000002618e-09
}
optimizer {
adam {
epsilon: 9.99999993923e-09
beta1: 0.899999976158
beta2: 0.999000012875
}
}
cost_scaling {
initial_exponent: 20.0
increment: 0.005
decrement: 1.0
}
visualizer{
enabled: true
num_images: 3
scalar_logging_frequency: 50
infrequent_logging_frequency: 5
target_class_config {
key: "eye"
value: {
coverage_threshold: 0.005
}
}

clearml_config{
  project: "TAO Toolkit ClearML Demo"
  task: "detectnet_v2_resnet18_clearml"
  tags: "detectnet_v2"
  tags: "training"
  tags: "resnet18"
  tags: "unpruned"
}
wandb_config{
  project: "TAO Toolkit Wandb Demo"
  name: "detectnet_v2_resnet18_wandb"
  tags: "detectnet_v2"
  tags: "training"
  tags: "resnet18"
  tags: "unpruned"
}

}
checkpoint_interval: 10
}
bbox_rasterizer_config {
target_class_config {
key: "car"
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 0.40000000596
cov_radius_y: 0.40000000596
bbox_min_radius: 1.0
}
}
deadzone_radius: 0.400000154972
}
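
The spec above maps a single class, eye, but some sections still refer to car. Whether that matters to the trainer is not established in this thread. As a way to cross-check class names, here is a minimal Python sketch that lists the class names each top-level section refers to. The function `class_names_by_section` is hypothetical and only does simple line-based parsing, not a real protobuf text-format parse.

```python
import re
from collections import defaultdict

# Matches `key: "eye"` or `name: "eye"`; accepts straight or typographic quotes.
CLASS_REF = re.compile(r'^(key|name)\s*:\s*["“]([^"”]+)["”]')


def class_names_by_section(spec_text):
    """Map each top-level section name to the set of class names used inside it."""
    names = defaultdict(set)
    stack = []  # names of the blocks that are currently open
    for line in spec_text.splitlines():
        stripped = line.strip()
        match = CLASS_REF.match(stripped)
        if match and stack:
            field, value = match.groups()
            # `name:` is treated as a class name only directly inside
            # `target_classes`; elsewhere it names an objective or a run.
            if field == "key" or stack[-1] == "target_classes":
                names[stack[0]].add(value)
        if stripped.endswith("{"):
            stack.append(stripped[:-1].rstrip(" :"))
        elif stripped == "}" and stack:
            stack.pop()
    return dict(names)


# Shortened excerpt from the spec file above.
sample = '''evaluation_config {
minimum_detection_ground_truth_overlap {
key: "eye"
value: 0.7
}
evaluation_box_config {
key: "car"
value {
minimum_height: 20
}
}
}
cost_function_config {
target_classes {
name: "car"
objectives {
name: "cov"
}
}
}'''

for section, found in class_names_by_section(sample).items():
    print(section, sorted(found))
```

Any section whose set differs from the classes in target_class_mapping is worth a second look.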

erence · October 17, 2023, 11:45am

Got it. It was a tfrecord generation error. I corrected it following the example you gave above. Thanks for your help.
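
For anyone hitting the same "steps_per_epoch must be > 0" message, a quick first check is whether the tfrecords_path pattern matches any non-empty files. The sketch below is plain Python and makes no TAO calls; the function names are hypothetical, and the path in the last line is the one from the spec above. A non-empty file can still hold zero usable records, so a passing result does not prove that tfrecord generation was correct.

```python
import glob
import os


def check_tfrecords(pattern):
    """Return (path, size_in_bytes) for every regular file matched by pattern."""
    matches = sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
    return [(path, os.path.getsize(path)) for path in matches]


def summarize(pattern):
    """Describe what the pattern matches, in one line."""
    files = check_tfrecords(pattern)
    if not files:
        return f"no files match {pattern!r}"
    empty = [path for path, size in files if size == 0]
    if empty:
        return f"{len(empty)} of {len(files)} matched files are empty"
    return f"{len(files)} non-empty files match"


if __name__ == "__main__":
    # Pattern taken from the dataset_config in this thread.
    print(summarize("/tao/tfrecords/*"))
```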

system · November 7, 2023, 1:47am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
