Relationship between training dataset size and inference data size

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
Ubuntu, x86, RTX3090
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
Detectnet_v2
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here)
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

I'm using TAO to retrain my custom model based on detectnet_v2 (resnet18).
Context 1:
The original images in my private dataset vary in aspect ratio and resolution. I resized them all to 800x608 with an image-resize tool so they are compatible with TAO's training requirements.
Question 1:
The resize tool is ratio-based rather than crop-based, so an image (and the objects in it) can be distorted. Does this impact later inference, or am I misunderstanding something?
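For illustration, the geometry of the two resize strategies can be sketched in plain Python (the helper names are mine, not a TAO API): a direct stretch to 800x608 changes object proportions, while a keep-ratio (letterbox) resize scales and pads instead:

```python
def stretch_distortion(src_w, src_h, dst_w, dst_h):
    """Ratio by which a direct (non-letterbox) resize stretches objects
    vertically relative to horizontally (1.0 = no distortion)."""
    return (dst_h / src_h) / (dst_w / src_w)

def letterbox_size(src_w, src_h, dst_w, dst_h):
    """Scaled image size when the aspect ratio is kept; the rest of the
    destination canvas would be padded."""
    scale = min(dst_w / src_w, dst_h / src_h)
    return round(src_w * scale), round(src_h * scale)

# A 1920x1080 frame stretched straight to 800x608 is distorted by ~35%:
print(stretch_distortion(1920, 1080, 800, 608))  # ~1.351
# Letterboxed instead, it becomes 800x450 plus 158 px of vertical padding:
print(letterbox_size(1920, 1080, 800, 608))      # (800, 450)
```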

Context 2:
After exporting the model to run inference in DeepStream 6, I prepared a local 1920x1080 video file. I also noticed the parameter input-dims (channel; height; width; input-order, all integers >= 0) in the pgie config file. With the value 3;1080;1920;0 the accuracy was visibly poor: many false-positive bounding boxes appeared (boxes reporting the target object in an actually empty area). But if I change the value to 3;608;800;0, the accuracy is much better.
Question 2:
What value should I set for input-dims, and when should I change it, given that the inference source resolution can vary (different cameras)?

Question 3:
I also noticed that for the same inference source (e.g., an RTSP stream), keeping the aspect ratio but entering a differently scaled width and height in input-dims can still cause a huge difference in detection accuracy.

This may affect the training result and inference. What is the mAP after training?

Can you share your training spec? What are the model width and height? The input dims depend on them.

The input-dims should not change.

  1. The original images of varying resolutions are all resized (keep ratio) to 800x608 first and then put into image_2 and label_2, and the training validation runs against these resized images as well, correct? I can see the mAP is good in both the training and retraining (pruned) stages; below is the training mAP:

Validation cost: 0.000132
Mean average_precision (in %): 85.1225

class name          average precision (in %)
door_warning_sign   80.8844
electric_bicycle    80.9867
people              93.4964

detectnet_v2_tfrecords_kitti_trainval.txt

# TFRecords conversion spec file for KITTI training
kitti_config {
root_directory_path: "/workspace/tao-experiments/data/training"
image_dir_name: "image_2"
label_dir_name: "label_2"
image_extension: ".jpg"
partition_mode: "random"
num_partitions: 2
val_split: 6
num_shards: 10
}

detectnet_v2_train_resnet18_kitti.txt:

random_seed: 42
dataset_config {
data_sources {
tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_trainval/*"
image_directory_path: "/workspace/tao-experiments/data/training"
}
image_extension: "jpg"
target_class_mapping {
key: "door_warning_sign"
value: "door_warning_sign"
}
target_class_mapping {
key: "people"
value: "people"
}
target_class_mapping {
key: "electric_bicycle"
value: "electric_bicycle"
}
validation_fold: 0
}
augmentation_config {
preprocessing {
output_image_width: 800
output_image_height: 608
min_bbox_width: 1.0
min_bbox_height: 1.0
output_image_channel: 3
}
spatial_augmentation {
hflip_probability: 0.5
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 8.0
translate_max_y: 8.0
}
color_augmentation {
hue_rotation_max: 25.0
saturation_shift_max: 0.20000000298
contrast_scale_max: 0.10000000149
contrast_center: 0.5
}
}
postprocessing_config {
target_class_config {
key: "door_warning_sign"
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00499999988824
dbscan_eps: 0.20000000298
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 10
}
}
}
target_class_config {
key: "people"
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00499999988824
dbscan_eps: 0.15000000596
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
target_class_config {
key: "electric_bicycle"
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00749999983236
dbscan_eps: 0.230000004172
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
}
model_config {
pretrained_model_file: "/workspace/tao-experiments/detectnet_v2/pretrained_resnet18/pretrained_detectnet_v2_vresnet18/resnet18.hdf5"
num_layers: 18
use_batch_norm: true
objective_set {
bbox {
scale: 35.0
offset: 0.5
}
cov {
}
}
arch: "resnet"
}
evaluation_config {
validation_period_during_training: 10
first_validation_epoch: 20
minimum_detection_ground_truth_overlap {
key: "door_warning_sign"
value: 0.4
}
minimum_detection_ground_truth_overlap {
key: "people"
value: 0.5
}
minimum_detection_ground_truth_overlap {
key: "electric_bicycle"
value: 0.5
}
evaluation_box_config {
key: "door_warning_sign"
value {
minimum_height: 10
maximum_height: 9999
minimum_width: 14
maximum_width: 9999
}
}
evaluation_box_config {
key: "people"
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
evaluation_box_config {
key: "electric_bicycle"
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 10
maximum_width: 9999
}
}
average_precision_mode: INTEGRATE
}
cost_function_config {
target_classes {
name: "door_warning_sign"
class_weight: 10.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}
target_classes {
name: "people"
class_weight: 5.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 1.0
}
}
target_classes {
name: "electric_bicycle"
class_weight: 5.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}
enable_autoweighting: true
max_objective_weight: 0.999899983406
min_objective_weight: 9.99999974738e-05
}
training_config {
batch_size_per_gpu: 4
num_epochs: 80
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 5e-06
max_learning_rate: 5e-04
soft_start: 0.10000000149
annealing: 0.699999988079
}
}
regularizer {
type: L1
weight: 3.00000002618e-09
}
optimizer {
adam {
epsilon: 9.99999993923e-09
beta1: 0.899999976158
beta2: 0.999000012875
}
}
cost_scaling {
initial_exponent: 20.0
increment: 0.005
decrement: 1.0
}
checkpoint_interval: 10
}
bbox_rasterizer_config {
target_class_config {
key: "door_warning_sign"
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 0.40000000596
cov_radius_y: 0.40000000596
bbox_min_radius: 1.0
}
}
target_class_config {
key: "people"
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
target_class_config {
key: "electric_bicycle"
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
deadzone_radius: 0.400000154972
}

Since the training and validation are all based on ratio-resized images, does this mean the model may have learned from distorted objects, correct?

  1. For my scenario, when doing inference the video sources may have different resolutions; the camera at hand has a resolution of 1280x960. What are the recommended input-dims values?

Yes.

Yes.

Just set it to 3;608;800;0.

The aspect ratio of the target objects in the training dataset should stay as close as possible to that of the inference video source, correct?

Which way does input-dims do the resize: keep the ratio, or crop/pad? This should align with the training dataset's resize algorithm, correct?

I think you are running DeepStream, so you need not care about resizing the test video.

But different values of input-dims greatly impact the detection result; I'm testing with my own video.

The input-dims cannot be changed; it is tied to your model.
You can only set it to 3;608;800;0, according to your training spec.
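For reference, a minimal sketch of where this setting lives in the nvinfer (pgie) config file. The exact key name depends on the DeepStream release: older configs use uff-input-dims with a trailing input-order flag, while newer ones use infer-dims without it. Either way, the value must match the model's training dimensions, not the source resolution:

```
[property]
# Model input: channels;height;width(;input-order) - must match the
# training spec (output_image_height=608, output_image_width=800):
uff-input-dims=3;608;800;0
# On newer DeepStream releases, use instead:
# infer-dims=3;608;800
```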

Thanks, Morgan.

Since my inference camera's video resolution is a fixed value (currently 1280x960), does this imply I can resize my whole training dataset to 1280x960 as well, and could that help improve inference detection accuracy?

Yes, you can train a new model.
Just set enable_auto_resize and change output_image_width and output_image_height in the training spec.

enable_auto_resize: true
output_image_width: 1280
output_image_height: 960

Refer to DetectNet_v2 — TAO Toolkit 3.21.11 documentation
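As a quick sanity check on how much a direct resize distorts this particular camera (plain arithmetic, not a TAO/DeepStream API): 1280x960 is 4:3, which is already close to the 800x608 model input, so the aspect-ratio distortion is only about 1.3%:

```python
# Aspect-ratio comparison between the camera and the current model input.
ratio_camera = 1280 / 960   # 4:3 ~ 1.333
ratio_model = 800 / 608     # ~1.316
print(round(ratio_camera / ratio_model, 3))  # ~1.013 -> ~1.3% distortion
```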

Thanks, Morgan.
The key point is that the training dataset should keep the image size as close as possible to the inference source, correct?

Usually it is better to run inference against a test dataset that is similar to the training dataset.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.