Mix proprietary and public datasets for retraining

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
x64, Ubuntu, RTX3090
• Network Type (Detectnet_v2)
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here)
• Training spec file (if you have one, please share it here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)

I have a proprietary dataset captured from a wide-angle camera looking down at a high angle over a small area, and I am trying to detect:

  • Person
  • Electric-Bicycle

The dataset has 2000 images each for Person and Electric-Bicycle. From testing based on the trained model (retrained from detectnet_v2 via TAO), it can barely separate Electric-Bicycle from Bicycle.
Question 1:
What can I do here to improve the ability to distinguish Electric-Bicycle and Bicycle?

========================================================
I’m planning to retrain a new model to detect one more class, Bicycle, even though my business does not require it at all.
Question 2:
Would this help to improve the ability to distinguish Electric-Bicycle and Bicycle?

========================================================
If the answer to Question 2 is yes:
I only have a little data for Bicycle in my scenario, so the only way is to download Bicycle images from the Open Images Dataset. I noticed there are many Persons (walking, riding, etc.) in the downloaded images, but of course these persons are quite different from my target scenario.
Question 3:
Is it necessary to label those Persons in the downloaded public dataset? I noticed in testing with the original model that the Persons from the public dataset can still be partly detected.

For 1), there are two approaches.

  • As you mentioned, train a new model which detects one more class, Bicycle.
  • Or keep the current model as is, and train a new classification model to classify Electric-Bicycle vs. Bicycle. The pipeline then contains one detection model and one classification model (see the sketch below).
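
For the second approach, the runtime flow could look roughly like this (a minimal sketch; detect and classify_crop stand in for your own inference wrappers around the two exported models, not actual TAO APIs):

# Hypothetical two-stage pipeline: the detector proposes boxes and the
# second-stage classifier re-checks the ambiguous class.
def run_pipeline(frame, detect, classify_crop):
    results = []
    for label, x1, y1, x2, y2, score in detect(frame):
        if label == "electric_bicycle":
            # Crop the detection (frame is assumed to be a numpy image).
            crop = frame[int(y1):int(y2), int(x1):int(x2)]
            label = classify_crop(crop)  # returns "bicycle" or "electric_bicycle"
        results.append((label, x1, y1, x2, y2, score))
    return results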

For 2), you can try the above two approaches. It should help improve the ability to distinguish.
For the bicycle dataset, you can search on the web and select one which is similar to your scenario.

For 3), what is the AP result of the person class? And how about testing on your test images?

Does the Jetson Nano 2GB support this by loading 2 models, given its limited hardware spec?

The Person AP trained on the mixed dataset is about 85, but looking through the tlt_infer_testing folder by eye, most persons in the public-dataset images could not be detected, while the persons in the proprietary images are detected well. Could the high AP be caused by val_split being set to 5, which is quite small? I worry that the persons annotated in the public dataset would lower the AP in the real scenario.

Pruning/retraining for the two models is needed. Experiments are also needed to check the fps of the whole pipeline.
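
For the fps check, a timing loop as simple as this works (a sketch; run_pipeline is whatever wrapper runs both models end to end):

import time

def measure_fps(frames, run_pipeline, warmup=10):
    # Warm-up iterations so one-time initialization does not skew the timing.
    for frame in frames[:warmup]:
        run_pipeline(frame)
    start = time.perf_counter()
    for frame in frames[warmup:]:
        run_pipeline(frame)
    return (len(frames) - warmup) / (time.perf_counter() - start)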

I am not sure about the "real scenario" where you will run your model. Please note that the more images of the real scenario you use for training, the better the inference result will be. Different datasets have different data distributions, so the tlt model you trained on the proprietary images may not work well on other public datasets. You can run "tao detectnet_v2 evaluate" against all of the proprietary images; I think the result will be similar to AP 85.
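
For reference, a minimal evaluate invocation (the model path and $KEY are placeholders for your own setup):

!tao detectnet_v2 evaluate -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt \
                           -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
                           -k $KEY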

Thanks, Morgan.
I started another round of training with part of the data coming from the public Open Images dataset; these public data only contain the classes Bicycle and People. The summary of my mixed dataset is:

  • Total images count
    3296
  • Images source
    2296 from proprietary
    1000 from Open Images
  • Overall labels distribution (counted with the short script below)
    "electric_bicycle": 2041,
    "people": 4439, 30% from Open Images
    "another_custom_obj": 2242,
    "bicycle": 1975, 99% from Open Images
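
For reference, the per-class counts above can be tallied from the KITTI label files with a short script (the label path is a placeholder):

import collections
import pathlib

# Tally class names (first field of each KITTI label line)
# across all label files in the training label folder.
counts = collections.Counter()
for label_file in pathlib.Path("data/training/label_2").glob("*.txt"):
    for line in label_file.read_text().splitlines():
        if line.strip():
            counts[line.split()[0]] += 1
print(dict(counts))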

Training specs
detectnet_v2_tfrecords_kitti_trainval.txt:

# TFrecords conversion spec file for kitti training
kitti_config {
root_directory_path: "/workspace/tao-experiments/data/training"
image_dir_name: "image_2"
label_dir_name: "label_2"
image_extension: ".jpg"
partition_mode: "random"
num_partitions: 2
val_split: 8
num_shards: 10
}
image_directory_path: "/workspace/tao-experiments/data/training"

detectnet_v2_train_resnet18_kitti.txt:

random_seed: 42
dataset_config {
data_sources {
tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_trainval/*"
image_directory_path: "/workspace/tao-experiments/data/training"
}
image_extension: "jpg"
target_class_mapping {
key: "another_custom_obj"
value: "another_custom_obj"
}
target_class_mapping {
key: "people"
value: "people"
}
target_class_mapping {
key: "electric_bicycle"
value: "electric_bicycle"
}
target_class_mapping {
key: "bicycle"
value: "bicycle"
}
validation_fold: 0
}
augmentation_config {
preprocessing {
output_image_width: 960
output_image_height: 1280
min_bbox_width: 1.0
min_bbox_height: 1.0
output_image_channel: 3
}
spatial_augmentation {
hflip_probability: 0.5
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 8.0
translate_max_y: 8.0
}
color_augmentation {
hue_rotation_max: 25.0
saturation_shift_max: 0.20000000298
contrast_scale_max: 0.10000000149
contrast_center: 0.5
}
}
postprocessing_config {
target_class_config {
key: "another_custom_obj"
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00499999988824
dbscan_eps: 0.20000000298
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 10
}
}
}
target_class_config {
key: "people"
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00499999988824
dbscan_eps: 0.15000000596
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
target_class_config {
key: "electric_bicycle"
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00749999983236
dbscan_eps: 0.230000004172
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
target_class_config {
key: "bicycle"
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00749999983236
dbscan_eps: 0.230000004172
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
}
model_config {
pretrained_model_file: "/workspace/tao-experiments/detectnet_v2/pretrained_resnet18/pretrained_detectnet_v2_vresnet18/resnet18.hdf5"
num_layers: 18
use_batch_norm: true
objective_set {
bbox {
scale: 35.0
offset: 0.5
}
cov {
}
}
arch: "resnet"
}
evaluation_config {
validation_period_during_training: 10
first_validation_epoch: 20
minimum_detection_ground_truth_overlap {
key: "another_custom_obj"
value: 0.4
}
minimum_detection_ground_truth_overlap {
key: "people"
value: 0.5
}
minimum_detection_ground_truth_overlap {
key: "electric_bicycle"
value: 0.5
}
minimum_detection_ground_truth_overlap {
key: "bicycle"
value: 0.5
}
evaluation_box_config {
key: "another_custom_obj"
value {
minimum_height: 10
maximum_height: 9999
minimum_width: 14
maximum_width: 9999
}
}
evaluation_box_config {
key: "people"
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 20
maximum_width: 9999
}
}
evaluation_box_config {
key: "electric_bicycle"
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 20
maximum_width: 9999
}
}
evaluation_box_config {
key: "bicycle"
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 20
maximum_width: 9999
}
}
average_precision_mode: INTEGRATE
}
cost_function_config {
target_classes {
name: "another_custom_obj"
class_weight: 10.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}
target_classes {
name: "people"
class_weight: 5.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 1.0
}
}
target_classes {
name: "electric_bicycle"
class_weight: 10.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}
target_classes {
name: "bicycle"
class_weight: 10.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}
enable_autoweighting: true
max_objective_weight: 0.999899983406
min_objective_weight: 9.99999974738e-05
}
training_config {
batch_size_per_gpu: 8
num_epochs: 120
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 5e-06
max_learning_rate: 5e-04
soft_start: 0.10000000149
annealing: 0.699999988079
}
}
regularizer {
type: L1
weight: 3.00000002618e-09
}
optimizer {
adam {
epsilon: 9.99999993923e-09
beta1: 0.899999976158
beta2: 0.999000012875
}
}
cost_scaling {
initial_exponent: 20.0
increment: 0.005
decrement: 1.0
}
checkpoint_interval: 10
}
bbox_rasterizer_config {
target_class_config {
key: "another_custom_obj"
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 0.40000000596
cov_radius_y: 0.40000000596
bbox_min_radius: 1.0
}
}
target_class_config {
key: "people"
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
target_class_config {
key: "electric_bicycle"
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
target_class_config {
key: "bicycle"
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
deadzone_radius: 0.400000154972
}

After the training, the AP is:

2022-03-03 05:41:11,786 [INFO] tensorflow: epoch = 119.96052631578947, learning_rate = 5.025307e-06, loss = 0.0001229787, step = 45585 (5.471 sec)
INFO:tensorflow:Saving checkpoints for step-45600.
2022-03-03 05:41:14,930 [INFO] tensorflow: Saving checkpoints for step-45600.
WARNING:tensorflow:Ignoring: /tmp/tmp15bx64zr; No such file or directory
2022-03-03 05:41:15,051 [WARNING] tensorflow: Ignoring: /tmp/tmp15bx64zr; No such file or directory
2022-03-03 05:41:17,648 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 32, 0.00s/step
2022-03-03 05:41:25,002 [INFO] iva.detectnet_v2.evaluation.evaluation: step 10 / 32, 0.74s/step
2022-03-03 05:41:30,688 [INFO] iva.detectnet_v2.evaluation.evaluation: step 20 / 32, 0.57s/step
2022-03-03 05:41:34,682 [INFO] iva.detectnet_v2.evaluation.evaluation: step 30 / 32, 0.40s/step
Matching predictions to ground truth, class 1/4.: 100%|█| 280/280 [00:00<00:00, 40967.14it/s]
Matching predictions to ground truth, class 2/4.: 100%|█| 4877/4877 [00:00<00:00, 48597.64it/s]
Matching predictions to ground truth, class 3/4.: 100%|█| 384/384 [00:00<00:00, 52199.41it/s]
Matching predictions to ground truth, class 4/4.: 100%|█| 1038/1038 [00:00<00:00, 43879.58it/s]
Epoch 120/120

Validation cost: 0.000135
Mean average_precision (in %): 69.6327

class name            average precision (in %)
bicycle               34.853
another_custom_obj    95.5677
electric_bicycle      82.0849
people                66.0253

After the retraining with the same spec (only epochs reduced to 80), the AP is:

Validation cost: 0.000956
Mean average_precision (in %): 69.2883

class name            average precision (in %)
bicycle               33.8405
another_custom_obj    95.7922
electric_bicycle      82.9376
people                64.5828

Before this round, I actually did another run with the same dataset structure, only with a bicycle label count of 500, and that time the AP was actually much better for all classes:

Epoch 120/120

Validation cost: 0.000101
Mean average_precision (in %): 74.1200

class name            average precision (in %)
bicycle               43.7898
another_custom_obj    96.8945
electric_bicycle      80.209
people                75.5866

Question 1:
Does it make sense that more data leads to worse AP?

Question 2:
Why does bicycle have such a low AP? As I understand it, the bicycle training and validation data actually both come from the public dataset.

I saw that you set 960x1280. Your training images are all 960x1280, right?

For the 2nd figure, it is a bicycle. The bicycle should have a white bbox; why is there a red bbox?
Also, the green bbox is not correct, right? There is no person in this green bbox.

Yes, because the model can’t separate bicycle and electric-bicycle.

Could you share some training images?

Thanks for the info. Could you run some experiments for detection on two classes (bicycle and electric_bicycle)? Try resnet18 and resnet50.

OK, will try today and let you know.

To train for only 2 classes, I just need to modify the train and retrain specs; the annotated dataset (which contains 4 classes) does not need the extra classes removed, correct?

Yes, you only need to modify the training spec and run training. I just want to know if these two classes can be distinguished well.
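
For reference, the 2-class variant only changes the mapping section of dataset_config; as far as I understand, labels without a target_class_mapping entry are simply ignored during training (a sketch following the spec above; the postprocessing/evaluation/cost-function entries for the dropped classes should be removed as well):

dataset_config {
data_sources {
tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_trainval/*"
image_directory_path: "/workspace/tao-experiments/data/training"
}
image_extension: "jpg"
target_class_mapping {
key: "electric_bicycle"
value: "electric_bicycle"
}
target_class_mapping {
key: "bicycle"
value: "bicycle"
}
validation_fold: 0
}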

More experiments are needed. Please try to train a TAO classification network to check if these two classes (bicycle and electric-bicycle) can be distinguished well.

  • Prepare dataset: Crop the bicycle or electric-bicycle objects from your current training images (see the sketch after this list).
  • Create a new folder bicycle. Copy bicycle images into it.
  • Create a new folder electric-bicycle. Copy electric-bicycle images into it.
  • Prepare a training spec and train with the TAO classification network.
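
A minimal cropping sketch, assuming KITTI-format labels (class name in the first field, bbox as left/top/right/bottom in fields 5-8); the paths and output layout are placeholders:

import pathlib
from PIL import Image

LABEL_DIR = pathlib.Path("data/training/label_2")   # placeholder paths
IMAGE_DIR = pathlib.Path("data/training/image_2")
OUT_DIR = pathlib.Path("classification_data")
CLASSES = {"bicycle", "electric_bicycle"}

for label_file in LABEL_DIR.glob("*.txt"):
    image_path = IMAGE_DIR / (label_file.stem + ".jpg")
    if not image_path.exists():
        continue
    image = Image.open(image_path)
    for i, line in enumerate(label_file.read_text().splitlines()):
        fields = line.split()
        if not fields or fields[0] not in CLASSES:
            continue
        # KITTI bbox: left, top, right, bottom (fields 4..7, 0-based)
        box = tuple(float(v) for v in fields[4:8])
        dest = OUT_DIR / fields[0]
        dest.mkdir(parents=True, exist_ok=True)
        image.crop(box).save(dest / f"{label_file.stem}_{i}.jpg")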

The cropping doesn’t seem like easy work; it may take some time to prepare.

Does adding more samples to the proprietary dataset (containing only electric-bicycle) or the public dataset (containing only bicycle) help?

That can be the next experiment. First, we need to check whether a classification network can distinguish these two classes well with your current training images.
If the classification network does not work well, it will be difficult to get an object detection network working well.

You can crop based on the bboxes.
You can modify your comments and delete some private images.

This approach can be an option.

Further, as mentioned earlier, please try to run more experiments.
You already have the result of:

  • detectnet_v2 network + resnet18 backbone
    bicycle 34.8536
    electric_bicycle 81.7761

Please try below.

  • Detectnet_v2 network + resnet50 backbone , train on 2 classes (bicycle and electric_bicycle)
  • Yolov4_tiny network + resnet18 backbone, train on 2 classes (bicycle and electric_bicycle)

I’m training this now, and I notice the process takes much longer (about 8 hours on an RTX3090-24G, batch size 16, 80 epochs) than training detectnet_v2 resnet18 on the same 2-class dataset, and it finally ran out of GPU memory:

ETA: 4:05 - loss: 159.94282022-03-09 09:07:09,735 [ERROR] iva.common.utils: Ran out of GPU memory, please lower the batch size, use a smaller input resolution, use a smaller backbone, or enable model parallelism for supported TLT architectures (see TLT documentation).

This is the training spec:

random_seed: 42
yolov4_config {
big_anchor_shape: "[(498.00, 489.00), (427.00, 326.00), (311.00, 417.00)]"
mid_anchor_shape: "[(210.00, 257.00), (101.00, 161.00), (60.00, 43.00)]"
box_matching_iou: 0.25
matching_neutral_box_iou: 0.5
arch: "cspdarknet_tiny"
loss_loc_weight: 1.0
loss_neg_obj_weights: 1.0
loss_class_weights: 1.0
label_smoothing: 0.0
big_grid_xy_extend: 0.05
mid_grid_xy_extend: 0.05
freeze_bn: false
#freeze_blocks: 0
force_relu: false
}
training_config {
batch_size_per_gpu: 16
num_epochs: 80
enable_qat: true
checkpoint_interval: 10
learning_rate {
soft_start_cosine_annealing_schedule {
min_learning_rate: 1e-7
max_learning_rate: 1e-4
soft_start: 0.3
}
}
regularizer {
type: L1
weight: 3e-5
}
optimizer {
adam {
epsilon: 1e-7
beta1: 0.9
beta2: 0.999
amsgrad: false
}
}
pretrain_model_path: "/workspace/tao-experiments/yolo_v4_tiny/pretrained_cspdarknet_tiny/pretrained_object_detection_vcspdarknet_tiny/cspdarknet_tiny.hdf5"
}
eval_config {
average_precision_mode: SAMPLE
batch_size: 16
matching_iou_threshold: 0.5
}
nms_config {
confidence_threshold: 0.001
clustering_iou_threshold: 0.5
force_on_cpu: true
top_k: 200
}
augmentation_config {
hue: 0.1
saturation: 1.5
exposure:1.5
vertical_flip:0
horizontal_flip: 0.5
jitter: 0.3
output_width: 960
output_height: 1280
output_channel: 3
randomize_input_shape_period: 10
mosaic_prob: 0.5
mosaic_min_ratio:0.2
}
dataset_config {
data_sources: {
tfrecords_path: "/workspace/tao-experiments/data/training/tfrecords/train*"
image_directory_path: "/workspace/tao-experiments/data/training"
}
include_difficult_in_training: true
image_extension: "png"
target_class_mapping {
key: "electric_bicycle"
value: "electric_bicycle"
}
target_class_mapping {
key: "bicycle"
value: "bicycle"
}
validation_data_sources: {
tfrecords_path: "/workspace/tao-experiments/data/val/tfrecords/val*"
image_directory_path: "/workspace/tao-experiments/data/val"
}
}

Dataset statistics:

2022-03-09 06:09:18,812 [INFO] root: Cumulative object statistics
2022-03-09 06:09:18,812 [INFO] root: {
"bicycle": 1644,
"people_unused": 3918,
"electric_bicycle": 1853,
"another_unused_custom_object": 2034
}

Is this common?

It is not normal. Did you use kmeans to generate anchor shapes for your labels?

Now I have reduced the batch size to 12, since running out of GPU memory failed my last try, but the memory usage seems to be the same as with 16:

GPU memory usage by nvidia-smi: 19173MiB / 24576MiB

Yes; the command I used (I treat -x and -y as the image’s width and height):

!tao yolo_v4_tiny kmeans -l $DATA_DOWNLOAD_DIR/training/label_2 \
                          -i $DATA_DOWNLOAD_DIR/training/image_2 \
                          -n 6 \
                          -x 960 \
                          -y 1280

The output:

2022-03-09 14:03:40,440 [INFO] root: Registry: ['nvcr.io']
2022-03-09 14:03:40,473 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-03-09 14:03:40,487 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/shao/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Start optimization iteration: 1
Start optimization iteration: 11
Start optimization iteration: 21
Please use following anchor sizes in YOLO config:
(60.00, 43.00)
(101.00, 161.00)
(210.00, 257.00)
(311.00, 417.00)
(427.00, 326.00)
(498.00, 489.00)
2022-03-09 14:03:43,324 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

I then put those 6 tuples into yolo_v4_tiny_train_kitti.txt:


yolov4_config {
big_anchor_shape: "[(498.00, 489.00), (427.00, 326.00), (311.00, 417.00)]"
mid_anchor_shape: "[(210.00, 257.00), (101.00, 161.00), (60.00, 43.00)]"