Mix proprietary and public datasets for retraining

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
x64, Ubuntu, RTX3090
• Network Type (Detectnet_v2)
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here)
• Training spec file (if you have one, please share it here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)

I have a proprietary dataset captured from a wide-angle camera looking down at a high angle over a small area, and I am trying to detect:

  • Person
  • Electric-Bicycle

The dataset has 2000 images each for Person and Electric-Bicycle. From testing based on the trained model (retrained from detectnet_v2 via TAO), it can barely separate Electric-Bicycle from Bicycle.
Question 1:
What can I do here to improve the ability to distinguish Electric-Bicycle and Bicycle?

========================================================
I’m planning to retrain a new model to detect one more class, Bicycle, even though my business does not require it at all.
Question 2:
Would this help to improve the ability to distinguish Electric-Bicycle and Bicycle?

========================================================
If the answer to Question 2 is yes:
I only have a little data for Bicycle in my scenario, so the only way is to download Bicycle images from the Open Images Dataset. I noticed there are many Persons (walking, riding, etc.) in the downloaded images, but of course these persons are quite different from my target scenario.
Question 3:
Is it necessary to label those Persons in the downloaded public dataset? I noticed in testing with the original model that the Persons from the public dataset can still be partly detected.

For 1), there are two approaches.

  • As you mentioned, train a new model which detects one more class, Bicycle.
  • Or keep the current model as is, and train a new classification model to classify Electric-Bicycle vs. Bicycle. The pipeline then contains one detection model and one classification model (see the sketch below).
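
For the second approach, the runtime flow could look roughly like this (a minimal sketch; detect and classify_crop stand in for your own inference wrappers around the two exported models, not actual TAO APIs):

# Hypothetical two-stage pipeline: the detector proposes boxes and the
# second-stage classifier re-checks the ambiguous class.
def run_pipeline(frame, detect, classify_crop):
    results = []
    for label, x1, y1, x2, y2, score in detect(frame):
        if label == "electric_bicycle":
            # Crop the detection (frame is assumed to be a numpy image).
            crop = frame[int(y1):int(y2), int(x1):int(x2)]
            label = classify_crop(crop)  # returns "bicycle" or "electric_bicycle"
        results.append((label, x1, y1, x2, y2, score))
    return results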

For 2), you can try the above two approaches. It should help improve the ability to distinguish.
For the bicycle dataset, you can search on the web and select one which is similar to your scenario.

For 3), what is the AP result of the person class? And how about testing on your test images?

Does the Jetson Nano 2GB support this by loading 2 models, given its limited hardware spec?

The Person AP trained on the mixed dataset is about 85, but looking through the tlt_infer_testing folder by eye, most persons in the public-dataset images could not be detected, while the persons in the proprietary images are detected well. Could the high AP be caused by val_split being set to 5, which is quite small? I worry that the persons annotated in the public dataset would lower the AP in the real scenario.

Pruning/retraining for the two models is needed. Experiments are also needed to check the fps of the whole pipeline.
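
For the fps check, a timing loop as simple as this works (a sketch; run_pipeline is whatever wrapper runs both models end to end):

import time

def measure_fps(frames, run_pipeline, warmup=10):
    # Warm-up iterations so one-time initialization does not skew the timing.
    for frame in frames[:warmup]:
        run_pipeline(frame)
    start = time.perf_counter()
    for frame in frames[warmup:]:
        run_pipeline(frame)
    return (len(frames) - warmup) / (time.perf_counter() - start)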

I am not sure about the "real scenario" where you will run your model. Please note that the more images of the real scenario you use for training, the better the inference result will be. Different datasets have different data distributions, so the tlt model you trained on the proprietary images may not work well on other public datasets. You can run "tao detectnet_v2 evaluate" against all of the proprietary images; I think the result will be similar to AP 85.
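
For reference, a minimal evaluate invocation (the model path and $KEY are placeholders for your own setup):

!tao detectnet_v2 evaluate -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt \
                           -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
                           -k $KEY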

Thanks, Morgan.
I started another round of training with part of the data coming from the public Open Images dataset; these public data only contain the classes Bicycle and People. The summary of my mixed dataset is:

  • Total images count
    3296
  • Images source
    2296 from proprietary
    1000 from Open Images
  • Overall labels distribution (counted with the short script below)
    "electric_bicycle": 2041,
    "people": 4439, 30% from Open Images
    "another_custom_obj": 2242,
    "bicycle": 1975, 99% from Open Images
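
For reference, the per-class counts above can be tallied from the KITTI label files with a short script (the label path is a placeholder):

import collections
import pathlib

# Tally class names (first field of each KITTI label line)
# across all label files in the training label folder.
counts = collections.Counter()
for label_file in pathlib.Path("data/training/label_2").glob("*.txt"):
    for line in label_file.read_text().splitlines():
        if line.strip():
            counts[line.split()[0]] += 1
print(dict(counts))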

Training specs
detectnet_v2_tfrecords_kitti_trainval.txt:

# TFrecords conversion spec file for kitti training
kitti_config {
root_directory_path: "/workspace/tao-experiments/data/training"
image_dir_name: "image_2"
label_dir_name: "label_2"
image_extension: ".jpg"
partition_mode: "random"
num_partitions: 2
val_split: 8
num_shards: 10
}
image_directory_path: "/workspace/tao-experiments/data/training"

detectnet_v2_train_resnet18_kitti.txt:

random_seed: 42
dataset_config {
data_sources {
tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_trainval/*"
image_directory_path: "/workspace/tao-experiments/data/training"
}
image_extension: "jpg"
target_class_mapping {
key: "another_custom_obj"
value: "another_custom_obj"
}
target_class_mapping {
key: "people"
value: "people"
}
target_class_mapping {
key: "electric_bicycle"
value: "electric_bicycle"
}
target_class_mapping {
key: "bicycle"
value: "bicycle"
}
validation_fold: 0
}
augmentation_config {
preprocessing {
output_image_width: 960
output_image_height: 1280
min_bbox_width: 1.0
min_bbox_height: 1.0
output_image_channel: 3
}
spatial_augmentation {
hflip_probability: 0.5
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 8.0
translate_max_y: 8.0
}
color_augmentation {
hue_rotation_max: 25.0
saturation_shift_max: 0.20000000298
contrast_scale_max: 0.10000000149
contrast_center: 0.5
}
}
postprocessing_config {
target_class_config {
key: "another_custom_obj"
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00499999988824
dbscan_eps: 0.20000000298
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 10
}
}
}
target_class_config {
key: "people"
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00499999988824
dbscan_eps: 0.15000000596
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
target_class_config {
key: "electric_bicycle"
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00749999983236
dbscan_eps: 0.230000004172
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
target_class_config {
key: "bicycle"
value {
clustering_config {
clustering_algorithm: DBSCAN
dbscan_confidence_threshold: 0.9
coverage_threshold: 0.00749999983236
dbscan_eps: 0.230000004172
dbscan_min_samples: 0.0500000007451
minimum_bounding_box_height: 20
}
}
}
}
model_config {
pretrained_model_file: "/workspace/tao-experiments/detectnet_v2/pretrained_resnet18/pretrained_detectnet_v2_vresnet18/resnet18.hdf5"
num_layers: 18
use_batch_norm: true
objective_set {
bbox {
scale: 35.0
offset: 0.5
}
cov {
}
}
arch: "resnet"
}
evaluation_config {
validation_period_during_training: 10
first_validation_epoch: 20
minimum_detection_ground_truth_overlap {
key: "another_custom_obj"
value: 0.4
}
minimum_detection_ground_truth_overlap {
key: "people"
value: 0.5
}
minimum_detection_ground_truth_overlap {
key: "electric_bicycle"
value: 0.5
}
minimum_detection_ground_truth_overlap {
key: "bicycle"
value: 0.5
}
evaluation_box_config {
key: "another_custom_obj"
value {
minimum_height: 10
maximum_height: 9999
minimum_width: 14
maximum_width: 9999
}
}
evaluation_box_config {
key: "people"
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 20
maximum_width: 9999
}
}
evaluation_box_config {
key: "electric_bicycle"
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 20
maximum_width: 9999
}
}
evaluation_box_config {
key: "bicycle"
value {
minimum_height: 20
maximum_height: 9999
minimum_width: 20
maximum_width: 9999
}
}
average_precision_mode: INTEGRATE
}
cost_function_config {
target_classes {
name: "another_custom_obj"
class_weight: 10.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}
target_classes {
name: "people"
class_weight: 5.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 1.0
}
}
target_classes {
name: "electric_bicycle"
class_weight: 10.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}
target_classes {
name: "bicycle"
class_weight: 10.0
coverage_foreground_weight: 0.0500000007451
objectives {
name: “cov”
initial_weight: 1.0
weight_target: 1.0
}
objectives {
name: “bbox”
initial_weight: 10.0
weight_target: 10.0
}
}
enable_autoweighting: true
max_objective_weight: 0.999899983406
min_objective_weight: 9.99999974738e-05
}
training_config {
batch_size_per_gpu: 8
num_epochs: 120
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 5e-06
max_learning_rate: 5e-04
soft_start: 0.10000000149
annealing: 0.699999988079
}
}
regularizer {
type: L1
weight: 3.00000002618e-09
}
optimizer {
adam {
epsilon: 9.99999993923e-09
beta1: 0.899999976158
beta2: 0.999000012875
}
}
cost_scaling {
initial_exponent: 20.0
increment: 0.005
decrement: 1.0
}
checkpoint_interval: 10
}
bbox_rasterizer_config {
target_class_config {
key: "another_custom_obj"
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 0.40000000596
cov_radius_y: 0.40000000596
bbox_min_radius: 1.0
}
}
target_class_config {
key: "people"
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
target_class_config {
key: "electric_bicycle"
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
target_class_config {
key: "bicycle"
value {
cov_center_x: 0.5
cov_center_y: 0.5
cov_radius_x: 1.0
cov_radius_y: 1.0
bbox_min_radius: 1.0
}
}
deadzone_radius: 0.400000154972
}

After the training, the AP is:

2022-03-03 05:41:11,786 [INFO] tensorflow: epoch = 119.96052631578947, learning_rate = 5.025307e-06, loss = 0.0001229787, step = 45585 (5.471 sec)
INFO:tensorflow:Saving checkpoints for step-45600.
2022-03-03 05:41:14,930 [INFO] tensorflow: Saving checkpoints for step-45600.
WARNING:tensorflow:Ignoring: /tmp/tmp15bx64zr; No such file or directory
2022-03-03 05:41:15,051 [WARNING] tensorflow: Ignoring: /tmp/tmp15bx64zr; No such file or directory
2022-03-03 05:41:17,648 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 32, 0.00s/step
2022-03-03 05:41:25,002 [INFO] iva.detectnet_v2.evaluation.evaluation: step 10 / 32, 0.74s/step
2022-03-03 05:41:30,688 [INFO] iva.detectnet_v2.evaluation.evaluation: step 20 / 32, 0.57s/step
2022-03-03 05:41:34,682 [INFO] iva.detectnet_v2.evaluation.evaluation: step 30 / 32, 0.40s/step
Matching predictions to ground truth, class 1/4.: 100%|█| 280/280 [00:00<00:00, 40967.14it/s]
Matching predictions to ground truth, class 2/4.: 100%|█| 4877/4877 [00:00<00:00, 48597.64it/s]
Matching predictions to ground truth, class 3/4.: 100%|█| 384/384 [00:00<00:00, 52199.41it/s]
Matching predictions to ground truth, class 4/4.: 100%|█| 1038/1038 [00:00<00:00, 43879.58it/s]
Epoch 120/120

Validation cost: 0.000135
Mean average_precision (in %): 69.6327

class name            average precision (in %)
bicycle               34.853
another_custom_obj    95.5677
electric_bicycle      82.0849
people                66.0253

After the retraining with the same spec (only epochs reduced to 80), the AP is:

Validation cost: 0.000956
Mean average_precision (in %): 69.2883

class name            average precision (in %)
bicycle               33.8405
another_custom_obj    95.7922
electric_bicycle      82.9376
people                64.5828

Before this round, I actually did another run with the same dataset structure, only with a bicycle label count of 500, and that time the AP was actually much better for all classes:

Epoch 120/120

Validation cost: 0.000101
Mean average_precision (in %): 74.1200

class name            average precision (in %)
bicycle               43.7898
another_custom_obj    96.8945
electric_bicycle      80.209
people                75.5866

Question 1:
Does it make sense that more data leads to worse AP?

Question 2:
Why does bicycle have such a low AP? As I understand it, the bicycle training and validation data actually both come from the public dataset.

I saw that you set 960x1280. Your training images are all 960x1280, right?

For the 2nd figure, it is a bicycle. The bicycle should have a white bbox; why is there a red bbox?
Also, the green bbox is not correct, right? There is no person in this green bbox.

Yes, because the model can’t separate bicycle and electric-bicycle.

Could you share some training images?

Thanks for the info. Could you run some experiments for detection on two classes (bicycle and electric_bicycle)? Try resnet18 and resnet50.

OK, will try today and let you know.

To train for only 2 classes, I just need to modify the train and retrain specs; the annotated dataset (which contains 4 classes) does not need the extra classes removed, correct?

Yes, you only need to modify the training spec and run training. I just want to know if these two classes can be distinguished well.
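
For reference, the 2-class variant only changes the mapping section of dataset_config; as far as I understand, labels without a target_class_mapping entry are simply ignored during training (a sketch following the spec above; the postprocessing/evaluation/cost-function entries for the dropped classes should be removed as well):

dataset_config {
data_sources {
tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_trainval/*"
image_directory_path: "/workspace/tao-experiments/data/training"
}
image_extension: "jpg"
target_class_mapping {
key: "electric_bicycle"
value: "electric_bicycle"
}
target_class_mapping {
key: "bicycle"
value: "bicycle"
}
validation_fold: 0
}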

More experiments are needed. Please try to train a TAO classification network to check if these two classes (bicycle and electric-bicycle) can be distinguished well.

  • Prepare dataset: Crop the bicycle or electric-bicycle objects from your current training images (see the sketch after this list).
  • Create a new folder bicycle. Copy bicycle images into it.
  • Create a new folder electric-bicycle. Copy electric-bicycle images into it.
  • Prepare a training spec and train with the TAO classification network.
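
A minimal cropping sketch, assuming KITTI-format labels (class name in the first field, bbox as left/top/right/bottom in fields 5-8); the paths and output layout are placeholders:

import pathlib
from PIL import Image

LABEL_DIR = pathlib.Path("data/training/label_2")   # placeholder paths
IMAGE_DIR = pathlib.Path("data/training/image_2")
OUT_DIR = pathlib.Path("classification_data")
CLASSES = {"bicycle", "electric_bicycle"}

for label_file in LABEL_DIR.glob("*.txt"):
    image_path = IMAGE_DIR / (label_file.stem + ".jpg")
    if not image_path.exists():
        continue
    image = Image.open(image_path)
    for i, line in enumerate(label_file.read_text().splitlines()):
        fields = line.split()
        if not fields or fields[0] not in CLASSES:
            continue
        # KITTI bbox: left, top, right, bottom (fields 4..7, 0-based)
        box = tuple(float(v) for v in fields[4:8])
        dest = OUT_DIR / fields[0]
        dest.mkdir(parents=True, exist_ok=True)
        image.crop(box).save(dest / f"{label_file.stem}_{i}.jpg")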

The cropping doesn’t seem like easy work; it may take some time to prepare.

Does adding more samples to the proprietary dataset (containing only electric-bicycle) or the public dataset (containing only bicycle) help?

That can be the next experiment. First, we need to check whether a classification network can distinguish these two classes well with your current training images.
If the classification network does not work well, it will be difficult to get an object detection network working well.

You can crop based on the bboxes.
You can modify your comments and delete some private images.

This approach can be an option.

Further, as mentioned earlier, please try to run more experiments.
You already have the result of:

  • detectnet_v2 network + resnet18 backbone
    bicycle 34.8536
    electric_bicycle 81.7761

Please try below.

  • Detectnet_v2 network + resnet50 backbone , train on 2 classes (bicycle and electric_bicycle)
  • Yolov4_tiny network + resnet18 backbone, train on 2 classes (bicycle and electric_bicycle)

I’m training this now, and I notice the process takes much longer (about 8 hours on an RTX3090-24G, batch size 16, 80 epochs) than training detectnet_v2 resnet18 on the same 2-class dataset, and it finally ran out of GPU memory:

ETA: 4:05 - loss: 159.94282022-03-09 09:07:09,735 [ERROR] iva.common.utils: Ran out of GPU memory, please lower the batch size, use a smaller input resolution, use a smaller backbone, or enable model parallelism for supported TLT architectures (see TLT documentation).

This is the training spec:

random_seed: 42
yolov4_config {
big_anchor_shape: "[(498.00, 489.00), (427.00, 326.00), (311.00, 417.00)]"
mid_anchor_shape: "[(210.00, 257.00), (101.00, 161.00), (60.00, 43.00)]"
box_matching_iou: 0.25
matching_neutral_box_iou: 0.5
arch: "cspdarknet_tiny"
loss_loc_weight: 1.0
loss_neg_obj_weights: 1.0
loss_class_weights: 1.0
label_smoothing: 0.0
big_grid_xy_extend: 0.05
mid_grid_xy_extend: 0.05
freeze_bn: false
#freeze_blocks: 0
force_relu: false
}
training_config {
batch_size_per_gpu: 16
num_epochs: 80
enable_qat: true
checkpoint_interval: 10
learning_rate {
soft_start_cosine_annealing_schedule {
min_learning_rate: 1e-7
max_learning_rate: 1e-4
soft_start: 0.3
}
}
regularizer {
type: L1
weight: 3e-5
}
optimizer {
adam {
epsilon: 1e-7
beta1: 0.9
beta2: 0.999
amsgrad: false
}
}
pretrain_model_path: "/workspace/tao-experiments/yolo_v4_tiny/pretrained_cspdarknet_tiny/pretrained_object_detection_vcspdarknet_tiny/cspdarknet_tiny.hdf5"
}
eval_config {
average_precision_mode: SAMPLE
batch_size: 16
matching_iou_threshold: 0.5
}
nms_config {
confidence_threshold: 0.001
clustering_iou_threshold: 0.5
force_on_cpu: true
top_k: 200
}
augmentation_config {
hue: 0.1
saturation: 1.5
exposure:1.5
vertical_flip:0
horizontal_flip: 0.5
jitter: 0.3
output_width: 960
output_height: 1280
output_channel: 3
randomize_input_shape_period: 10
mosaic_prob: 0.5
mosaic_min_ratio:0.2
}
dataset_config {
data_sources: {
tfrecords_path: "/workspace/tao-experiments/data/training/tfrecords/train*"
image_directory_path: "/workspace/tao-experiments/data/training"
}
include_difficult_in_training: true
image_extension: "png"
target_class_mapping {
key: "electric_bicycle"
value: "electric_bicycle"
}
target_class_mapping {
key: "bicycle"
value: "bicycle"
}
validation_data_sources: {
tfrecords_path: "/workspace/tao-experiments/data/val/tfrecords/val*"
image_directory_path: "/workspace/tao-experiments/data/val"
}
}

Dataset statistics:

2022-03-09 06:09:18,812 [INFO] root: Cumulative object statistics
2022-03-09 06:09:18,812 [INFO] root: {
"bicycle": 1644,
"people_unused": 3918,
"electric_bicycle": 1853,
"another_unused_custom_object": 2034
}

Is this common?

It is not normal. Did you use kmeans to generate anchor shapes for your labels?

Now I have reduced the batch size to 12, since running out of GPU memory failed my last try, but the memory usage seems to be the same as with 16:

GPU memory usage by nvidia-smi: 19173MiB / 24576MiB

Yes; the command I used (I treat -x and -y as the image’s width and height):

!tao yolo_v4_tiny kmeans -l $DATA_DOWNLOAD_DIR/training/label_2 \
                          -i $DATA_DOWNLOAD_DIR/training/image_2 \
                          -n 6 \
                          -x 960 \
                          -y 1280

The output:

2022-03-09 14:03:40,440 [INFO] root: Registry: ['nvcr.io']
2022-03-09 14:03:40,473 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-03-09 14:03:40,487 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/shao/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Start optimization iteration: 1
Start optimization iteration: 11
Start optimization iteration: 21
Please use following anchor sizes in YOLO config:
(60.00, 43.00)
(101.00, 161.00)
(210.00, 257.00)
(311.00, 417.00)
(427.00, 326.00)
(498.00, 489.00)
2022-03-09 14:03:43,324 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

I then put those 6 tuples into yolo_v4_tiny_train_kitti.txt:


yolov4_config {
big_anchor_shape: "[(498.00, 489.00), (427.00, 326.00), (311.00, 417.00)]"
mid_anchor_shape: "[(210.00, 257.00), (101.00, 161.00), (60.00, 43.00)]"