Deepstream_lpr_app runs slowly

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
Xavier
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
LPDnet, LPRnet
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here)
dockers:
  nvidia/tao/tao-toolkit-tf:
    docker_registry: nvcr.io
    docker_tag: v3.21.08-py3
    tasks:
      1. augment
      2. bpnet
      3. classification
      4. detectnet_v2
      5. dssd
      6. emotionnet
      7. faster_rcnn
      8. fpenet
      9. gazenet
      10. gesturenet
      11. heartratenet
      12. lprnet
      13. mask_rcnn
      14. multitask_classification
      15. retinanet
      16. ssd
      17. unet
      18. yolo_v3
      19. yolo_v4
      20. converter
  nvidia/tao/tao-toolkit-pyt:
    docker_registry: nvcr.io
    docker_tag: v3.21.08-py3
    tasks:
      1. speech_to_text
      2. speech_to_text_citrinet
      3. text_classification
      4. question_answering
      5. token_classification
      6. intent_slot_classification
      7. punctuation_and_capitalization
  nvidia/tao/tao-toolkit-lm:
    docker_registry: nvcr.io
    docker_tag: v3.21.08-py3
    tasks:
      1. n_gram
format_version: 1.0
toolkit_version: 3.21.08
published_date: 08/17/2021

• Training spec file (if you have one, please share it here)
# LPDnet spec file
random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/openalpr/lpd_tfrecord/*"
    image_directory_path: "/workspace/openalpr/data/"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "lpd"
    value: "lpd"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 720
    output_image_height: 1168
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "lpd"
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 4
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/openalpr/ccpd_unpruned_1.tlt"
  num_layers: 18
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 20
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "lpd"
    value: 0.699999988079
  }
  evaluation_box_config {
    key: "lpd"
    value {
      minimum_height: 10
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "lpd"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 16
  num_epochs: 600
  enable_qat: false
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: "lpd"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}

I trained LPDnet and LPRnet on a custom dataset and successfully ran deepstream_lpr_app on Xavier NX, but it is very slow: the average FPS is only 7, while according to deepstream_lpr_app the theoretical speed can reach 3 streams at 80.31 total FPS. Any ideas? Thanks.

The FPS result mentioned in deepstream_lpr_app was measured with the usa_pruned LPD model in the pipeline. Its input resolution is 640x480, and it is a pruned model, so it is much smaller.
Actually, in the pipeline, to detect Chinese license plates you can also just replace the LPR model. Keep the trafficcamnet model and the usa_pruned LPD model as they are.
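
For reference, swapping in a custom LPR model only requires pointing the LPR secondary-gie nvinfer config at the new .etlt. A minimal sketch of the relevant [property] entries (file names and paths here are illustrative, not necessarily the app's exact ones):

[property]
# encoded TAO model and the key used when it was exported
tlt-encoded-model=../models/LP/LPR/ch_lprnet.etlt
tlt-model-key=nvidia_tlt
# engine is rebuilt automatically if this file is absent
model-engine-file=../models/LP/LPR/ch_lprnet.etlt_b16_gpu0_fp16.engine
labelfile-path=../models/LP/LPR/labels_ch.txt
batch-size=16
# 2 = FP16
network-mode=2
gie-unique-id=3
# run on the crops produced by the LPD detector
operate-on-gie-id=2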

Hi Morganh,
In my test the accuracy of the LPDnet model was not good, so I retrained unpruned.tlt on a custom dataset, and the resulting model is very large, nearly 10 times the size of pruned.tlt. The lpr-app reference does not describe how to prune the model. I tried tao detectnet_v2 prune, but an error occurred. Any ideas?

The LPDnet model is actually based on the detectnet_v2 network.
So, please use detectnet_v2 to prune it and then run retraining.
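
For reference, the usual detectnet_v2 sequence is prune, then retrain with a spec that points at the pruned model, then export. A sketch with illustrative paths (these must be container-side paths; the retrain spec name and weight file name are placeholders):

tao detectnet_v2 prune -m /workspace/openalpr/ccpd_unpruned_1.tlt \
                       -o /workspace/openalpr/ccpd_pruned_1.tlt \
                       -pth 0.3 -k nvidia_tlt
tao detectnet_v2 train -e /workspace/openalpr/SPECS_retrain.txt \
                       -r /workspace/openalpr/experiment_retrain \
                       -k nvidia_tlt
tao detectnet_v2 export -m /workspace/openalpr/experiment_retrain/weights/model.tlt \
                        -o /workspace/openalpr/ccpd_pruned_1.etlt \
                        -k nvidia_tlt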

The following error occurred:

root@IDC_GPU_Server-1:/home/sutpc/xiukd/zjq/download/tlt-experiments/LPDR/lpd# tao detectnet_v2 prune -m ./ccpd_unpruned_1.tlt -o ./ccpd_unpruned_1_prune.tlt -pth 0.3 -k nvidia_tlt
2021-11-10 14:55:26,595 [INFO] root: Registry: ['nvcr.io']
2021-11-10 14:55:27,403 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/root/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/prune.py", line 16, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_prune.py", line 185, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_prune.py", line 120, in run_pruning
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 282, in decode_to_keras
ValueError: Cannot find input file name.
2021-11-10 14:55:38,354 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

All file paths should be paths inside the docker.
Please check in your ~/.tao_mounts.json file how you mapped your local directories into the docker.
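
For reference, a minimal ~/.tao_mounts.json that maps a local experiment directory onto the /workspace/openalpr path used inside the container might look like this (the local source path is illustrative):

{
    "Mounts": [
        {
            "source": "/home/<local_user>/tlt-experiments/openalpr",
            "destination": "/workspace/openalpr"
        }
    ]
}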

I used tao detectnet_v2 prune to compress LPDnet, but when I used tao detectnet_v2 export to convert the .tlt to .etlt, the file size returned to its original size. Any ideas?

When you run pruning, can you check the pruning ratio in the log?
Also, after pruning, please run retraining. You can then check the trainable parameters in the retraining log.
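
If the logs were saved to files, both numbers can be pulled out with something like the following (file names are illustrative):

grep -i "pruning ratio" prune.log    # printed by tao detectnet_v2 prune
grep -i "params" retrain.log         # Total / Trainable / Non-trainable params lines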

The pruning ratio is very low.

When I retrain the pruned model, its file size is the same as before pruning; the trainable parameters are as follows:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 3, 1168, 720) 0
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 64, 584, 360) 9472        input_1[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 64, 584, 360) 256         conv1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 64, 584, 360) 0           bn_conv1[0][0]
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D)        (None, 64, 292, 180) 36928       activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 292, 180) 256         block_1a_conv_1[0][0]
__________________________________________________________________________________________________
block_1a_relu_1 (Activation)    (None, 64, 292, 180) 0           block_1a_bn_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D)        (None, 64, 292, 180) 36928       block_1a_relu_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 292, 180) 4160        activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 292, 180) 256         block_1a_conv_2[0][0]
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 292, 180) 256         block_1a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_1 (Add)                     (None, 64, 292, 180) 0           block_1a_bn_2[0][0]
                                                                 block_1a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_1a_relu (Activation)      (None, 64, 292, 180) 0           add_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D)        (None, 64, 292, 180) 36928       block_1a_relu[0][0]
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (None, 64, 292, 180) 256         block_1b_conv_1[0][0]
__________________________________________________________________________________________________
block_1b_relu_1 (Activation)    (None, 64, 292, 180) 0           block_1b_bn_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D)        (None, 64, 292, 180) 36928       block_1b_relu_1[0][0]
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (None, 64, 292, 180) 256         block_1b_conv_2[0][0]
__________________________________________________________________________________________________
add_2 (Add)                     (None, 64, 292, 180) 0           block_1b_bn_2[0][0]
                                                                 block_1a_relu[0][0]
__________________________________________________________________________________________________
block_1b_relu (Activation)      (None, 64, 292, 180) 0           add_2[0][0]
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D)        (None, 128, 146, 90) 73856       block_1b_relu[0][0]
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 146, 90) 512         block_2a_conv_1[0][0]
__________________________________________________________________________________________________
block_2a_relu_1 (Activation)    (None, 128, 146, 90) 0           block_2a_bn_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D)        (None, 128, 146, 90) 147584      block_2a_relu_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 146, 90) 8320        block_1b_relu[0][0]
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 146, 90) 512         block_2a_conv_2[0][0]
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 146, 90) 512         block_2a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_3 (Add)                     (None, 128, 146, 90) 0           block_2a_bn_2[0][0]
                                                                 block_2a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_2a_relu (Activation)      (None, 128, 146, 90) 0           add_3[0][0]
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D)        (None, 128, 146, 90) 147584      block_2a_relu[0][0]
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (None, 128, 146, 90) 512         block_2b_conv_1[0][0]
__________________________________________________________________________________________________
block_2b_relu_1 (Activation)    (None, 128, 146, 90) 0           block_2b_bn_1[0][0]
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D)        (None, 128, 146, 90) 147584      block_2b_relu_1[0][0]
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (None, 128, 146, 90) 512         block_2b_conv_2[0][0]
__________________________________________________________________________________________________
add_4 (Add)                     (None, 128, 146, 90) 0           block_2b_bn_2[0][0]
                                                                 block_2a_relu[0][0]
__________________________________________________________________________________________________
block_2b_relu (Activation)      (None, 128, 146, 90) 0           add_4[0][0]
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D)        (None, 256, 73, 45)  295168      block_2b_relu[0][0]
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 73, 45)  1024        block_3a_conv_1[0][0]
__________________________________________________________________________________________________
block_3a_relu_1 (Activation)    (None, 256, 73, 45)  0           block_3a_bn_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D)        (None, 256, 73, 45)  590080      block_3a_relu_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 73, 45)  33024       block_2b_relu[0][0]
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 73, 45)  1024        block_3a_conv_2[0][0]
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 73, 45)  1024        block_3a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_5 (Add)                     (None, 256, 73, 45)  0           block_3a_bn_2[0][0]
                                                                 block_3a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_3a_relu (Activation)      (None, 256, 73, 45)  0           add_5[0][0]
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D)        (None, 256, 73, 45)  590080      block_3a_relu[0][0]
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (None, 256, 73, 45)  1024        block_3b_conv_1[0][0]
__________________________________________________________________________________________________
block_3b_relu_1 (Activation)    (None, 256, 73, 45)  0           block_3b_bn_1[0][0]
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D)        (None, 256, 73, 45)  590080      block_3b_relu_1[0][0]
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (None, 256, 73, 45)  1024        block_3b_conv_2[0][0]
__________________________________________________________________________________________________
add_6 (Add)                     (None, 256, 73, 45)  0           block_3b_bn_2[0][0]
                                                                 block_3a_relu[0][0]
__________________________________________________________________________________________________
block_3b_relu (Activation)      (None, 256, 73, 45)  0           add_6[0][0]
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D)        (None, 512, 73, 45)  1180160     block_3b_relu[0][0]
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 73, 45)  2048        block_4a_conv_1[0][0]
__________________________________________________________________________________________________
block_4a_relu_1 (Activation)    (None, 512, 73, 45)  0           block_4a_bn_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D)        (None, 512, 73, 45)  2359808     block_4a_relu_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 73, 45)  131584      block_3b_relu[0][0]
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 73, 45)  2048        block_4a_conv_2[0][0]
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 73, 45)  2048        block_4a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_7 (Add)                     (None, 512, 73, 45)  0           block_4a_bn_2[0][0]
                                                                 block_4a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_4a_relu (Activation)      (None, 512, 73, 45)  0           add_7[0][0]
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D)        (None, 512, 73, 45)  2359808     block_4a_relu[0][0]
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (None, 512, 73, 45)  2048        block_4b_conv_1[0][0]
__________________________________________________________________________________________________
block_4b_relu_1 (Activation)    (None, 512, 73, 45)  0           block_4b_bn_1[0][0]
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D)        (None, 512, 73, 45)  2359808     block_4b_relu_1[0][0]
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (None, 512, 73, 45)  2048        block_4b_conv_2[0][0]
__________________________________________________________________________________________________
add_8 (Add)                     (None, 512, 73, 45)  0           block_4b_bn_2[0][0]
                                                                 block_4a_relu[0][0]
__________________________________________________________________________________________________
block_4b_relu (Activation)      (None, 512, 73, 45)  0           add_8[0][0]
__________________________________________________________________________________________________
output_bbox (Conv2D)            (None, 4, 73, 45)    2052        block_4b_relu[0][0]
__________________________________________________________________________________________________
output_cov (Conv2D)             (None, 1, 73, 45)    513         block_4b_relu[0][0]
==================================================================================================
Total params: 11,197,893
Trainable params: 11,188,165
Non-trainable params: 9,728
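
(Note: these layer widths — 64/128/256/512 channels per stage — are exactly those of a stock unpruned ResNet18, and the per-layer counts check out for the unpruned widths; e.g. conv1 is a 7x7 convolution from 3 to 64 channels, so 64 x (3 x 7 x 7) + 64 biases = 9,472 parameters, as shown above. If pruning had taken effect, the channel counts and the "Param #" column would be smaller.)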

Do you still have the training log? What does it report for "Trainable params"?

What is the training log? Is it as shown below?

Please check near the beginning of the log for:
Total params:
Trainable params:
Non-trainable params:

The "Trainable params" of the pruned model is the same as that of the unpruned model. Why?

Maybe there is something wrong in your settings. Can you check whether you used the pruned model for retraining?

I have tested it many times, and the "Trainable params" of the pruned model is the same as that of the unpruned model. I also tried pruning other models, such as 'resnet18_trafficcamnet.tlt', and got the same result. The command I use is as follows:

tao detectnet_v2 prune -m /workspace/openalpr/ccpd_unpruned_1.tlt -o /workspace/openalpr/ccpd_pruned_1.tlt -eq union -pth 4 -k nvidia_tlt

Any ideas?

That does not make sense. Did you change -pth?

I have tried many '-pth' values; their "Trainable params" are all the same.

Usually for the detectnet_v2 network the pth is small; please set it to a smaller value to check.
Below is a value mentioned in the Jupyter notebook:

-pth 0.0000052

Also, please check your training spec file: did you set the pruned model as the pretrained model?
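
For reference, in a detectnet_v2 retraining spec the model_config typically both points at the pruned .tlt and sets load_graph: true, so that the pruned structure is actually loaded rather than the full-width template graph. A minimal sketch (path is illustrative):

model_config {
  pretrained_model_file: "/workspace/openalpr/ccpd_pruned_1.tlt"
  load_graph: true
  num_layers: 18
  use_batch_norm: true
  arch: "resnet"
}

If load_graph is left at its default (false), only the weights are mapped onto the unpruned architecture, which would explain identical "Trainable params" before and after pruning.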

I just tried setting '-pth' to 0.0000052, but the "Trainable params" are the same. Also, I copied the training spec file, named it 'SPECS_retrain.txt', and set the pruned model as the pretrained model in it.