Deepstream_lpr_app runs slowly

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
Xavier
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
LPDnet, LPRnet
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here)
dockers:
  nvidia/tao/tao-toolkit-tf:
    docker_registry: nvcr.io
    docker_tag: v3.21.08-py3
    tasks:
      1. augment
      2. bpnet
      3. classification
      4. detectnet_v2
      5. dssd
      6. emotionnet
      7. faster_rcnn
      8. fpenet
      9. gazenet
      10. gesturenet
      11. heartratenet
      12. lprnet
      13. mask_rcnn
      14. multitask_classification
      15. retinanet
      16. ssd
      17. unet
      18. yolo_v3
      19. yolo_v4
      20. converter
  nvidia/tao/tao-toolkit-pyt:
    docker_registry: nvcr.io
    docker_tag: v3.21.08-py3
    tasks:
      1. speech_to_text
      2. speech_to_text_citrinet
      3. text_classification
      4. question_answering
      5. token_classification
      6. intent_slot_classification
      7. punctuation_and_capitalization
  nvidia/tao/tao-toolkit-lm:
    docker_registry: nvcr.io
    docker_tag: v3.21.08-py3
    tasks:
      1. n_gram
format_version: 1.0
toolkit_version: 3.21.08
published_date: 08/17/2021

• Training spec file (if you have one, please share it here)
# LPDnet spec file
random_seed: 42
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/openalpr/lpd_tfrecord/*"
    image_directory_path: "/workspace/openalpr/data/"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "lpd"
    value: "lpd"
  }
  validation_fold: 0
}
augmentation_config {
  preprocessing {
    output_image_width: 720
    output_image_height: 1168
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
postprocessing_config {
  target_class_config {
    key: "lpd"
    value {
      clustering_config {
        coverage_threshold: 0.00499999988824
        dbscan_eps: 0.20000000298
        dbscan_min_samples: 0.0500000007451
        minimum_bounding_box_height: 4
      }
    }
  }
}
model_config {
  pretrained_model_file: "/workspace/openalpr/ccpd_unpruned_1.tlt"
  num_layers: 18
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
evaluation_config {
  validation_period_during_training: 20
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "lpd"
    value: 0.699999988079
  }
  evaluation_box_config {
    key: "lpd"
    value {
      minimum_height: 10
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
cost_function_config {
  target_classes {
    name: "lpd"
    class_weight: 1.0
    coverage_foreground_weight: 0.0500000007451
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 10.0
    }
  }
  enable_autoweighting: true
  max_objective_weight: 0.999899983406
  min_objective_weight: 9.99999974738e-05
}
training_config {
  batch_size_per_gpu: 16
  num_epochs: 600
  enable_qat: false
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-06
      max_learning_rate: 5e-04
      soft_start: 0.10000000149
      annealing: 0.699999988079
    }
  }
  regularizer {
    type: L1
    weight: 3.00000002618e-09
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
  cost_scaling {
    initial_exponent: 20.0
    increment: 0.005
    decrement: 1.0
  }
  checkpoint_interval: 10
}
bbox_rasterizer_config {
  target_class_config {
    key: "lpd"
    value {
      cov_center_x: 0.5
      cov_center_y: 0.5
      cov_radius_x: 0.40000000596
      cov_radius_y: 0.40000000596
      bbox_min_radius: 1.0
    }
  }
  deadzone_radius: 0.400000154972
}

I trained LPDnet and LPRnet on a custom dataset and successfully ran deepstream_lpr_app on Xavier NX, but it is very slow: the average FPS is only 7, while according to deepstream_lpr_app the theoretical speed can reach 3 streams at 80.31 total FPS. Any ideas? Thanks.

The FPS result mentioned in deepstream_lpr_app was measured with the usa_pruned LPD model in the pipeline. Its input resolution is 640x480, and it is a pruned model, so it is much smaller.
Actually, in the pipeline, to detect Chinese license plates you can also just replace the LPR model. Keep the trafficcamnet model and the usa_pruned LPD model as they are.
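
For reference, swapping in a custom LPR model only requires pointing the LPR secondary-gie nvinfer config at the new .etlt. A minimal sketch of the relevant [property] entries (file names and paths here are illustrative, not necessarily the app's exact ones):

[property]
# encoded TAO model and the key used when it was exported
tlt-encoded-model=../models/LP/LPR/ch_lprnet.etlt
tlt-model-key=nvidia_tlt
# engine is rebuilt automatically if this file is absent
model-engine-file=../models/LP/LPR/ch_lprnet.etlt_b16_gpu0_fp16.engine
labelfile-path=../models/LP/LPR/labels_ch.txt
batch-size=16
# 2 = FP16
network-mode=2
gie-unique-id=3
# run on the crops produced by the LPD detector
operate-on-gie-id=2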

Hi Morganh,
In my test the accuracy of the LPDnet model was not good, so I retrained unpruned.tlt on a custom dataset, and the resulting model is very large, nearly 10 times the size of pruned.tlt. The lpr-app reference does not describe how to prune the model. I tried tao detectnet_v2 prune, but an error occurred. Any ideas?

The LPDnet model is actually based on the detectnet_v2 network.
So, please use detectnet_v2 to prune it and then run retraining.
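
For reference, the usual detectnet_v2 sequence is prune, then retrain with a spec that points at the pruned model, then export. A sketch with illustrative paths (these must be container-side paths; the retrain spec name and weight file name are placeholders):

tao detectnet_v2 prune -m /workspace/openalpr/ccpd_unpruned_1.tlt \
                       -o /workspace/openalpr/ccpd_pruned_1.tlt \
                       -pth 0.3 -k nvidia_tlt
tao detectnet_v2 train -e /workspace/openalpr/SPECS_retrain.txt \
                       -r /workspace/openalpr/experiment_retrain \
                       -k nvidia_tlt
tao detectnet_v2 export -m /workspace/openalpr/experiment_retrain/weights/model.tlt \
                        -o /workspace/openalpr/ccpd_pruned_1.etlt \
                        -k nvidia_tlt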

The following error occurred:

root@IDC_GPU_Server-1:/home/sutpc/xiukd/zjq/download/tlt-experiments/LPDR/lpd# tao detectnet_v2 prune -m ./ccpd_unpruned_1.tlt -o ./ccpd_unpruned_1_prune.tlt -pth 0.3 -k nvidia_tlt
2021-11-10 14:55:26,595 [INFO] root: Registry: ['nvcr.io']
2021-11-10 14:55:27,403 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/root/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/prune.py", line 16, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_prune.py", line 185, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_prune.py", line 120, in run_pruning
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 282, in decode_to_keras
ValueError: Cannot find input file name.
2021-11-10 14:55:38,354 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

All file paths should be paths inside the docker.
Please check in your ~/.tao_mounts.json file how you mapped your local directories into the docker.
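
For reference, a minimal ~/.tao_mounts.json that maps a local experiment directory onto the /workspace/openalpr path used inside the container might look like this (the local source path is illustrative):

{
    "Mounts": [
        {
            "source": "/home/<local_user>/tlt-experiments/openalpr",
            "destination": "/workspace/openalpr"
        }
    ]
}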

I used tao detectnet_v2 prune to compress LPDnet, but when I used tao detectnet_v2 export to convert the .tlt to .etlt, the file size returned to its original size. Any ideas?

When you run pruning, can you check the pruning ratio in the log?
Also, after pruning, please run retraining. You can then check the trainable parameters in the retraining log.
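
If the logs were saved to files, both numbers can be pulled out with something like the following (file names are illustrative):

grep -i "pruning ratio" prune.log    # printed by tao detectnet_v2 prune
grep -i "params" retrain.log         # Total / Trainable / Non-trainable params lines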

The pruning ratio is very low.

When I retrain the pruned model, its file size is the same as before pruning; the trainable parameters are as follows:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 3, 1168, 720) 0
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 64, 584, 360) 9472        input_1[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 64, 584, 360) 256         conv1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 64, 584, 360) 0           bn_conv1[0][0]
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D)        (None, 64, 292, 180) 36928       activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 292, 180) 256         block_1a_conv_1[0][0]
__________________________________________________________________________________________________
block_1a_relu_1 (Activation)    (None, 64, 292, 180) 0           block_1a_bn_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D)        (None, 64, 292, 180) 36928       block_1a_relu_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 292, 180) 4160        activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 292, 180) 256         block_1a_conv_2[0][0]
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 292, 180) 256         block_1a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_1 (Add)                     (None, 64, 292, 180) 0           block_1a_bn_2[0][0]
                                                                 block_1a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_1a_relu (Activation)      (None, 64, 292, 180) 0           add_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D)        (None, 64, 292, 180) 36928       block_1a_relu[0][0]
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (None, 64, 292, 180) 256         block_1b_conv_1[0][0]
__________________________________________________________________________________________________
block_1b_relu_1 (Activation)    (None, 64, 292, 180) 0           block_1b_bn_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D)        (None, 64, 292, 180) 36928       block_1b_relu_1[0][0]
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (None, 64, 292, 180) 256         block_1b_conv_2[0][0]
__________________________________________________________________________________________________
add_2 (Add)                     (None, 64, 292, 180) 0           block_1b_bn_2[0][0]
                                                                 block_1a_relu[0][0]
__________________________________________________________________________________________________
block_1b_relu (Activation)      (None, 64, 292, 180) 0           add_2[0][0]
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D)        (None, 128, 146, 90) 73856       block_1b_relu[0][0]
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 146, 90) 512         block_2a_conv_1[0][0]
__________________________________________________________________________________________________
block_2a_relu_1 (Activation)    (None, 128, 146, 90) 0           block_2a_bn_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D)        (None, 128, 146, 90) 147584      block_2a_relu_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 146, 90) 8320        block_1b_relu[0][0]
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 146, 90) 512         block_2a_conv_2[0][0]
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 146, 90) 512         block_2a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_3 (Add)                     (None, 128, 146, 90) 0           block_2a_bn_2[0][0]
                                                                 block_2a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_2a_relu (Activation)      (None, 128, 146, 90) 0           add_3[0][0]
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D)        (None, 128, 146, 90) 147584      block_2a_relu[0][0]
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (None, 128, 146, 90) 512         block_2b_conv_1[0][0]
__________________________________________________________________________________________________
block_2b_relu_1 (Activation)    (None, 128, 146, 90) 0           block_2b_bn_1[0][0]
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D)        (None, 128, 146, 90) 147584      block_2b_relu_1[0][0]
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (None, 128, 146, 90) 512         block_2b_conv_2[0][0]
__________________________________________________________________________________________________
add_4 (Add)                     (None, 128, 146, 90) 0           block_2b_bn_2[0][0]
                                                                 block_2a_relu[0][0]
__________________________________________________________________________________________________
block_2b_relu (Activation)      (None, 128, 146, 90) 0           add_4[0][0]
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D)        (None, 256, 73, 45)  295168      block_2b_relu[0][0]
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 73, 45)  1024        block_3a_conv_1[0][0]
__________________________________________________________________________________________________
block_3a_relu_1 (Activation)    (None, 256, 73, 45)  0           block_3a_bn_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D)        (None, 256, 73, 45)  590080      block_3a_relu_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 73, 45)  33024       block_2b_relu[0][0]
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 73, 45)  1024        block_3a_conv_2[0][0]
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 73, 45)  1024        block_3a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_5 (Add)                     (None, 256, 73, 45)  0           block_3a_bn_2[0][0]
                                                                 block_3a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_3a_relu (Activation)      (None, 256, 73, 45)  0           add_5[0][0]
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D)        (None, 256, 73, 45)  590080      block_3a_relu[0][0]
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (None, 256, 73, 45)  1024        block_3b_conv_1[0][0]
__________________________________________________________________________________________________
block_3b_relu_1 (Activation)    (None, 256, 73, 45)  0           block_3b_bn_1[0][0]
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D)        (None, 256, 73, 45)  590080      block_3b_relu_1[0][0]
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (None, 256, 73, 45)  1024        block_3b_conv_2[0][0]
__________________________________________________________________________________________________
add_6 (Add)                     (None, 256, 73, 45)  0           block_3b_bn_2[0][0]
                                                                 block_3a_relu[0][0]
__________________________________________________________________________________________________
block_3b_relu (Activation)      (None, 256, 73, 45)  0           add_6[0][0]
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D)        (None, 512, 73, 45)  1180160     block_3b_relu[0][0]
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 73, 45)  2048        block_4a_conv_1[0][0]
__________________________________________________________________________________________________
block_4a_relu_1 (Activation)    (None, 512, 73, 45)  0           block_4a_bn_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D)        (None, 512, 73, 45)  2359808     block_4a_relu_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 73, 45)  131584      block_3b_relu[0][0]
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 73, 45)  2048        block_4a_conv_2[0][0]
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 73, 45)  2048        block_4a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_7 (Add)                     (None, 512, 73, 45)  0           block_4a_bn_2[0][0]
                                                                 block_4a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_4a_relu (Activation)      (None, 512, 73, 45)  0           add_7[0][0]
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D)        (None, 512, 73, 45)  2359808     block_4a_relu[0][0]
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (None, 512, 73, 45)  2048        block_4b_conv_1[0][0]
__________________________________________________________________________________________________
block_4b_relu_1 (Activation)    (None, 512, 73, 45)  0           block_4b_bn_1[0][0]
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D)        (None, 512, 73, 45)  2359808     block_4b_relu_1[0][0]
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (None, 512, 73, 45)  2048        block_4b_conv_2[0][0]
__________________________________________________________________________________________________
add_8 (Add)                     (None, 512, 73, 45)  0           block_4b_bn_2[0][0]
                                                                 block_4a_relu[0][0]
__________________________________________________________________________________________________
block_4b_relu (Activation)      (None, 512, 73, 45)  0           add_8[0][0]
__________________________________________________________________________________________________
output_bbox (Conv2D)            (None, 4, 73, 45)    2052        block_4b_relu[0][0]
__________________________________________________________________________________________________
output_cov (Conv2D)             (None, 1, 73, 45)    513         block_4b_relu[0][0]
==================================================================================================
Total params: 11,197,893
Trainable params: 11,188,165
Non-trainable params: 9,728
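
(Note: these layer widths — 64/128/256/512 channels per stage — are exactly those of a stock unpruned ResNet18, and the per-layer counts check out for the unpruned widths; e.g. conv1 is a 7x7 convolution from 3 to 64 channels, so 64 x (3 x 7 x 7) + 64 biases = 9,472 parameters, as shown above. If pruning had taken effect, the channel counts and the "Param #" column would be smaller.)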

Do you still have the training log? What does it report for "Trainable params"?

What is the training log? Is it as shown below?

Please check near the beginning of the log for:
Total params:
Trainable params:
Non-trainable params:

The "Trainable params" of the pruned model is the same as that of the unpruned model. Why?

Maybe there is something wrong in your settings. Can you check whether you used the pruned model for retraining?

I have tested it many times, and the "Trainable params" of the pruned model is the same as that of the unpruned model. I also tried pruning other models, such as 'resnet18_trafficcamnet.tlt', and got the same result. The command I use is as follows:

tao detectnet_v2 prune -m /workspace/openalpr/ccpd_unpruned_1.tlt -o /workspace/openalpr/ccpd_pruned_1.tlt -eq union -pth 4 -k nvidia_tlt

Any ideas?

That does not make sense. Did you change -pth?

I have tried many '-pth' values; their "Trainable params" are all the same.

Usually for the detectnet_v2 network the pth is small; please set it to a smaller value to check.
Below is a value mentioned in the Jupyter notebook:

-pth 0.0000052

Also, please check your training spec file: did you set the pruned model as the pretrained model?
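
For reference, in a detectnet_v2 retraining spec the model_config typically both points at the pruned .tlt and sets load_graph: true, so that the pruned structure is actually loaded rather than the full-width template graph. A minimal sketch (path is illustrative):

model_config {
  pretrained_model_file: "/workspace/openalpr/ccpd_pruned_1.tlt"
  load_graph: true
  num_layers: 18
  use_batch_norm: true
  arch: "resnet"
}

If load_graph is left at its default (false), only the weights are mapped onto the unpruned architecture, which would explain identical "Trainable params" before and after pruning.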

I just tried setting '-pth' to 0.0000052, but the "Trainable params" are the same. Also, I copied the training spec file, named it 'SPECS_retrain.txt', and set the pruned model as the pretrained model in it.