Tao yolov4 pruned model is stuck at 6.5 FPS

• Hardware : A5000
• Network Type: Yolo_v4
• TLT Version: 3.22.05
• Training spec file :
d26_yolov4_apm_apr1924_pruned_retrain_v5.txt (3.0 KB)

The model was trained with the above spec file. After exporting it and converting it to an engine file with INT8 precision and batch size 1, testing with trtexec gives a maximum of 6.5 FPS. Log details were already shared in a previous thread: Low FPS for pruned tao toolkit models on deepstream - #30 by Fiona.Chen

The pruning ratio for the model is 0.57.

How do I train a YOLOv4 model with the TAO Toolkit that will give me 15 FPS in trtexec?

From your log in comment Low FPS for pruned tao toolkit models on deepstream - #16 by adithya.ajith,

[08/01/2024-12:52:30] [I] GPU Compute Time: min = 4.38373 ms, max = 4.71448 ms, mean = 4.43574 ms, median = 4.43896 ms, percentile(90%) = 4.44519 ms, percentile(95%) = 4.46667 ms, percentile(99%) = 4.47385 ms

That works out to about 1000 / 4.43574 ≈ 225 FPS.

May I know the log for the 6.5 FPS result?

Please look at this in the context of my forum question above: I am looking to achieve 15 FPS for 30 cameras. This means the sum of my compute, D2H, and H2D latencies would have to come down by more than half from the current value of roughly 5.45 ms.
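To make the latency budget concrete, here is a quick sketch of the arithmetic; the 4.43574 ms mean comes from the trtexec log above, and the 30-camera / 15 FPS target from this post:

```shell
# Throughput implied by trtexec's mean GPU compute time (batch size 1)
mean_ms=4.43574
fps=$(awk "BEGIN { printf \"%d\", 1000 / $mean_ms }")
echo "single-stream throughput: ${fps} FPS"

# Per-frame latency budget to serve 30 cameras at 15 FPS each
# on one GPU with batch size 1
cams=30
target_fps=15
budget_ms=$(awk "BEGIN { printf \"%.2f\", 1000 / ($cams * $target_fps) }")
echo "required per-frame latency: ${budget_ms} ms"
```

Note that 30 cameras × 15 FPS is 450 inferences per second, i.e. a per-frame budget of about 2.22 ms at batch size 1, which is indeed less than half of 5.45 ms. Batching multiple streams per inference would relax this budget.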

I am looking to bring down the inference time for the model.

I suggest you use the YOLO_v4_tiny network.

I cannot change the network architecture or the input size because of accuracy constraints. Are there any other options?

Actually, YOLO_v4_tiny just changes to another backbone compared to YOLO_v4.
You can set up similar experiments to run training and check the mAP result.

Before moving on to a different model, can you tell me if there is any way to extract more FPS out of a YOLOv4 model, since it is a tried and tested model in terms of accuracy for my use case?

  • I am specifically asking in terms of changes to the training config. For example, can you recommend a backbone that is lighter than resnet18 but comparable in terms of feature extraction?
  • A different pruning approach from my current one (command used: tao yolo_v4 prune -m <model-path> -o <output-path> -k <key> -e <path-to-training-config> -pth 0.5 -eq intersection).
  • Any changes recommended while exporting the model (command used: tao yolo_v4 export -m <model-path> -o <path-to-.etlt-file> -k <key> --data_type int8 -e <path-to-training-config> --cal_cache_file <path-to-cal.bin-file>).
  • Any recommendations for engine file creation; currently the model’s engine file is created when the DeepStream (v6.3) pipeline starts.

Also, about YOLOv4_tiny: are you saying that it has the same architecture as YOLOv4 and the only difference is the backbones supported?

One important thing concerns the mAP. You mention you are using TAO 3.22.05. Can you use a newer version of the TAO docker to train? As mentioned in another topic, TAO 5.0 (or 4.0.0 or 4.0.1) can improve the mAP by fixing issues in the YOLOv4 structure, the loss function, etc.
For the backbone, you can run experiments with mobilenet_v1 or mobilenet_v2.
For pruning: after pruning, you need to retrain the pruned model to retain a similar mAP. You can prune a bit → retrain → prune a bit → retrain → and so on.
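The prune-a-bit → retrain loop could be sketched as a shell script. The checkpoint names, the key/spec placeholders, and the 0.1 → 0.2 → 0.3 threshold schedule below are all hypothetical, not recommended values:

```shell
#!/bin/sh
# Iterative prune -> retrain loop (sketch only).
# KEY, SPEC, and all file names are hypothetical placeholders.
KEY=<key>
SPEC=<path-to-training-config>
MODEL=<path-to-unpruned-checkpoint>.tlt

for PTH in 0.1 0.2 0.3; do            # gentle, increasing threshold schedule
  PRUNED="pruned_pth_${PTH}.tlt"
  tao yolo_v4 prune -m "$MODEL" -o "$PRUNED" -k "$KEY" -e "$SPEC" \
      -pth "$PTH" -eq intersection
  # Point pruned_model_path in the retrain spec at $PRUNED, then retrain
  # to recover mAP before pruning further.
  tao yolo_v4 train -e "$SPEC" -r "results_pth_${PTH}" -k "$KEY"
  # Continue from the retrained checkpoint on the next iteration.
  MODEL="results_pth_${PTH}/weights/<retrained-checkpoint>.tlt"
done
```

The idea is that each small pruning step removes less capacity at once, so each retraining pass has an easier time recovering the mAP than a single aggressive prune would.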
For exporting and engine generation, I suggest you use TAO 5.0. It exports to an ONNX file, and then you can run trtexec to generate the TensorRT engine.
For the last question: yes, it is.
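The export-then-trtexec path could look like the following; the file names are placeholders, and the --calib flag assumes the INT8 calibration cache produced during export:

```shell
# Build an INT8, batch-size-1 TensorRT engine from the exported ONNX model.
# File names are hypothetical placeholders.
trtexec --onnx=yolov4_resnet18.onnx \
        --int8 \
        --calib=cal.bin \
        --saveEngine=yolov4_resnet18_int8.engine

# Benchmark the standalone engine afterwards (reports latency and throughput):
trtexec --loadEngine=yolov4_resnet18_int8.engine
```

Building the engine offline this way also avoids the engine-build step at DeepStream pipeline startup, since the pipeline can load the pre-built .engine file directly.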

I am happy with the mAP of the model (trained on 3.22.05). I don’t want to migrate to 5.x just for better accuracy, but migrating would make sense if the overall upgrades you mentioned, such as the change in the YOLOv4 structure, also result in performance improvements for the model.

Also, we experimented with TAO 5.x, but it has an issue with the validation TFRecords it generates, which in turn results in wrong mAP calculation. A team member of mine discovered this and has already raised (or is planning to raise) the issue on the forums.

Does using trtexec to generate the TensorRT engine, compared to generating it in the DeepStream pipeline, give a performance bump?

The points regarding the backbones and pruning make sense and are something I can start experimenting with.

No, it does not mean improving mAP. It is just another way to generate the engine instead of using DeepStream. One more option is to decode the .etlt model to an ONNX file; refer to tao_toolkit_recipes/tao_forum_faq/FAQ.md at main · NVIDIA-AI-IOT/tao_toolkit_recipes · GitHub. Then you can also use trtexec.

To clear up the confusion: I am talking about a performance (FPS) bump, not mAP.

What can you tell me about my first question regarding performance improvement in 5.x vs 3.x?

For 5.x vs 3.x in YOLO_v4, the improvements focus on mAP only.


Given the large input size for my model (1888×1056), will MobileNet_v2, which is a smaller backbone compared to ResNets, be able to extract features properly at all 3 scales? Detection of small and medium-sized objects is very important for the use case where the model will be deployed. If you think my point above is valid, does it make sense to start my training experiments with the yolov4_tiny model instead?
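For context on what "all 3 scales" means at that input size, the feature-map grids can be worked out directly; the strides 8/16/32 below are the usual YOLOv4 head strides (an assumption about this config, not taken from the spec file):

```shell
# Feature-map grid sizes at the three YOLOv4 detection scales
# for a 1888x1056 input, assuming the usual head strides of 8, 16, 32.
W=1888; H=1056
for S in 8 16 32; do
  echo "stride $S: $((W / S)) x $((H / S))"
done
```

At stride 8 this gives a 236×132 grid, which is the scale that matters most for the small objects mentioned above; the backbone choice affects feature quality at these grids, not their resolution.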

For yolo_v4_tiny, I cannot find the backbone cspdarknet_tiny_3l in NGC. Is there any other source for this model?

There is one pretrained model, “cspdarknet_tiny.hdf5”, in NGC. For the cspdarknet_tiny_3l backbone, you can use cspdarknet_tiny.hdf5 as the pretrained model.

May I know your thoughts on my question about MobileNet_v2?

For mobilenet_v2, you can run the training to see if it achieves a competitive mAP.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.