The model trained with the above spec file, after being exported and converted to an engine file with INT8 precision and batch size 1, gives a maximum of 6.5 FPS when tested with trtexec. Log details were already shared in a previous thread: Low FPS for pruned tao toolkit models on deepstream - #30 by Fiona.Chen
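For anyone reproducing the measurement, a trtexec invocation along these lines can be used to benchmark an already-built engine (a minimal sketch; the engine filename is a placeholder):

```
# Benchmark an existing INT8, batch-1 engine (engine filename is a placeholder)
trtexec --loadEngine=yolov4_resnet18_int8.engine --batch=1 --iterations=100 --avgRuns=10
```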
The pruning ratio for the model is 0.57.
How do I train a YOLOv4 model with TAO Toolkit that will give me 15 FPS in trtexec?
Please look at this in the context of my forum question above: I am looking to achieve 15 FPS for 30 cameras. This means the sum of my compute time and the d2h and h2d latencies would have to come down by more than half from its current value of roughly 5.45 ms.
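To make the target concrete (assuming the ~5.45 ms figure is the per-inference sum of h2d, compute, and d2h): 30 cameras × 15 FPS = 450 inferences/s, i.e. a budget of 1000 ms ÷ 450 ≈ 2.2 ms per inference, which is indeed a bit less than half of 5.45 ms.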
I am looking to bring down the inference time for the model.
Actually, YOLO_v4_tiny just changes to another backbone compared to YOLO_v4.
You can set up similar experiments to run training and check the mAP result.
Before moving on to a different model, can you tell me if there is any way to extract more FPS out of a YOLOv4 model, since it is a tried and tested model in terms of accuracy for my use case?
I am asking specifically in terms of changes to the training config. For example, can you recommend a different architecture that is lighter than resnet18 but comparable in terms of feature extraction?
A different pruning approach to my current one (command used: tao yolo_v4 prune -m <model-path> -o <output-path> -k <key> -e <path-to-training-config> -pth 0.5 -eq intersection).
Any changes recommended while exporting the model (command used: tao yolo_v4 export -m <model-path> -o <path-to-.etlt-file> -k <key> --data_type int8 -e <path-to-training-config> --cal_cache_file <path-to-cal.bin-file>).
Any recommendations for engine file creation; currently the model's engine file is created when the DeepStream (ver. 6.3) pipeline starts.
Also, about YOLOv4_tiny: are you saying it has the same architecture as YOLOv4 and the only difference is the set of supported backbones?
One important thing concerns the mAP. You mention you are using TAO 3.22.05. Can you use a newer version of the TAO docker to train? As mentioned in another topic, TAO 5.0 (or 4.0.0 or 4.0.1) can improve the mAP by fixing issues in the yolov4 structure, the loss function, etc.
For the backbone, you can run experiments on mobilenet_v1 or mobilenet_v2.
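For illustration, swapping the backbone is a small change in the yolov4_config section of the training spec (a sketch; please verify the exact field names against the TAO docs for your version):

```
yolov4_config {
  # switch the feature extractor; nlayers is only needed for resnet/vgg-style backbones
  arch: "mobilenet_v2"
}
```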
For pruning: after pruning, you need to run training against the pruned model to retain a similar mAP. You can prune a bit → retrain → prune a bit → retrain → and so on.
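A sketch of that loop with the TAO 3.x CLI (paths, key, spec names, and thresholds are placeholders; the retrain spec must reference the pruned model via the pruned-model field in training_config described in the YOLOv4 docs):

```
KEY=<key>
SPEC=<path-to-training-config>

# Round 1: prune lightly rather than in one aggressive 0.5 pass
tao yolo_v4 prune -m model_round0.tlt -o model_round1_pruned.tlt \
    -k $KEY -e $SPEC -pth 0.2 -eq intersection

# Retrain the pruned model to recover mAP
# (the retrain spec points at model_round1_pruned.tlt as the pruned model)
tao yolo_v4 train -e <retrain-spec-round1> -r results_round1 -k $KEY

# Repeat with a slightly higher -pth until the FPS target is met or mAP drops too far
```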
For exporting and engine generation, I suggest you use the TAO 5.0 version. It will export to an onnx file, and then you can run trtexec to generate the TensorRT engine.
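A sketch of that flow (the TAO 5.x command names and flags should be checked against the TAO 5.0 docs; paths are placeholders):

```
# Export the checkpoint to ONNX with TAO 5.x
tao model yolo_v4 export -m <model-path> -o yolov4.onnx -k <key> -e <path-to-training-config>

# Build an INT8 engine directly with trtexec, reusing the existing calibration cache
trtexec --onnx=yolov4.onnx --int8 --calib=<path-to-cal.bin-file> --saveEngine=yolov4_int8.engine
```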
For the last question, yes, it is.
I am happy with the mAP of the model (trained on 3.22.05). I don't want to migrate to 5.x just for better accuracy, but it would make sense if the overall upgrades you mentioned, such as the change in the yolov4 structure, also resulted in performance improvements for the model.
Also, we experimented with TAO 5.x, but it has an issue with the validation tfrecords that are generated, which in turn results in a wrong mAP calculation. A team member of mine discovered this, and he has either already raised the issue on the forums or is planning to.
Does using trtexec to generate the TensorRT engine give a performance bump compared to generating it in the DeepStream pipeline?
As for the points regarding the backbones and pruning: they make sense, and that is something I can start experimenting with.
Given the large input size for my model (1888×1056), will mobilenet_v2, being a smaller backbone than the resnets, be able to extract features properly at all 3 scales? Detection of small and medium-sized objects is very important for the use case where the model will be deployed. If you think this point is valid, does it make sense to start my training experiments with the yolov4_tiny model instead?
There is a pretrained model "cspdarknet_tiny.hdf5" in NGC. For the backbone cspdarknet_tiny_3l, you can use cspdarknet_tiny.hdf5 as the pretrained model.
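For illustration, wiring that into a yolo_v4_tiny training spec would look roughly like this (a sketch; the field names should be verified against the TAO docs for your version, and the path is a placeholder):

```
yolov4_config {
  arch: "cspdarknet_tiny_3l"   # 3-scale tiny backbone, as discussed above
}
training_config {
  # NGC pretrained weights (path is a placeholder)
  pretrain_model_path: "/workspace/pretrained/cspdarknet_tiny.hdf5"
}
```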
There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.