Hello, I created a system for detecting and monitoring iguanas in real time, and I would like to share how I developed this project.
Green iguanas cause damage to residential and commercial landscape vegetation and are often considered a nuisance by property owners.
Using the power of edge computing, we developed an open project that people can download and use to track and monitor iguanas.
- Data collection
- 1.1 Scraping Images Using Selenium
- 1.2 Labeling
- Training & Optimization
- 2.1 Nvidia TLT (Transfer Learning Toolkit)
- 2.2 Download CV Sample Workflows from NGC
- 2.3 Explore different backbone networks of YOLO V4
- 2.4 Optimization
- 2.5 Retrain pruned models and Export
- 3.1 Generate optimized runtime engines on Jetson Nano
- 3.2 Real-time inference and data-streaming using NVIDIA DeepStream
- Plotly Dash
Selenium is a browser-automation tool with Python bindings that can drive a web browser through many kinds of tasks. One of them is web scraping: extracting useful data that might otherwise be unavailable. By scraping iguana images with Selenium, we collected over 4,000 images.
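A minimal sketch of what such a scraper might look like, assuming Selenium 4 and a local Chrome driver. The helper names (filter_image_urls, collect_image_urls), the scroll count, and the pause are hypothetical and are not taken from the original project.

```python
import time
from urllib.parse import urlparse


def filter_image_urls(urls):
    """Keep unique http(s) URLs in order; drop empty values and data URIs."""
    seen = set()
    kept = []
    for url in urls:
        if not url:
            continue
        if urlparse(url).scheme not in ("http", "https"):
            continue
        if url in seen:
            continue
        seen.add(url)
        kept.append(url)
    return kept


def collect_image_urls(search_url, scrolls=5, pause=1.0):
    """Open a page, scroll to trigger lazy loading, and return image URLs."""
    # Imported here so the pure helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(search_url)
        for _ in range(scrolls):
            # Scroll to the bottom so the page loads more thumbnails.
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(pause)
        images = driver.find_elements(By.TAG_NAME, "img")
        return filter_image_urls(img.get_attribute("src") for img in images)
    finally:
        driver.quit()
```

The returned URLs could then be downloaded with any HTTP client. Filtering out duplicates and inline data URIs first keeps the image set cleaner before labeling.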
We created a labeled dataset in YOLO format with makesense.ai, a free online tool for labeling photos. We found makesense.ai very convenient because no complicated installation is required.
The whole labeled dataset is open; you can download it here.
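For readers unfamiliar with the YOLO label format: each object is one text line holding a class id followed by the box center, width, and height, all normalized by the image size. The sketch below shows the conversion from a pixel box. The function name and the example numbers are only for illustration.

```python
def to_yolo_line(class_id, box, image_width, image_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    # The center and size are expressed as fractions of the image dimensions.
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# One iguana (class 0) in a 400x500 image.
print(to_yolo_line(0, (100, 50, 300, 250), 400, 500))
# prints "0 0.500000 0.300000 0.500000 0.400000"
```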
There are many deep-learning frameworks out there, but NVIDIA TLT was the best fit for my purpose: it abstracts away the complexity of the underlying AI/deep-learning framework and lets us fine-tune high-quality pre-trained models.
Most importantly, NVIDIA TLT can optimize models for deploying deep-learning inference across various NVIDIA platforms, which made my work a lot easier.
Note: TLT is now called TAO Toolkit and has more advanced features.
I downloaded TLT's sample notebooks from NGC, which covered the whole process I needed: model training, optimization, and export. I didn't write a single line of code in steps 2.3-2.5.
I explored several backbone networks for YOLO V4. The training times are shown below:
|architecture||backbone||images||training time / epoch||GPU|
|YOLO V4||ResNet 18||4700||05:30||RTX 3090*1|
|YOLO V4||ResNet 34||4700||08:17||RTX 3090*1|
|YOLO V4||CSPDarkNet 19||4700||07:46||RTX 3090*1|
|YOLO V4||CSPDarkNet 53||4700||12:03||RTX 3090*1|
|YOLO V4||MobileNet V2||4700||03:21||RTX 3090*1|
NVIDIA TLT provides a key feature known as model pruning, which removes unnecessary connections in a neural network so that the corresponding computation never has to run, saving memory, time, and energy.
My model pruning records are shown below:
|YOLO V4||CSPDarkNet 53||0.1||28,000,000||14,000,000|
|YOLO V4||MobileNet V2||0.1||3,400,000||23,800|
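To make the idea of pruning concrete, here is a minimal sketch of threshold-based filter pruning in plain Python. It illustrates the general technique only and is not TLT's actual implementation: the function name, the L2-norm criterion, the toy weights, and the threshold value are all assumptions.

```python
import math


def prune_filters(filters, threshold):
    """Drop filters whose L2 norm, relative to the largest norm, is below threshold."""
    if not filters:
        return []
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    largest = max(norms)
    if largest == 0:
        return []  # every filter is all zeros, so nothing is worth keeping
    return [f for f, n in zip(filters, norms) if n / largest >= threshold]


layer = [
    [0.9, -1.2, 0.4],     # strong filter
    [0.01, 0.02, -0.01],  # near-zero filter, contributes almost nothing
    [0.5, 0.3, -0.7],     # moderate filter
]
kept = prune_filters(layer, threshold=0.1)
print(f"kept {len(kept)} of {len(layer)} filters")
# prints "kept 2 of 3 filters"
```

A higher threshold removes more filters and shrinks the model further, at the cost of a larger accuracy drop.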
The model needs to be retrained after pruning to recover accuracy. Once it is retrained, the next step is to export it, after which it can be deployed on an edge device. The model can be exported using FP32, FP16, or INT8 precision. The default is FP16, which is used in this project.
I used a Jetson Nano Developer Kit for inference because I wanted to see whether its computing power is sufficient for real-time intelligent video analytics (IVA) applications.
The computer vision models trained by TLT can be consumed by TensorRT via the tlt-converter tool. The TLT Converter parses the exported .etlt model file and generates an optimized TensorRT engine. These engines can be generated to support inference at low precision, such as FP16 or INT8.
The TLT Converter is distributed as a separate binary for x86 and Jetson platforms. You can find the converters on this page.
I downloaded the converter, copied it to my Jetson Nano, and converted the pre-trained YOLO V4 model. The conversion took a while, as shown below:
|YOLO V4||CSPDarkNet 53||Jetson||4.5||63 minutes|
|YOLO V4||MobileNet V2||Jetson||4.5||28 minutes|
The last step in the deployment process is to configure and run the DeepStream app. The main steps include installing the DeepStream SDK, building a bounding box parser for YOLO V4, building a DeepStream app, and finally running the app.
The DeepStream app receives live images, runs inference on them, and streams the resulting data.
I chose MobileNet V2 as the backbone for YOLO V4. By tuning parameters, the app was able to achieve near real-time inference speed.
|architecture||backbone||infer-dims||platform||skip frame||inference speed||RTSP streaming||MQTT streaming|
|YOLO V4||CSPDarkNet 53||3;384;1248||Jetson Nano||0||0.2 fps||No||No|
|YOLO V4||MobileNet V2||3;384;1248||Jetson Nano||0||16 fps||Yes||No|
|YOLO V4||MobileNet V2||3;384;1248||Jetson Nano||4||27 fps||Yes||No|
|YOLO V4||MobileNet V2||3;384;1248||Jetson Nano||0||10 fps||Yes||Yes|
|YOLO V4||MobileNet V2||3;384;1248||Jetson Nano||8||18 fps||Yes||Yes|
YOLO V4 CspDarknet53 on Jetson Nano
YOLO V4 MobileNet V2 on Jetson Nano
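The "skip frame" column in the table above is easier to picture with a small sketch. It assumes the setting means "run the detector on one frame, then skip the next N frames", which, as far as I know, is what the interval property of DeepStream's inference element does. The function names are hypothetical and the code only models the scheduling, not DeepStream itself.

```python
def frames_to_infer(total_frames, skip):
    """Indices of frames sent to the detector when `skip` frames are
    skipped after every inferred frame."""
    return [i for i in range(total_frames) if i % (skip + 1) == 0]


def detector_load(skip):
    """Fraction of incoming frames the detector actually processes."""
    return 1 / (skip + 1)


print(frames_to_infer(10, 4))  # prints [0, 5]
print(detector_load(4))        # prints 0.2
```

With skip frame set to 4, the detector only sees one frame in five, which is why the pipeline can keep up with a higher frame rate on the Jetson Nano. The trade-off is that detections are refreshed less often.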
The monitoring dashboard is built with Plotly Dash and Python. Plotly Dash is a data-visualization framework that makes it easier to build consistently styled apps with complex, responsive layouts.
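A minimal sketch of how a dashboard like this can be put together, assuming Dash 2.x and Plotly are installed. The helper names, the chart, and the sample timestamps are hypothetical; the real dashboard in this project may be structured differently.

```python
from collections import Counter


def detections_per_minute(timestamps):
    """Count detections per minute from ISO-8601 timestamp strings."""
    # The first 16 characters are "YYYY-MM-DDTHH:MM", i.e. minute resolution.
    counts = Counter(ts[:16] for ts in timestamps)
    minutes = sorted(counts)
    return minutes, [counts[m] for m in minutes]


def build_app(timestamps):
    """Build a one-chart Dash app showing detections per minute."""
    # Imported here so the helper above can be used without Dash installed.
    from dash import Dash, dcc, html
    import plotly.graph_objects as go

    minutes, counts = detections_per_minute(timestamps)
    figure = go.Figure(go.Bar(x=minutes, y=counts))
    figure.update_layout(title="Iguana detections per minute")

    app = Dash(__name__)
    app.layout = html.Div([
        html.H1("Iguana monitor"),
        dcc.Graph(id="detections", figure=figure),
    ])
    return app
```

Calling build_app(sample).run(debug=True) with a list of timestamps starts a local development server (older Dash releases use run_server instead of run). A live dashboard would also need a callback that refreshes the figure as new detections arrive.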
I only had a small amount of time to complete this project, so the system still has a lot of room for improvement. For example:
- TAO Toolkit now includes YOLO V4 Tiny, whose inference speed should be faster than that of the standard YOLO V4.
- Kafka is likely a better option as the backend and data-streaming server, because it can handle huge volumes of data and remain responsive. DeepStream also has a dedicated Kafka plug-in, so data streaming should be much faster than with the current approach.
- Use the ELK stack for data streaming, data storage, and visualization.
I plan to update this project gradually, and I will share any new progress in this post. Thank you for taking the time to read it.