[TLT / Jetson Nano] Iguana detection on NVIDIA Jetson Nano for monitoring

Hello, I created a system for detecting and monitoring iguanas in real time, and I would like to share how I developed this project.
GitHub repo.

Motivation

Green iguanas cause damage to residential and commercial landscape vegetation and are often considered a nuisance by property owners.

Using the power of edge computing, we developed an open-source project that anyone can download and use to track and monitor iguanas.


Software architecture

Please click the image down below to see the software architecture animation.
[image: software architecture animation]

Steps

    1. Data collection
    • 1.1 Scraping Images Using Selenium
    • 1.2 Labeling
    2. Training & Optimization
    • 2.1 NVIDIA TLT (Transfer Learning Toolkit)
    • 2.2 Download CV Sample Workflows from NGC
    • 2.3 Explore different backbone networks of YOLO V4
    • 2.4 Optimization
    • 2.5 Retrain pruned models and Export
    3. Deployment
    • 3.1 Generate optimized runtime engines on Jetson Nano
    • 3.2 Real-time inference and data-streaming using NVIDIA DeepStream
    4. Monitoring
    • Plotly Dash

1. Data collection

1.1 Scraping Images Using Selenium

Selenium is a Python library and tool for automating web browsers to perform a number of tasks. One such task is web scraping, to extract data that may otherwise be unavailable. By scraping iguana images with Selenium, we collected over 4,000 images.
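A minimal sketch of this kind of scraper, assuming Chrome and a generic image-search page (the URL and output paths are illustrative, not the project's exact ones):

```python
# Hypothetical sketch: collect image URLs from a search page with Selenium
# and download them. The search URL and paths are illustrative assumptions.
import os
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # needs a matching chromedriver
driver.get("https://www.bing.com/images/search?q=green+iguana")

os.makedirs("iguana_images", exist_ok=True)
for i, img in enumerate(driver.find_elements(By.TAG_NAME, "img")):
    src = img.get_attribute("src")
    if src and src.startswith("http"):
        data = requests.get(src, timeout=10).content
        with open(f"iguana_images/{i:05d}.jpg", "wb") as f:
            f.write(data)

driver.quit()
```

(In practice you would also scroll the page to trigger lazy loading before collecting the image elements.)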

1.2 Labeling

We created a labeled dataset in YOLO format with makesense.ai, a free-to-use online tool for labeling photos. We found it very convenient because no installation is required.

The whole labeled dataset is open; you can download it here.
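In YOLO format, each image gets a .txt file with one line per object: a class index followed by the box center and size, all normalized to [0, 1]. A small reader as an illustration (the file name and the single "iguana" class are assumptions):

```python
# Parse a YOLO-format label file: "<class> <x_center> <y_center> <width> <height>",
# where the four box values are normalized to [0, 1] relative to the image size.
def read_yolo_labels(path):
    boxes = []
    with open(path) as f:
        for line in f:
            cls, xc, yc, w, h = line.split()
            boxes.append((int(cls), float(xc), float(yc), float(w), float(h)))
    return boxes

# A file containing "0 0.512 0.430 0.210 0.350" describes one iguana (class 0)
# centered near the middle of the image, about 21% of the image wide.
print(read_yolo_labels("iguana_0001.txt"))
```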

2. Training & Optimization

2.1 Nvidia TLT (Transfer Learning Toolkit)

There are many DL frameworks out there, but NVIDIA TLT was the best fit for my purpose: it abstracts away the deep-learning-framework complexity and lets us fine-tune high-quality pre-trained AI models.

Most importantly, NVIDIA TLT can optimize models for deploying deep-learning inference networks across various NVIDIA platforms, which made my work a lot easier.

Plus: TLT is now called TAO and comes with more advanced features.

2.2 Download CV Sample Workflows from NGC

I downloaded the TLT sample notebooks from NGC, which covered the whole process I needed for model training, optimization, and export. I didn't have to write a single line of code in steps 2.3–2.5.
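For reference, a hedged sketch of fetching the samples with the NGC CLI (the resource version tag is a placeholder; check NGC for the current one):

```python
# Hypothetical sketch: download the TLT CV sample notebooks via the NGC CLI.
# The version tag is a placeholder assumption.
import subprocess

subprocess.run([
    "ngc", "registry", "resource", "download-version",
    "nvidia/tlt_cv_samples:v1.1.0",
], check=True)
```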

2.3 Explore different backbone networks of YOLO V4

I explored several backbone networks for YOLO V4; the training times are shown below:

| architecture | backbone | images | training time / epoch | GPU |
| --- | --- | --- | --- | --- |
| YOLO V4 | ResNet 18 | 4700 | 05:30 | RTX 3090 ×1 |
| YOLO V4 | ResNet 34 | 4700 | 08:17 | RTX 3090 ×1 |
| YOLO V4 | CSPDarkNet 19 | 4700 | 07:46 | RTX 3090 ×1 |
| YOLO V4 | CSPDarkNet 53 | 4700 | 12:03 | RTX 3090 ×1 |
| YOLO V4 | MobileNet V2 | 4700 | 03:21 | RTX 3090 ×1 |
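The backbone is selected in the training spec file rather than in code, and the notebooks launch training through the `tlt` CLI. A hedged sketch of such a launch (the spec path, results directory, and `$KEY` are placeholders):

```python
# Hypothetical sketch: launch TLT YOLO V4 training.
# The backbone is chosen in the spec file's yolov4_config (arch / nlayers);
# all paths and $KEY below are placeholder assumptions.
import subprocess

subprocess.run([
    "tlt", "yolo_v4", "train",
    "-e", "/workspace/specs/yolo_v4_mobilenet_v2.txt",  # experiment spec
    "-r", "/workspace/results/unpruned",                # output directory
    "-k", "$KEY",                                       # NGC encryption key
    "--gpus", "1",
], check=True)
```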

2.4 Optimization


NVIDIA TLT provides a key feature known as model pruning, which removes unnecessary connections in a neural network so that the corresponding computations no longer need to execute, saving memory, time, and energy.

My model pruning records are as below:

| architecture | backbone | threshold | parameters | parameters (pruned) |
| --- | --- | --- | --- | --- |
| YOLO V4 | CSPDarkNet 53 | 0.1 | 28,000,000 | 14,000,000 |
| YOLO V4 | MobileNet V2 | 0.1 | 3,400,000 | 23,800 |
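A hedged sketch of the prune step as the sample notebooks run it, where `-pth` is the threshold from the table above (paths and `$KEY` are placeholders):

```python
# Hypothetical sketch: prune a trained TLT YOLO V4 model.
# -pth is the pruning threshold (0.1 in the table above);
# paths and $KEY are placeholder assumptions.
import subprocess

subprocess.run([
    "tlt", "yolo_v4", "prune",
    "-m", "/workspace/results/unpruned/weights/yolov4_mobilenet_v2.tlt",
    "-o", "/workspace/results/pruned/yolov4_mobilenet_v2_pruned.tlt",
    "-pth", "0.1",
    "-k", "$KEY",
], check=True)
```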

2.5 Retrain pruned models and export

The model needs to be retrained to recover accuracy after pruning. After retraining, the next step is to export the model; once exported, it can be deployed on an edge device. The model can be exported with FP32, FP16, or INT8 precision. The default is FP16, which is what this project uses.
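A hedged sketch of the export step in FP16, as used in this project (the spec, paths, and `$KEY` are placeholders):

```python
# Hypothetical sketch: export the retrained model to an .etlt file in FP16.
# The spec, paths, and $KEY are placeholder assumptions.
import subprocess

subprocess.run([
    "tlt", "yolo_v4", "export",
    "-m", "/workspace/results/retrained/weights/yolov4_mobilenet_v2_pruned.tlt",
    "-e", "/workspace/specs/yolo_v4_retrain_mobilenet_v2.txt",
    "-o", "/workspace/export/yolov4_mobilenet_v2.etlt",
    "-k", "$KEY",
    "--data_type", "fp16",
], check=True)
```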

3. Deployment


I used a Jetson Nano Developer Kit for inference, because I wanted to see whether its computing power is enough for real-time IVA (intelligent video analytics) applications.

3.1 Generate optimized runtime engines on Jetson Nano

The computer vision models trained by TLT can be consumed by TensorRT via the tlt-converter tool. The converter parses the exported .etlt model file and generates an optimized TensorRT engine. These engines can be generated to support inference at low precision, such as FP16 or INT8.

The tlt-converter is distributed as a separate binary for x86 and Jetson platforms. You can find the converters on this page.


I downloaded the converter, copied it onto my Jetson Nano, and converted the pre-trained YOLO V4 models. The conversion took a while, as shown below:

| architecture | backbone | platform | JetPack version | conversion time |
| --- | --- | --- | --- | --- |
| YOLO V4 | CSPDarkNet 53 | Jetson | 4.5 | 63 minutes |
| YOLO V4 | MobileNet V2 | Jetson | 4.5 | 28 minutes |
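A hedged sketch of how the converter is typically invoked on the Nano; the output node name, input dimensions, and paths are assumptions based on the TLT YOLO V4 documentation:

```python
# Hypothetical sketch: build a TensorRT engine from the .etlt file on the Nano.
# -d gives the input dims (C,H,W) matching the infer-dims used below;
# BatchedNMS is the usual YOLO V4 output node; paths and $KEY are placeholders.
import subprocess

subprocess.run([
    "./tlt-converter",
    "-k", "$KEY",
    "-d", "3,384,1248",
    "-o", "BatchedNMS",
    "-t", "fp16",
    "-e", "yolov4_mobilenet_v2_fp16.engine",
    "yolov4_mobilenet_v2.etlt",
], check=True)
```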

3.2 Real-time inference and data-streaming using NVIDIA DeepStream

The last step in the deployment process is to configure and run the DeepStream app. The main steps include installing the DeepStream SDK, building a bounding box parser for YOLO V4, building a DeepStream app, and finally running the app.
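Once the config files are in place, running the reference app is a single command; a sketch (the config file name is a placeholder):

```python
# Hypothetical sketch: launch the reference deepstream-app with a config file.
# The config (a placeholder name here) wires together the TensorRT engine,
# the YOLO V4 bounding-box parser library, and the RTSP/MQTT sinks.
import subprocess

subprocess.run([
    "deepstream-app",
    "-c", "deepstream_app_iguana_config.txt",
], check=True)
```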


The DeepStream app receives live images, runs predictions, and streams the data. I chose MobileNet V2 as the backbone for YOLO V4. By tuning parameters, the app was able to achieve near-real-time inference speed:

| architecture | backbone | infer-dims | platform | skip frame | inference speed | RTSP streaming | MQTT streaming |
| --- | --- | --- | --- | --- | --- | --- | --- |
| YOLO V4 | CSPDarkNet 53 | 3;384;1248 | Jetson Nano | 0 | 0.2 fps | No | No |
| YOLO V4 | MobileNet V2 | 3;384;1248 | Jetson Nano | 0 | 16 fps | Yes | No |
| YOLO V4 | MobileNet V2 | 3;384;1248 | Jetson Nano | 4 | 27 fps | Yes | No |
| YOLO V4 | MobileNet V2 | 3;384;1248 | Jetson Nano | 0 | 10 fps | Yes | Yes |
| YOLO V4 | MobileNet V2 | 3;384;1248 | Jetson Nano | 8 | 18 fps | Yes | Yes |

YOLO V4 CSPDarkNet 53 on Jetson Nano

YOLO V4 MobileNet V2 on Jetson Nano
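On the receiving side, the MQTT stream can be consumed by any MQTT client. A minimal sketch with paho-mqtt (the broker address and topic are assumptions):

```python
# Hypothetical sketch: consume detection messages published over MQTT.
# The broker address and topic are placeholder assumptions.
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    # Each message would carry one detection event from the DeepStream app.
    print(msg.topic, msg.payload.decode())

client = mqtt.Client()
client.on_message = on_message
client.connect("192.168.0.10", 1883)   # broker on the local network
client.subscribe("iguana/detections")
client.loop_forever()
```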

4. Monitoring

The monitoring dashboard is built with Plotly Dash and Python. Plotly Dash is designed for data visualization and makes it easy to build consistently styled apps with complex, responsive layouts.
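A minimal sketch of a Dash app in that spirit (the hourly counts are placeholder data, not project results):

```python
# Hypothetical sketch: a minimal Plotly Dash dashboard for detection counts.
# The hourly counts below are placeholder data.
from dash import Dash, dcc, html
import plotly.graph_objects as go

hours = list(range(24))
counts = [0] * 24  # placeholder: iguana detections per hour

fig = go.Figure(go.Bar(x=hours, y=counts))
fig.update_layout(title="Iguana detections per hour",
                  xaxis_title="hour", yaxis_title="count")

app = Dash(__name__)
app.layout = html.Div([
    html.H1("Iguana monitoring"),
    dcc.Graph(figure=fig),
])

if __name__ == "__main__":
    app.run_server(debug=True)
```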

5. Improvement

I only had a limited amount of time for this project, so the system still has a lot of room for improvement. For example:

  • The TAO Toolkit now includes YOLO V4 Tiny, whose inference speed would be faster than that of the full YOLO V4.
  • Using Kafka as the backend and data-streaming server is apparently a better option, because it can handle huge volumes of data while remaining responsive. DeepStream also has a dedicated Kafka plugin, so the streaming throughput would be far greater than with the current approach.
  • Using the ELK stack for data streaming, storage, and visualization.

I plan to update this project gradually and will share any new progress in this post. Thank you for taking the time to read it.
