Good morning all,
I need to track about 20 objects in real time (30 frames per second) from a video stream. The video resolution needs to be at minimum 1920 x 1080. Each object to be tracked will have a unique black letter/number on a white background. The size of that letter/number will be 45 x 45 pixels, which is very readable by the human eye.
I would like to stick with Python if at all possible.
I suspect this is outside the world of Raspberry Pi, or even a mini-PC.
Perhaps the Nvidia Orin Nano can do this but not sure?
Being new to this outside of basic object detection with OpenCV, I am not sure what the best process is.
Thoughts, opinions, suggestions, anything to help would be greatly appreciated.
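Since the tags described above are high-contrast (black glyphs on white backgrounds), plain thresholding may be enough to localize them before any deep model gets involved. Below is a minimal sketch, assuming frames arrive as NumPy grayscale arrays; the pure-Python flood fill is for illustration only, and a real pipeline would use `cv2.threshold` plus `cv2.connectedComponentsWithStats` for speed. The function name and parameter values are illustrative, not from any library.

```python
from collections import deque

import numpy as np


def find_white_tags(gray, thresh=200, min_area=400):
    """Locate bright (white-background) tag regions in a grayscale frame.

    Returns a list of (x, y, w, h) bounding boxes, one per connected
    bright region covering at least `min_area` pixels.
    """
    mask = gray >= thresh                 # bright pixels only
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    boxes = []
    ys, xs = np.nonzero(mask)
    for sy, sx in zip(ys, xs):
        if seen[sy, sx]:
            continue
        # BFS flood fill over the 4-connected bright region.
        queue = deque([(sy, sx)])
        seen[sy, sx] = True
        x0 = x1 = sx
        y0 = y1 = sy
        area = 0
        while queue:
            y, x = queue.popleft()
            area += 1
            x0, x1 = min(x0, x), max(x1, x)
            y0, y1 = min(y0, y), max(y1, y)
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if area >= min_area:              # drop specks and noise
            boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes
```

Once each tag is boxed, the 45 x 45 crop can be handed to a small classifier (or OCR) to read the letter/number, which is cheaper than running detection on the full 1920x1080 frame.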
Are the 20 objects contained in the same 1920x1080 frame?
If yes, this should be a detection problem with 20 classes.
You can find some examples below:
<img src="https://github.com/dusty-nv/jetson-inference/raw/master/docs/images/deep-vision-header.jpg" width="100%">
# Locating Objects with DetectNet
The previous recognition examples output class probabilities representing the entire input image. Next we're going to focus on **object detection**, and finding where in the frame various objects are located by extracting their bounding boxes. Unlike image classification, object detection networks are capable of detecting many different objects per frame.
<img src="https://github.com/dusty-nv/jetson-inference/raw/master/docs/images/detectnet.jpg" >
The [`detectNet`](../c/detectNet.h) object accepts an image as input, and outputs a list of coordinates of the detected bounding boxes along with their classes and confidence values. [`detectNet`](../c/detectNet.h) is available to use from [Python](https://rawgit.com/dusty-nv/jetson-inference/master/docs/html/python/jetson.inference.html#detectNet) and [C++](../c/detectNet.h). See below for various [pre-trained detection models](#pre-trained-detection-models-available) available for download. The default model used is a [91-class](../data/networks/ssd_coco_labels.txt) SSD-Mobilenet-v2 model trained on the MS COCO dataset, which achieves realtime inferencing performance on Jetson with TensorRT.
As examples of using the `detectNet` class, we provide sample programs for C++ and Python:
- [`detectnet.cpp`](../examples/detectnet/detectnet.cpp) (C++)
- [`detectnet.py`](../python/examples/detectnet.py) (Python)
These samples are able to detect objects in images, videos, and camera feeds. For more info about the various types of input/output streams supported, see the [Camera Streaming and Multimedia](aux-streaming.md) page.
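The Python sample linked above boils down to a short capture/detect/render loop. As a hedged sketch of what that looks like with the `jetson.inference` bindings (this requires an NVIDIA Jetson with the jetson-inference stack installed and will not run on a desktop PC; the `"csi://0"` camera URI is one of several supported input strings):

```python
# Requires a Jetson with jetson-inference installed; not runnable on a plain PC.
import jetson.inference
import jetson.utils

# Load the default 91-class SSD-Mobilenet-v2 COCO model with TensorRT.
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
camera = jetson.utils.videoSource("csi://0")    # or "/dev/video0", a video file, an RTP stream...
display = jetson.utils.videoOutput("display://0")

while display.IsStreaming():
    img = camera.Capture()
    detections = net.Detect(img)                # bounding boxes with ClassID and Confidence
    for det in detections:
        print(det.ClassID, det.Confidence, det.Left, det.Top, det.Right, det.Bottom)
    display.Render(img)
    display.SetStatus("detectNet | {:.0f} FPS".format(net.GetNetworkFPS()))
```

For the use case in this thread, the stock COCO model would be retrained (e.g. with the project's transfer-learning tools) on the 20 custom letter/number tags before this loop is useful.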
Yes, all objects are in the same frame and thank you for the links!