Idea for a multi-detection tool

Hi, this is my first post on this forum, and I would like to ask for your opinion on one of my ideas. (English isn't my strong suit, so please forgive any mistakes :) )

So, the project I am already planning a bit is about recognizing gestures so that I can control my room, and maybe later my whole house.

I try to set myself several smaller tasks that are solvable in the foreseeable future.
I would start with a small person-detection stage that recognizes persons and sends the picture to the next program. That program should simply cut out the person, convert it into a standard format, and pass it on.
From there, hand detection is done. If a hand is recognized, we go on to the next step; otherwise we wait for the next image/frame. Here the hand is cut out again (into a rectangular region) so that the next program can recognize the gesture more easily.
Based on the recognized gesture, any action can be executed…
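The cascade described above (person detection → crop → hand detection → crop → gesture classification) can be sketched as a small pipeline. This is only a structural illustration: the `detect_person`, `detect_hand`, and `classify_gesture` callables are hypothetical stand-ins for the actual networks, and the frame is a plain nested list rather than a real image.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Box:
    x: int
    y: int
    w: int
    h: int

# A grayscale image as nested lists, purely for illustration.
Frame = List[List[int]]

def crop(frame: Frame, box: Box) -> Frame:
    """Cut the detected rectangular region out of the frame."""
    return [row[box.x:box.x + box.w] for row in frame[box.y:box.y + box.h]]

def pipeline(frame: Frame,
             detect_person: Callable[[Frame], Optional[Box]],
             detect_hand: Callable[[Frame], Optional[Box]],
             classify_gesture: Callable[[Frame], str]) -> Optional[str]:
    """Cascade: person -> crop -> hand -> crop -> gesture.

    Returns None (i.e. wait for the next frame) as soon as one
    stage finds nothing, so later stages are never run needlessly.
    """
    person_box = detect_person(frame)
    if person_box is None:
        return None
    person_roi = crop(frame, person_box)

    hand_box = detect_hand(person_roi)
    if hand_box is None:
        return None
    hand_roi = crop(person_roi, hand_box)

    return classify_gesture(hand_roi)
```

With stub detectors plugged in, `pipeline(frame, lambda f: Box(1, 1, 6, 6), lambda f: Box(2, 2, 3, 3), lambda f: "open_palm")` would walk through all three stages; the early-exit design is what saves computation on frames without a person or hand.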

The pictures are therefore always cropped so that I can save some performance with the networks. I also hope this increases the recognition rate.
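Converting each crop into a "standard format" usually means padding it to a fixed aspect ratio before feeding it to the next network. A minimal sketch of square padding (letterboxing), again using nested lists as a stand-in for a real image array:

```python
from typing import List

def letterbox_square(roi: List[List[int]], pad_value: int = 0) -> List[List[int]]:
    """Pad a cropped region to a square so the next network always
    sees the same aspect ratio (a sketch; real code would use
    NumPy/OpenCV and also resize to the network's input size)."""
    h = len(roi)
    w = len(roi[0]) if h else 0
    side = max(h, w)
    # Pad each row on the right, then add full padding rows below.
    out = [row + [pad_value] * (side - w) for row in roi]
    out += [[pad_value] * side for _ in range(side - h)]
    return out
```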

My question:
Is this possible the way I imagine it, or would I need more computing power? (I currently own a Jetson Nano.)

Picture of the process flow chart:


The computing resources required depend on the complexity of the models you use,
so it's recommended to just give it a try.

Here are two related tutorials for your reference:

1. Multiple detector with Deepstream:

2. Gesture recognition on Jetson:
