I run six video streams across two locations, recorded by ffmpeg running on Raspberry Pis (four ffmpeg instances on one machine, two on the other).
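Each recorder is essentially a single ffmpeg invocation wrapped in a launcher. As a rough sketch of one such process (the RTSP URL, archive path, and segment length below are placeholders, not my actual settings):

```python
# Minimal sketch of one recorder process, assuming RTSP cameras.
# The URL, credentials, paths, and segment length are hypothetical.
import subprocess

RTSP_URL = "rtsp://user:pass@192.168.1.10:554/stream1"   # placeholder camera URL
ARCHIVE_PATTERN = "/srv/archive/cam1/%Y%m%d_%H%M%S.mp4"  # placeholder archive path

cmd = [
    "ffmpeg",
    "-rtsp_transport", "tcp",   # TCP tends to be more robust than UDP
    "-i", RTSP_URL,
    "-c", "copy",               # remux only, no transcoding (matters on a Pi)
    "-f", "segment",            # split the recording into archive chunks
    "-segment_time", "300",     # 5-minute files
    "-reset_timestamps", "1",
    "-strftime", "1",           # timestamped filenames
    ARCHIVE_PATTERN,
]
subprocess.run(cmd, check=True)
```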
I use the cameras' event detection to send snapshot images to a Jetson Nano running YOLOv3. YOLO scans each 1920x1280 image in ~1 second. Probabilities above a threshold generate alerts, including the static image and a link into the video archive.
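The detection step itself is conceptually simple. Here's a stripped-down sketch using OpenCV's DNN module with the stock Darknet files; the paths, input size, and threshold are placeholders, and the real alerting code is of course more involved:

```python
# Sketch of the detection step: load YOLOv3, run one snapshot through it,
# and collect detections above the alert threshold.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
out_names = net.getUnconnectedOutLayersNames()
THRESHOLD = 0.6  # placeholder alert threshold

def detect(image_path):
    """Return (class_id, confidence) pairs above the alert threshold."""
    img = cv2.imread(image_path)
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (608, 608),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    hits = []
    for out in net.forward(out_names):   # one array per YOLO output layer
        for det in out:                  # det = [x, y, w, h, objectness, 80 class scores]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            conf = float(scores[class_id] * det[4])
            if conf >= THRESHOLD:
                hits.append((class_id, conf))
    return hits
```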
So, with this configuration a single Jetson Nano can easily keep up with six or more cameras. And the modular architecture makes it easy to add or remove video or image servers as needed.
Are you sending exactly one picture from the Raspberry Pi to the Jetson Nano? And what's the point of sending one picture, when a person/dog/cat may only appear a little later, say after 1 second? For example: I open the door, the camera captures the "opening the door" motion, and only a second later do I appear in the frame as a human being.

And if images arrive from all 6 cameras at the same time, the Jetson Nano will take about 6 seconds to process them, right?
The cameras have motion detection built into their firmware. Mine are all Dahua, but I'm pretty sure Hikvision and others do the same.

The snapshots go directly from the cameras to the Nano, not via the Raspberry Pis. And yes, I do send more than one snapshot, for the reasons you describe. There is a minimum delay between snapshots, set in the cameras; I use 5 seconds in most cases.
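On the Nano side, the intake is just a loop over incoming files. A minimal sketch, assuming the cameras push snapshots into a watched directory (e.g. via the cameras' FTP upload; the directory and handler below are hypothetical):

```python
# Sketch of the Nano-side intake loop: pick up newly uploaded snapshots
# and hand them to the detector, one at a time.
import os
import time

WATCH_DIR = "/srv/snapshots"  # hypothetical upload target

def handle_snapshot(path):
    print(f"would run YOLO on {path}")  # stand-in for the detect() call above

def watch(poll_seconds=1.0):
    seen = set()
    while True:
        for entry in os.scandir(WATCH_DIR):
            if entry.is_file() and entry.path not in seen:
                seen.add(entry.path)
                handle_snapshot(entry.path)  # snapshots processed sequentially
        time.sleep(poll_seconds)
```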
I have a maximum of two cameras covering the same physical space, so it’s unlikely that I will have more than one snapshot per second (a zombie invasion, approaching the house on all sides, would overload the Nano… but by that time notifications would be too little, too late).
If the ~1 second processing time were too long, it could easily be reduced: use a different YOLO model, change YOLO's parameters, or reduce the snapshot size. But my experience is that YOLO generates MUCH more accurate results with the large image and full model, so I'm happy with the ~1 second processing time.
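If anyone wants to check this trade-off on their own hardware, it's easy to time. A quick sketch that runs the same image through the network at three input sizes (stock Darknet files assumed; the snapshot path is a placeholder):

```python
# Time one snapshot at the full 608x608 input size versus smaller sizes.
import time
import cv2

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
out_names = net.getUnconnectedOutLayersNames()
img = cv2.imread("snapshot.jpg")  # placeholder test image

for size in (608, 416, 320):  # smaller input = faster but less accurate
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (size, size),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    t0 = time.perf_counter()
    net.forward(out_names)
    print(f"{size}x{size}: {time.perf_counter() - t0:.2f} s")
```

Smaller input sizes run faster but, as noted above, accuracy suffers, which is why I stick with the full-size configuration.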