Thank you for the reply; it helps a lot!
Regarding the “biggest problem”, maybe I should explain more about why I am looking at 4 cameras. The idea is to identify different objects (about 1000 of them) within a given cube, though not necessarily all at the same time. My worry is that a bigger object would cover (hide) an object behind it, which is why I want to view the scene from the other sides as well. However, each side’s view can also be blocked by a bigger object in front of it, so I guess the only complete solution is to add a camera at the top of the cube (a bird’s-eye view). I will address that later if necessary; I guess the Nano can handle even more cameras. By the way, what is the typical solution for such cases?
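To make the occlusion concern concrete, here is a minimal sketch that checks which side cameras have a clear line of sight to a given object. It assumes (hypothetically) a unit cube, each camera reduced to a single point outside one side, and every object approximated by a bounding sphere; an object counts as hidden from a camera if the straight line from the camera to the object's center passes through another object's sphere. The camera positions and object sizes are made-up illustration values.

```python
import math

# Hypothetical camera positions, one outside each horizontal side of a unit cube.
CAMERAS = {
    "front": (0.5, -1.0, 0.5),
    "right": (2.0, 0.5, 0.5),
    "back": (0.5, 2.0, 0.5),
    "left": (-1.0, 0.5, 0.5),
}


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b (3D tuples)."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    ab2 = sum(x * x for x in ab)
    # Project p onto the segment and clamp the parameter to [0, 1].
    t = 0.0 if ab2 == 0 else sum(ap[i] * ab[i] for i in range(3)) / ab2
    t = max(0.0, min(1.0, t))
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(p, closest)


def visible_from(camera, target, occluders):
    """True if no occluder sphere (center, radius) cuts the camera-to-target line."""
    for center, radius in occluders:
        if distance_to_segment(center, camera, target) < radius:
            return False
    return True


# A small object sits behind a bigger one when seen from the front.
target = (0.5, 0.7, 0.5)
occluders = [((0.5, 0.3, 0.5), 0.25)]  # the bigger object, as a sphere

for name, position in CAMERAS.items():
    print(name, visible_from(position, target, occluders))
```

In this example the front camera is blocked while the right, back and left cameras can still see the small object, which is the situation the extra viewpoints are meant to cover. With real detections the sphere positions would have to come from some 3D estimate, so this only illustrates the geometry of the problem.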
With regard to inference, I was thinking of analysing the input from each camera and looking for differences from previous frames (analysing the whole space that camera can see), running a separate ANN for each camera view, and then continuing with the logic. The logic could, for example, go like this: if an “x object” is found on the “right side” and also on the “front side”, while the left and back sides give a different detection (we cannot see the complete shape of the object, maybe because it is too big or the visible surface does not match the “x object”, etc.), then we can infer that the “x object” is present, and so on. Do you think this is the right way to think about it?
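The per-camera pipeline described above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: frames are small grayscale images given as lists of rows, "change" is measured as the mean absolute pixel difference against the previous frame (the threshold of 10 is arbitrary), and the per-camera detector output is reduced to one label per view, or None when that view could not classify the object. The functions `frame_changed` and `fuse_views` are hypothetical names, and the fusion rule is a simple majority vote that requires at least two agreeing views.

```python
from collections import Counter


def frame_changed(prev, curr, threshold=10.0):
    """True if the mean absolute difference between two grayscale frames
    (lists of rows of pixel values) exceeds `threshold`."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(p - c)
            count += 1
    return count > 0 and total / count > threshold


def fuse_views(detections, min_views=2):
    """Combine per-camera labels into a single decision.

    `detections` maps a view name to the label that view's detector reported,
    or None if the view could not classify the object (e.g. only part of it
    is visible). A label is accepted only if at least `min_views` cameras
    agree and no other label has the same number of votes.
    """
    votes = Counter(label for label in detections.values() if label is not None)
    if not votes:
        return None
    ranked = votes.most_common()
    best_label, best_count = ranked[0]
    tie = len(ranked) > 1 and ranked[1][1] == best_count
    if best_count >= min_views and not tie:
        return best_label
    return None


prev = [[0, 0], [0, 0]]
curr = [[50, 50], [0, 0]]
if frame_changed(prev, curr):  # mean difference is 25, above the threshold
    # Front and right agree on "x"; back is unsure, left sees something else.
    print(fuse_views({"front": "x", "right": "x", "back": None, "left": "y"}))
```

Running the detector only on views whose frames actually changed saves compute on a small board, and a vote across views is one simple way to express the "found on right and front" rule. More robust systems usually fuse in 3D instead, so that two cameras are known to be looking at the same object rather than merely reporting the same label.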
Anyway, I would be more than happy to tackle this with other working solutions, or to study examples of how this has been tackled before; I don’t necessarily want to reinvent the wheel.
To recap, the idea is to identify different objects within a given cube space, hence the 4 cameras, one on each of the cube’s horizontal sides. Is there a way to map what the cameras capture and build a smooth 3D view around the objects, something like stereo vision? This raises the question of depth cameras (e.g. the ZED Mini) and whether they could be used to map the view inside the cube. In addition, what combination of ANN, CNN, etc. would be useful here, not just for object detection but also for stitching the views together and then moving on to inference?
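One common building block for combining views into 3D is triangulation: if the same object is detected by two calibrated cameras, each detection defines a ray from that camera into the cube, and the object lies near where the rays meet. The sketch below uses the simple midpoint method and assumes (hypothetically) that calibration has already converted each detection into a ray, given as a camera center and direction in a shared cube coordinate frame. OpenCV's `cv2.triangulatePoints` does a similar job starting from projection matrices and pixel coordinates.

```python
import math


def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a 3D point from two rays c + s*d (camera center, direction).

    Returns the midpoint of the closest points on the two rays and the gap
    between them; a large gap suggests the two detections are not the same
    object.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    w0 = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; cannot triangulate")
    # Ray parameters that minimise the distance between the two rays.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    midpoint = tuple((u + v) / 2 for u, v in zip(p1, p2))
    return midpoint, math.dist(p1, p2)


# Front camera looking along +y, right camera looking along -x (unit cube).
point, gap = triangulate_midpoint(
    (0.5, -1.0, 0.5), (0.0, 1.0, 0.0),
    (2.0, 0.5, 0.5), (-1.0, 0.0, 0.0),
)
print(point, gap)
```

A stereo camera such as the ZED Mini applies the same idea internally with two lenses a fixed distance apart, which is why it can output depth directly. With four cameras on different sides, the extrinsic calibration between cameras is what makes this kind of fusion possible.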
What do you think?
Best.