I am trying to build an object recognition system for a retail store. I have built a network using the YOLO algorithm with a ResNet backbone, and it successfully identifies products in a frame. However, I also want to identify and count products when they are stacked vertically or placed one behind another.
As the images show, the system detects both Kettle chips and Too Yumm chips when they are placed together, but two packets of Too Yumm chips, buttermilk, or Coke are identified as a single product. How can I separate these into two products along the edge where they meet? Basically, I want the model to output two separate bounding boxes even when the products are placed close to each other.
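One possible cause, stated here as an assumption and not something confirmed for this model, is that non-maximum suppression (NMS) discards the second box because it overlaps the first too heavily. The sketch below is a plain-Python greedy NMS, not the code from my pipeline. The box coordinates, scores, and both threshold values are made up for illustration. It shows how two packets placed one behind the other can collapse into one detection at a low IoU threshold and survive as two at a higher one.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Greedy NMS: keep the highest-scoring box, drop any later box whose
    IoU with an already kept box exceeds iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two packets of the same product, one partly
# behind the other, so their boxes overlap heavily.
boxes = [(100, 100, 200, 300), (120, 90, 220, 290)]
scores = [0.90, 0.85]

overlap = iou(boxes[0], boxes[1])
kept_strict = nms(boxes, scores, iou_threshold=0.45)  # illustrative value
kept_loose = nms(boxes, scores, iou_threshold=0.70)   # illustrative value

print(f"IoU between the two boxes: {overlap:.3f}")
print("kept at 0.45:", kept_strict)
print("kept at 0.70:", kept_loose)
```

If the raw predictions before NMS already contain only one box per pair, the threshold is not the issue and the problem is more likely in the training data or annotations for stacked products.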