My goal is to detect cardboard boxes, but boxes come in all shapes and sizes and are organized differently in each warehouse. If I trained my model on only one environment of cardboard boxes, it would not generalize well.
Currently, I have training images from 6 different warehouses that treat cardboard boxes differently. How many more environments should I include for the model to generalize well? More specifically, TLT uses a separate sequence to describe each video. How many of these sequences are generally used as good practice?
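To make the question concrete, here is a minimal sketch of how generalization across warehouses could be measured: hold out one warehouse at a time, train on the remaining ones, and evaluate on the held-out one. This is plain Python and not part of TLT. The function name `leave_one_warehouse_out`, the warehouse names, and the sequence IDs are all hypothetical, and I am assuming each sequence belongs to exactly one warehouse.

```python
from collections import defaultdict


def leave_one_warehouse_out(sequences):
    """Yield (held_out, train_ids, val_ids) once per warehouse.

    `sequences` is a list of (sequence_id, warehouse) pairs.
    All sequences from the held-out warehouse go to validation;
    every other sequence goes to training.
    """
    by_warehouse = defaultdict(list)
    for seq_id, warehouse in sequences:
        by_warehouse[warehouse].append(seq_id)

    for held_out in sorted(by_warehouse):
        val_ids = list(by_warehouse[held_out])
        train_ids = [
            seq_id
            for warehouse in sorted(by_warehouse)
            if warehouse != held_out
            for seq_id in by_warehouse[warehouse]
        ]
        yield held_out, train_ids, val_ids


# Hypothetical data: 6 warehouses with 2 video sequences each.
sequences = [
    (f"seq_{w}_{i}", f"warehouse_{w}")
    for w in range(1, 7)
    for i in range(2)
]

splits = list(leave_one_warehouse_out(sequences))
for held_out, train_ids, val_ids in splits:
    print(held_out, "train:", len(train_ids), "val:", len(val_ids))
```

If accuracy on the held-out warehouse is much lower than on the training warehouses, that would suggest the current set of environments is not yet diverse enough.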