I’ve been stuck on what is likely a very basic problem for a few weeks. My goal is to track a specific pink ball in a swimming pool, then control a series of 8 garden hose ‘jets’ based on some game rules (hot potato, etc.).
I’ve been trying to train a model with PyTorch, following dusty-nv’s Hello AI World guide (GitHub: dusty-nv/jetson-inference).
Although the training images closely match the actual stream, I can never seem to get beyond 25% confidence. In addition, I frequently see 2-5 balls detected in the immediate area of the one ball. So clearly I’m doing something wrong.
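For reference, clusters of duplicate hits like this are commonly collapsed with non-maximum suppression. Below is a minimal sketch of a greedy version that could run on the detector output. The tuple layout `(left, top, right, bottom, confidence)`, the sample boxes, and the 0.3 overlap threshold are all made up for illustration. It only merges boxes that actually overlap, so it would not help if the extra detections sit next to the ball rather than on top of it.

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def suppress_duplicates(detections, iou_threshold=0.3):
    """Greedy NMS: visit boxes from most to least confident and keep a box
    only if it does not overlap an already-kept box by iou_threshold or more."""
    kept = []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        if all(iou(det[:4], k[:4]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical detections: three overlapping hits on one ball, one far away.
detections = [
    (600, 340, 640, 380, 0.25),
    (605, 345, 645, 385, 0.20),
    (595, 338, 636, 379, 0.18),
    (100, 100, 140, 140, 0.22),
]
kept = suppress_duplicates(detections)
print(len(kept), "boxes kept")
```

The threshold is a tuning knob: a lower value merges more aggressively, which could be a problem if two real balls ever touch.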
The quality of the images is not ideal. I’m using 1280x720 images (I could use 1920-wide images, but lower resolution seemed to work better?), and a ball is only about 40x40 pixels.
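To put a rough number on the size concern, here is a small sketch of how large the ball would be once the frame is resized to the network input. It assumes a 300x300 input with a plain stretch (no letterboxing), which I believe is what SSD-Mobilenet typically uses but have not verified for this setup. It also assumes the wider option is 1920x1080 and that the ball fills the same fraction of the scene at either resolution.

```python
def scaled_box(frame_w, frame_h, box_w, box_h, net_w=300, net_h=300):
    """Box size in pixels after a plain (non-letterboxed) resize of the frame
    to net_w x net_h. The 300x300 default is an assumption."""
    return box_w * net_w / frame_w, box_h * net_h / frame_h


for frame_w, frame_h in [(1280, 720), (1920, 1080)]:
    # Assumption: the ball covers the same fraction of the scene at both
    # resolutions, so its pixel size scales with the frame width.
    ball_px = 40 * frame_w / 1280
    w, h = scaled_box(frame_w, frame_h, ball_px, ball_px)
    print(f"{frame_w}x{frame_h}: ball ~{ball_px:.0f} px -> "
          f"{w:.1f} x {h:.1f} px at the network input")
```

Under these assumptions the ball shrinks to roughly 9 x 17 pixels either way, so the capture resolution alone would not change what the network sees. Cropping or tiling the frame before inference would.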
I’ve manually tagged almost 1000 images of balls using cvat.org, and trained for as many as 500 epochs without substantially different outcomes.
The areas I’m unsure about:
A) Correct model to start from (ssd-mobilenet?)
B) Resolution of training images relative to end input stream
C) ‘Tightness’ or precision of bounding box in training images
D) Color - I intentionally selected a distinctly colored ball. I believe the models make use of color, but I don’t know for sure.
E) Partially obstructed objects in training set - helpful, or harmful?
F) Presence of reflections in training set (calm water problem) - helpful, or harmful? Bound in bbox, or exclude?
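On question C, a quick worked calculation of why box tightness might matter more for an object this small. The sketch compares a tight square box with the same box drawn a few pixels too large on every side, using intersection over union (IoU) as the measure. The padding values and the 200 px comparison object are arbitrary choices for illustration.

```python
def iou_with_padding(side, pad):
    """IoU between a tight square box and the same box padded by `pad` px per side."""
    tight_area = side * side
    loose_area = (side + 2 * pad) ** 2
    # The tight box lies entirely inside the padded one, so the intersection
    # is the tight area and the union is the padded area.
    return tight_area / loose_area


for side in (40, 200):
    for pad in (2, 5, 10):
        print(f"{side} px object, {pad} px slack per side: "
              f"IoU = {iou_with_padding(side, pad):.2f}")
```

With 5 px of slack per side, a 40 px ball drops to an IoU of 0.64 against its true outline, while a 200 px object stays around 0.91. So the same labeling sloppiness costs far more on small objects.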
Thanks in advance for any guidance!!