Hi All,
I’ve been stuck on what is likely a very basic problem for a few weeks. My goal is to track a specific pink ball in a swimming pool, then control a series of 8 garden hose ‘jets’ based on some game rules (hot potato, etc.).
I’ve been trying to train a model with PyTorch, following dusty-nv’s jetson-inference repo (GitHub: dusty-nv/jetson-inference, the Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson).
Although the training images closely match the actual stream, I can never seem to get beyond 25% confidence. In addition, I frequently see 2-5 balls detected in the immediate area of the one ball. So clearly I’m doing something wrong.
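My understanding is that clusters of overlapping boxes like this are normally collapsed by non-maximum suppression (NMS). To make clear what I mean, here is a minimal sketch of greedy NMS in plain Python. The boxes, confidences, and the 0.5 IoU threshold are made up for illustration, and this is not jetson-inference’s own code, so I may be missing how the library actually handles it.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (left, top, right, bottom)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS: visit boxes from highest to lowest confidence and keep a box
    only if it does not overlap an already-kept box by iou_threshold or more."""
    kept = []
    for det in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(det[1], k[1]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical detections: (confidence, (left, top, right, bottom)) in pixels.
detections = [
    (0.25, (600, 340, 640, 380)),  # the ball
    (0.21, (604, 343, 644, 383)),  # near-duplicate of the ball
    (0.18, (596, 338, 636, 378)),  # another near-duplicate
    (0.15, (900, 100, 940, 140)),  # unrelated box elsewhere in the frame
]

kept = nms(detections)
for conf, box in kept:
    print(conf, box)
```

This only merges boxes that actually overlap. If my extra detections sit next to the ball rather than on top of it, a step like this would not remove them.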
The quality of the images is not ideal. I’m using 1280x720 images (I could use 1920, but lower resolution seemed to work better?), but a ball is maybe 40x40 pixels…
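To put the size concern in numbers, here is a rough calculation. It assumes the network resizes each full frame to 300x300 before inference, which is my understanding of the usual ssd-mobilenet input size but is not something I have verified for my setup.

```python
# Rough estimate of how large the ball is once a frame is resized for the network.
# ASSUMPTION: the detector squashes the whole frame to 300x300 (not verified).
def scaled_size(obj_w, obj_h, frame_w, frame_h, net_w=300, net_h=300):
    """Return the object's (width, height) in pixels after the resize."""
    return obj_w * net_w / frame_w, obj_h * net_h / frame_h


w, h = scaled_size(40, 40, 1280, 720)
print(f"40x40 ball in a 1280x720 frame -> about {w:.1f}x{h:.1f} px at 300x300")
```

If that assumption holds, the ball shrinks to roughly 9x17 pixels and is no longer square, which might be part of my problem.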
I’ve manually tagged almost 1000 images of balls using cvat.org, and trained for as many as 500 epochs without substantially different outcomes.
Example image:
Basic questions:
A) Correct model to start from (ssd-mobilenet?)
B) Resolution of training images relative to end input stream
C) ‘Tightness’ or precision of bounding box in training images
D) Color - I intentionally selected a distinctly colored ball. I believe the models make use of color, but I don’t know for sure?
E) Partially obstructed objects in training set - helpful, or harmful?
F) Presence of reflections in training set (calm water problem) - helpful, or harmful? Include them in the bbox, or exclude them?
Thanks in advance for any guidance!!