Basic Training - hitting dreaded 'thing not work'

Hi All,

I’ve been stuck on a likely very basic problem for a few weeks. My goal is to track a specific pink ball in a swimming pool, then control a series of 8 garden hose ‘jets’ based on some game rules (hot potato, etc).

I’ve been trying to train a model with PyTorch, following dusty-nv’s jetson-inference guide (GitHub: dusty-nv/jetson-inference, the Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson).

Despite the consistency of the training images & actual stream, I can never seem to get beyond 25% confidence. In addition, I frequently see 2-5 balls detected in the immediate area of the one ball. So - clearly I’m doing something wrong.

The quality of the images is not ideal. I’m using 1280x720 images (could use 1920, but found lower=better?), but a ball is maybe 40x40 pixels…

I’ve manually tagged almost 1000 images of balls, and trained for as many as 500 epochs without substantially different outcomes.

Example image:

Basic questions:

A) Correct model to start from (ssd-mobilenet?)
B) Resolution of training images relative to end input stream
C) ‘Tightness’ or precision of bounding box in training images
D) Color - I intentionally selected a relatively distinctly colored ball - I believe this is used by the models, but don’t know for sure?
E) Partially obstructed objects in training set - helpful, or harmful?
F) Presence of reflections in training set (calm water problem) - helpful, or harmful? Bound in bbox, or exclude?

Thanks in advance for any guidance!!

If these bounding boxes are overlapping, you might want to try decreasing the clustering threshold here:

This is the amount of overlap required (in percentage of area) before bounding boxes are clustered. Currently it is set to 75%. After you change it, re-run the following for the change to take effect:

cd jetson-inference/build
sudo make install
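
To illustrate what an overlap threshold like this controls, here is a minimal Python sketch of overlap-based box merging. It is not the jetson-inference implementation: the function names, the (left, top, right, bottom) box format, and the choice to measure overlap as intersection area divided by the smaller box's area are all assumptions made for the example.

```python
def overlap_fraction(a, b):
    """Intersection area divided by the smaller box's area (assumed metric)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not intersect
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return (inter_w * inter_h) / min(area_a, area_b)


def cluster_boxes(boxes, threshold):
    """Greedily merge each box into the first cluster it overlaps enough."""
    clusters = []
    for box in boxes:
        for i, c in enumerate(clusters):
            if overlap_fraction(box, c) >= threshold:
                # Grow the cluster to the union of both boxes.
                clusters[i] = (min(c[0], box[0]), min(c[1], box[1]),
                               max(c[2], box[2]), max(c[3], box[3]))
                break
        else:
            clusters.append(tuple(box))
    return clusters


# Two detections on the same ball (about 66% overlap) plus one far away.
boxes = [(100, 100, 140, 140), (110, 105, 150, 145), (300, 300, 340, 340)]
print(len(cluster_boxes(boxes, 0.75)))  # the two nearby boxes stay separate
print(len(cluster_boxes(boxes, 0.50)))  # the two nearby boxes are merged
```

In this sketch, two boxes that overlap by about 66% are reported separately at a 0.75 threshold but collapse into one at 0.50, which is the kind of change to expect when several detections cluster around a single ball.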

The images automatically get downsampled to 300x300 before being fed into the SSD-Mobilenet model, so further increasing the source resolution probably isn’t necessary. Since the object you are trying to detect is on the smaller side, you may want to try training SSD-Mobilenet at 512x512 resolution instead of 300x300. For that, I have a special branch of pytorch-ssd which you can find in this post:
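
To put rough numbers on the downsampling, the sketch below estimates how many pixels a 40x40 ball in a 1280x720 frame occupies at each network input size. It assumes the frame is simply stretched to a square input without preserving aspect ratio, which may not match the actual preprocessing; the helper name scaled_size is made up for this example.

```python
def scaled_size(obj_w, obj_h, src_w, src_h, net_size):
    """Object size in pixels after stretching the frame to net_size x net_size.

    Assumes a plain resize that does not preserve aspect ratio.
    """
    return (obj_w * net_size / src_w, obj_h * net_size / src_h)


for net_size in (300, 512):
    w, h = scaled_size(40, 40, 1280, 720, net_size)
    print(f"{net_size}x{net_size}: ball is about {w:.1f} x {h:.1f} px")
```

Under that assumption the ball shrinks to roughly 9 x 17 pixels at 300x300 and roughly 16 x 28 pixels at 512x512, so the larger input gives the detector noticeably more to work with.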


Very helpful - The merging has worked perfectly.

For the 512x512 model, do I need a different base model than mobilenet-v1-ssd-mp-0_675.pth, or will the incremental training be at 512x512? (I’m assuming mobilenet-v1-ssd-mp-0_675.pth is 300x300)