Having trouble setting up PyTorch code for training SSD-Mobilenet

I am running the latest JetPack 4.6 on a Jetson Nano 2GB and have been following the examples in the jetson-inference project. Since I wanted to understand the details of how things are set up, I chose to build the project from source, following the instructions on this page.

Everything went well until I got to the page Re-training SSD-Mobilenet.

Since I am not running from the Docker container, I tried to follow the Setup steps to get things ready. However, when I ran the wget command, I got an error that the “models” directory does not exist. I created it, the command ran, and the file “mobilenet-v1-ssd-mp-0_675.pth” was downloaded to the “models” directory. But when I ran “pip3 install -v -r requirements.txt”, I got an error that “requirements.txt” was not found. I found a “requirements.txt” file in https://github.com/dusty-nv/pytorch-ssd and copied it to the ssd directory. Now the pip3 install ran. However, after a long process, I got the following (just showing the last output from the terminal):

 ...
  adding 'pandas-1.1.5.dist-info/metadata.json'
  adding 'pandas-1.1.5.dist-info/top_level.txt'
  adding 'pandas-1.1.5.dist-info/WHEEL'
  adding 'pandas-1.1.5.dist-info/METADATA'
  adding 'pandas-1.1.5.dist-info/RECORD'
done
  Stored in directory: /home/francis/.cache/pip/wheels/04/a1/90/bc4b4417affaf986b84b41f0033243244024078fc71ff9be12
  Removing source in /tmp/pip-build-fziaxway/pandas
Successfully built pandas
Installing collected packages: jmespath, six, python-dateutil, urllib3, botocore, s3transfer, boto3, numpy, pytz, pandas

  changing mode of /home/francis/.local/bin/f2py to 775
  changing mode of /home/francis/.local/bin/f2py3 to 775
  changing mode of /home/francis/.local/bin/f2py3.6 to 775

Successfully installed boto3-1.18.33 botocore-1.21.33 jmespath-0.10.0 numpy-1.19.5 pandas-1.1.5 python-dateutil-2.8.2 pytz-2021.1 s3transfer-0.5.0 six-1.16.0 urllib3-1.26.6
Cleaning up...

However, no Python scripts were created in the ssd directory, so I think something is still not right.

Has anyone else tried this? Is there something else I should be doing?

Hi @francis.tse, it sounds like the submodules didn’t get checked out properly when you cloned the repo. Did you clone it with the --recursive flag? I would suggest starting fresh and trying again like this:

git clone --recursive https://github.com/dusty-nv/jetson-inference

You can then check that all of the submodules and subdirectories are there (e.g. under jetson-inference/python/training).

Note that this time around you can skip the installation of PyTorch and requirements.txt, because presumably you already did that. If you still have problems, I would recommend just using the pre-built docker container.

Rather than cloning the repo again, I copied all the files and folders from https://github.com/dusty-nv/pytorch-ssd into my ssd folder.

Following the instructions, I used the camera-capture tool to collect and label images. I created the labels.txt file and moved things around so that the data folder has the following structure:

  • Annotations folder (contains the *.xml files)
  • ImageSets folder
    • Main folder (contains test.txt, train.txt and val.txt files)
  • JPEGImages folder (contains the *.jpg files)
  • labels.txt file (which I created, containing the names of the two objects to train the model on)

However, when I tried to train the model, I got an error:

francis@nano4:~/jetson-inference/python/training/detection/ssd$ python3 train_ssd.py --data=data/JPEGImages --model-dir=models/MaFran --batch-size=4 --epochs=2 --dataset-type=voc
2021-09-15 16:33:56 - Using CUDA...
2021-09-15 16:33:56 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=4, checkpoint_folder='models/MaFran', dataset_type='voc', datasets=['data/JPEGImages'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=2, num_workers=2, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, weight_decay=0.0005)
2021-09-15 16:33:57 - Prepare training datasets.
Traceback (most recent call last):
  File "train_ssd.py", line 214, in <module>
    target_transform=target_transform)
  File "/home/francis/jetson-inference/python/training/detection/ssd/vision/datasets/voc_dataset.py", line 33, in __init__
    raise IOError("missing ImageSet file {:s}".format(image_sets_file))
TypeError: unsupported format string passed to PosixPath.__format__
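
The last line of the traceback hides the real problem. Judging from the traceback, voc_dataset.py builds its "missing ImageSet file" message with a {:s} format spec applied to a pathlib path, and PosixPath does not accept a format spec, so Python raises a TypeError before the intended IOError can be reported. A minimal sketch reproducing this follows; the path is hypothetical (derived from --data=data/JPEGImages), and the ImageSets/Main/trainval.txt suffix is an assumption about which file the loader looks for.

```python
import pathlib


def missing_imageset_message(image_sets_file, safe=True):
    """Build the error text that voc_dataset.py tries to raise.

    With safe=False this mirrors the '{:s}' format call from the traceback,
    which fails for pathlib paths; with safe=True the path is converted to
    str first, so the spec is applied to a plain string.
    """
    if safe:
        return "missing ImageSet file {:s}".format(str(image_sets_file))
    return "missing ImageSet file {:s}".format(image_sets_file)


# Hypothetical path built from --data=data/JPEGImages; the
# ImageSets/Main/trainval.txt suffix is an assumption about the loader.
path = pathlib.Path("data/JPEGImages") / "ImageSets/Main/trainval.txt"

try:
    missing_imageset_message(path, safe=False)
except TypeError as err:
    print("TypeError:", err)  # the error seen in the traceback

print(missing_imageset_message(path))  # the message that was meant to appear
```

So the underlying error is simply that no ImageSets file exists under the --data path; the formatting issue only obscures it.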

When I googled the error, I found forum discussions about the need to conform to the VOC dataset structure. I thought I was following that structure. To verify, I followed @dusty_nv’s recommendation and downloaded the VOC2012 dataset from http://host.robots.ox.ac.uk:8080/eval/downloads/.

To my surprise, the VOC2012 dataset has the following structure:

  • JPEGImages folder (contains the *.jpg files)
  • ImageSets folder
    • Segmentation folder (contains the test.txt file)
    • Main folder (contains test.txt plus 20 more *.txt files with names like tvmonitor_test.txt, train_test.txt, sofa_test.txt, etc.)
    • Layout folder (contains test.txt file)
    • Action folder (contains test.txt plus 10 more *.txt files with names like walking_test.txt, usingcomputer_test.txt, takingphoto_test.txt, etc.)
  • Annotations folder (contains the *.xml files)

The ImageSets folder has several more sub-folders than mine. Now I am confused about what the VOC dataset structure needs to be.

Perhaps the VOC Dataset structure depends on what model one is trying to train?

My intention is to see if I can train a model to recognize the faces of two different people. My dataset consists of 15 pictures, each showing one person, the other, or both. I checked the 15 corresponding *.xml files to make sure they contain the correct information. The *.txt files in the ImageSets/Main folder all contain the same list of image file names without the .jpg extension. The labels.txt file just has the two people’s names. I was hoping to keep things simple while learning the process.
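
The manual check of the *.xml files can also be scripted. A minimal sketch, assuming the annotations follow the standard Pascal VOC format (an annotation root with one object element per box, each holding a name element with the class label) and that labels.txt lists one class per line; the function name find_unknown_labels is hypothetical and not part of the training scripts:

```python
import pathlib
import xml.etree.ElementTree as ET


def find_unknown_labels(dataset_root):
    """Return {xml_file_name: [names]} for object names missing from labels.txt."""
    root = pathlib.Path(dataset_root)
    # labels.txt: one class name per line; blank lines are ignored.
    known = {
        line.strip()
        for line in (root / "labels.txt").read_text().splitlines()
        if line.strip()
    }
    problems = {}
    for xml_path in sorted((root / "Annotations").glob("*.xml")):
        annotation = ET.parse(xml_path).getroot()
        # Each <object> directly under <annotation> describes one bounding box.
        names = [obj.findtext("name", default="").strip()
                 for obj in annotation.findall("object")]
        unknown = [name for name in names if name not in known]
        if unknown:
            problems[xml_path.name] = unknown
    return problems
```

Called as find_unknown_labels("data/MaFran"), it should return an empty dict when every object name in the annotations also appears in labels.txt.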

Any ideas of what to try next?

Something went wrong with the link in my previous post. I copied all the files and folders from https://github.com/dusty-nv/pytorch-ssd; the long string after it was not part of what I meant to write.

It seems I found the problem: I had been providing the wrong path for the dataset.

I had followed @dusty_nv’s instructions to create the VOC dataset structure, but I ended up putting the whole collection of folders and files directly in the ssd/data folder and setting --data=data/JPEGImages for training.

What I finally did was create a MaFran folder inside the data folder, put the VOC dataset into it, and set --data=data/MaFran for training. Now it runs without an error. It seems obvious in hindsight, especially after I went back to do the “Re-training on the PlantCLEF Dataset” example that I had skipped and read the instructions more carefully this time. Anyhow, I thought it would be good to document the setup that ended up working.
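
Since the root cause was pointing --data at the JPEGImages subfolder rather than at the dataset root, a quick pre-flight check can catch this before train_ssd.py runs. A minimal sketch, assuming the layout that worked in this thread (Annotations, ImageSets/Main, JPEGImages and labels.txt directly under the --data directory) and that training reads ImageSets/Main/trainval.txt; the function name check_voc_root is hypothetical:

```python
import pathlib

# Entries expected directly under the --data directory (layout from this thread).
REQUIRED = ["Annotations", "ImageSets/Main", "JPEGImages", "labels.txt"]


def check_voc_root(data_dir, split="trainval"):
    """Return a list of problems found under data_dir (empty list means OK)."""
    root = pathlib.Path(data_dir)
    problems = [f"missing {entry}" for entry in REQUIRED
                if not (root / entry).exists()]
    split_file = root / "ImageSets" / "Main" / f"{split}.txt"
    if not split_file.is_file():
        problems.append(f"missing {split_file}")
        return problems
    # Every ID listed in the split file needs a matching image and annotation.
    for image_id in split_file.read_text().split():
        if not (root / "JPEGImages" / f"{image_id}.jpg").is_file():
            problems.append(f"no image for {image_id}")
        if not (root / "Annotations" / f"{image_id}.xml").is_file():
            problems.append(f"no annotation for {image_id}")
    return problems
```

With the original setup, check_voc_root("data/JPEGImages") reports every required entry as missing, while check_voc_root("data/MaFran") should return an empty list.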

This is the complete structure I had in the ssd folder:

  • ssd folder
    • data folder
      • MaFran folder
        • Annotations folder (contains the *.xml files generated by the camera-capture tool)
        • ImageSets folder
          • Main folder
            • test.txt file (contains the name of the image files without the .jpg extension)
            • train.txt file (contains the name of the image files without the .jpg extension)
            • trainval.txt file (contains the name of the image files without the .jpg extension)
            • val.txt file (empty)
        • JPEGImages folder (contains all the *.jpg files generated by the camera-capture tool)
        • labels.txt file (contains just the two names of the people)
    • models folder
      • MaFran folder (started out empty)
      • mobilenet-v1-ssd-mp-0_675.pth file
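
The ImageSets/Main files above only list image IDs (file names without the .jpg extension), so they can be regenerated from JPEGImages instead of edited by hand. A minimal sketch that reproduces the split used here (the same list in train.txt, trainval.txt and test.txt, and an empty val.txt); the function name write_imagesets is hypothetical, and with more data you would likely want to hold out distinct images for validation and testing:

```python
import pathlib


def write_imagesets(dataset_root):
    """Write ImageSets/Main/{train,trainval,test,val}.txt from JPEGImages.

    Mirrors the layout in this thread: every image ID goes into train.txt,
    trainval.txt and test.txt, and val.txt is left empty.
    Returns the sorted list of image IDs that were written.
    """
    root = pathlib.Path(dataset_root)
    ids = sorted(p.stem for p in (root / "JPEGImages").glob("*.jpg"))
    main = root / "ImageSets" / "Main"
    main.mkdir(parents=True, exist_ok=True)
    listing = "".join(f"{image_id}\n" for image_id in ids)
    for name in ("train", "trainval", "test"):
        (main / f"{name}.txt").write_text(listing)
    (main / "val.txt").write_text("")
    return ids
```

For example, write_imagesets("data/MaFran") would rewrite the four files from whatever *.jpg files are currently in data/MaFran/JPEGImages.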

To run the training, I ran the following:

francis@nano4:~/jetson-inference/python/training/detection/ssd$ python3 train_ssd.py --dataset-type=voc --data=data/MaFran --model-dir=models/MaFran --batch-size=4 --num-epochs=30

After training completed successfully, these files had been created in the models/MaFran folder: a labels.txt file and 30 *.pth files (one per epoch).
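
With one checkpoint per epoch, a natural question is which one to use. A minimal sketch that picks the checkpoint with the lowest loss, assuming (this is an assumption, not something shown in this thread) that the checkpoint file names end in -Loss-<value>.pth, e.g. mb1-ssd-Epoch-29-Loss-2.5.pth; check the actual names in models/MaFran before relying on it. The function name best_checkpoint is hypothetical:

```python
import pathlib
import re

# Assumed naming pattern for checkpoints: ...-Loss-<float>.pth
LOSS_PATTERN = re.compile(r"-Loss-([0-9]*\.?[0-9]+)\.pth$")


def best_checkpoint(model_dir):
    """Return the .pth file in model_dir with the lowest loss in its name, or None."""
    best_path, best_loss = None, float("inf")
    for path in pathlib.Path(model_dir).glob("*.pth"):
        match = LOSS_PATTERN.search(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed naming pattern
        loss = float(match.group(1))
        if loss < best_loss:
            best_path, best_loss = path, loss
    return best_path
```

Files without a loss value in their name are skipped rather than guessed at, so the function returns None if nothing matches the assumed pattern.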

Then I converted the PyTorch model to ONNX by running:

python3 onnx_export.py --model-dir=models/MaFran

This generated the ssd-mobilenet.onnx file in the models/MaFran folder.

I was then able to run detection on my webcam feed with the following:

detectnet --model=models/MaFran/ssd-mobilenet.onnx --labels=models/MaFran/labels.txt --input-blob=input_0 --output-cvg=scores --output-bbox=boxes /dev/video0

Hope this is helpful for others in the future.

Hi @francis.tse, OK great - that makes sense. Glad you were able to get it working!

@dusty_nv, thanks for your great writeups and help.