Having trouble setting up pytorch code for training ssd-mobilenet

francis.tse · September 1, 2021, 4:07pm

I am running the latest JetPack 4.6 on a Jetson Nano 2 GB and have been following the examples in the jetson-inference project. Since I wanted to understand the details of how things are set up, I chose to build the project from source, following the instructions on this page.

Everything went well until I got to the page Re-training SSD-Mobilenet.

Since I am not running from the Docker container, I tried to follow the Setup steps to get things ready. However, when I ran the wget command, I got an error that the “models” directory does not exist. I created it and the command ran, downloading the file “mobilenet-v1-ssd-mp-0_675.pth” to the “models” directory. But when I ran “pip3 install -v -r requirements.txt”, I got an error that “requirements.txt” was not found. I found a “requirements.txt” file in the dusty-nv/pytorch-ssd repository on GitHub and copied it to the ssd directory. Now the pip3 install ran. However, after a long process, I got the following (just showing the last output from the terminal):

 ...
  adding 'pandas-1.1.5.dist-info/metadata.json'
  adding 'pandas-1.1.5.dist-info/top_level.txt'
  adding 'pandas-1.1.5.dist-info/WHEEL'
  adding 'pandas-1.1.5.dist-info/METADATA'
  adding 'pandas-1.1.5.dist-info/RECORD'
done
  Stored in directory: /home/francis/.cache/pip/wheels/04/a1/90/bc4b4417affaf986b84b41f0033243244024078fc71ff9be12
  Removing source in /tmp/pip-build-fziaxway/pandas
Successfully built pandas
Installing collected packages: jmespath, six, python-dateutil, urllib3, botocore, s3transfer, boto3, numpy, pytz, pandas

  changing mode of /home/francis/.local/bin/f2py to 775
  changing mode of /home/francis/.local/bin/f2py3 to 775
  changing mode of /home/francis/.local/bin/f2py3.6 to 775

Successfully installed boto3-1.18.33 botocore-1.21.33 jmespath-0.10.0 numpy-1.19.5 pandas-1.1.5 python-dateutil-2.8.2 pytz-2021.1 s3transfer-0.5.0 six-1.16.0 urllib3-1.26.6
Cleaning up...

However, there were no python scripts created in the ssd directory. So I think something is still not right.

Has anyone else tried this? Is there something else I should be doing?

dusty_nv · September 15, 2021, 4:25pm

Hi @francis.tse, it sounds like the submodules didn’t get properly checked out when you cloned the repo. Did you clone it with the --recursive flag? I would suggest starting fresh and trying again like this:

git clone --recursive https://github.com/dusty-nv/jetson-inference

You can then check whether all of the submodules and subdirectories are there (e.g. under jetson-inference/python/training).
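
One way to run that check is a small script that lists which expected paths are missing from the clone. This is only a sketch: the helper name `missing_paths` is made up for illustration, and the two paths in `EXPECTED` are taken from the training command and traceback that appear later in this thread, not from an official list.

```python
from pathlib import Path

# Paths relative to the repo root that should exist after a --recursive clone.
# Illustrative only: both are taken from output shown later in this thread.
EXPECTED = [
    "python/training/detection/ssd/train_ssd.py",
    "python/training/detection/ssd/vision/datasets/voc_dataset.py",
]


def missing_paths(repo_root, expected=EXPECTED):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root).expanduser()
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths("~/jetson-inference")
    if missing:
        print("Missing (submodules probably not checked out):")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are present.")
```

If files turn out to be missing from an existing clone, running `git submodule update --init --recursive` from the repo root should fetch the submodules without cloning again.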

Note that this time around you can skip the installation of PyTorch and requirements.txt, because presumably you already did that. If you still have problems, I would recommend just using the pre-built docker container.

francis.tse · September 16, 2021, 2:37am

Rather than cloning the repo again, I copied all the files and folders from GitHub - dusty-nv/pytorch-ssd: MobileNetV1, MobileNetV2, VGG based SSD/SSD-lite implementation in PyTorch. Out-of-box support for retraining on Open Images dataset. ONNX and Caffe2 support. Experiment Ideas like CoordConv. into my ssd folder.

Following the instructions, I used the camera-capture tool to collect and label the captured images. I created the labels.txt file and moved things around to have a data folder with the following structure:

Annotations folder (contains the *.xml files)
ImageSets folder
- Main folder (contains test.txt, train.txt and val.txt files)
JPEGImages folder (contains the *.jpg files)
labels.txt file (I created with the two names of objects to train model)

However, when I tried to train the model, I got an error:

francis@nano4:~/jetson-inference/python/training/detection/ssd$ python3 train_ssd.py --data=data/JPEGImages --model-dir=models/MaFran --batch-size=4 --epochs=2 --dataset-type=voc
2021-09-15 16:33:56 - Using CUDA...
2021-09-15 16:33:56 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=4, checkpoint_folder='models/MaFran', dataset_type='voc', datasets=['data/JPEGImages'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=2, num_workers=2, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, weight_decay=0.0005)
2021-09-15 16:33:57 - Prepare training datasets.
Traceback (most recent call last):
  File "train_ssd.py", line 214, in <module>
    target_transform=target_transform)
  File "/home/francis/jetson-inference/python/training/detection/ssd/vision/datasets/voc_dataset.py", line 33, in __init__
    raise IOError("missing ImageSet file {:s}".format(image_sets_file))
TypeError: unsupported format string passed to PosixPath.__format__
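
The final `TypeError` hides the real problem. Line 33 of voc_dataset.py is trying to raise `IOError("missing ImageSet file ...")`, but the `{:s}` format spec does not accept a `pathlib` path object, so building the message itself fails. Below is a minimal reproduction using only the standard library. The path is just an example, and `PurePosixPath` stands in for `PosixPath` so the snippet runs on any OS.

```python
from pathlib import PurePosixPath

# Example path only; the traceback does not show which file was being looked up.
image_sets_file = PurePosixPath("data/JPEGImages/ImageSets/Main/trainval.txt")


def describe(path):
    """Format the error message the way line 33 in the traceback does."""
    return "missing ImageSet file {:s}".format(path)


try:
    describe(image_sets_file)
except TypeError as err:
    # Path objects inherit object.__format__, which rejects a non-empty spec like "s".
    print("TypeError:", err)

# Converting to str first lets the intended message through.
print(describe(str(image_sets_file)))
```

So the message to act on is the one the code meant to raise: an ImageSet file was not found under the path passed to --data.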

When I googled the error, I saw forum discussions about the need to conform to the VOC dataset structure. I thought I was following the structure. To verify that I had the correct VOC dataset structure, I followed @dusty_nv’s recommendation and downloaded the VOC2012 dataset from the link http://host.robots.ox.ac.uk:8080/eval/downloads/.

To my surprise, the VOC2012 dataset has the following structure:

JPEGImages folder (contains the *.jpg files)
ImageSets folder
- Segmentation folder (contains the test.txt file)
- Main folder (contains test.txt plus 20 more *.txt files with names like tvmonitor_test.txt, train_test.txt, sofa_test.txt, etc.)
- Layout folder (contains test.txt file)
- Action folder (contains test.txt plus 10 more *.txt files with names like walking_test.txt, usingcomputer_test.txt, takingphoto_test.txt, etc.)
Annotations folder (contains the *.xml files)

There are many more sub-folders in the ImageSets folder. Now, I am confused about what the VOC Dataset structure needs to be.

Perhaps the VOC Dataset structure depends on what model one is trying to train?

My intention is to see if I can train a model to recognize the faces of two different people. So, my images consist of 15 pictures captured of one or the other or both persons. I checked the 15 corresponding *.xml files to make sure they have correct information in them. The *.txt files in the ImageSets/Main folder all have the same list of image file names without the .jpg extension. The labels.txt file just has the two people’s names. I was hoping to keep things simple to learn the process.

Any ideas of what to try next?

francis.tse · September 16, 2021, 2:44am

Something went wrong in my previous post. I copied all the files and folders from https://github.com/dusty-nv/pytorch-ssd without the long string after it.

francis.tse · September 21, 2021, 6:45pm

It seems I found the problem: I had been providing the wrong path for the dataset.

I followed @dusty_nv’s instructions to create the VOC dataset structure, but I ended up putting the whole collection of folders and files directly in the ssd/data folder and set --data=data/JPEGImages for training.

What I finally did was create a MaFran folder in the data folder, put the VOC dataset into it, and set --data=data/MaFran for training. Now it ran without an error. It seems obvious in hindsight, especially after I went back to do the “Re-training on the PlantCLEF Dataset” example that I had skipped and read the instructions more carefully this time. Anyhow, I thought it would be good to document what I ended up with that worked.
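
A path mistake like this can be caught before training by checking that the `--data` directory is the dataset root rather than one of its subfolders. The sketch below is a hypothetical helper, not part of train_ssd.py. The entries in `REQUIRED` are the ones from the layout that worked in this thread, and treating all four as mandatory is an assumption.

```python
from pathlib import Path

# Entries the VOC-style dataset root contained in the setup that worked here.
REQUIRED = ["Annotations", "ImageSets/Main", "JPEGImages", "labels.txt"]


def check_dataset_root(data_dir):
    """Return the required entries that are missing under data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED if not (root / name).exists()]


if __name__ == "__main__":
    for candidate in ("data/JPEGImages", "data/MaFran"):
        missing = check_dataset_root(candidate)
        if missing:
            print(candidate + ": missing " + ", ".join(missing))
        else:
            print(candidate + ": looks like a dataset root")
```

Pointing the check at data/JPEGImages would report every entry as missing, which is the same situation the loader ran into.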

This is the complete structure I had in the ssd folder

ssd folder
- data folder
  - MaFran folder
    - Annotations folder (contains the *.xml files generated by the camera-capture tool)
    - ImageSets folder
      - Main folder
        - test.txt file (contains the names of the image files without the .jpg extension)
        - train.txt file (contains the names of the image files without the .jpg extension)
        - trainval.txt file (contains the names of the image files without the .jpg extension)
        - val.txt file (empty)
    - JPEGImages folder (contains all the *.jpg files generated by the camera-capture tool)
    - labels.txt file (contains just the two names of the people)
- models folder
  - MaFran folder (started out empty)
  - mobilenet-v1-ssd-mp-0_675.pth file
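
The four files under ImageSets/Main can be generated from the contents of JPEGImages instead of being typed by hand. Here is a minimal sketch, assuming the same split as in the listing above (every image listed in test.txt, train.txt and trainval.txt, with val.txt left empty). `write_image_sets` is a name made up for this example.

```python
from pathlib import Path


def write_image_sets(dataset_root):
    """Write ImageSets/Main/*.txt from the .jpg names in JPEGImages; return the IDs."""
    root = Path(dataset_root)
    # Image IDs are the file names without the .jpg extension.
    ids = sorted(p.stem for p in (root / "JPEGImages").glob("*.jpg"))
    main_dir = root / "ImageSets" / "Main"
    main_dir.mkdir(parents=True, exist_ok=True)
    listing = "".join(image_id + "\n" for image_id in ids)
    for name in ("test.txt", "train.txt", "trainval.txt"):
        (main_dir / name).write_text(listing)
    # val.txt was empty in the setup that worked.
    (main_dir / "val.txt").write_text("")
    return ids
```

Calling `write_image_sets("data/MaFran")` from the ssd folder would recreate the four files for this layout. Reusing the same images for training and testing mirrors the setup in this post; that is fine for learning the workflow, but it says little about how well the model generalizes.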

To run the training, I did the following

francis@nano4:~/jetson-inference/python/training/detection/ssd$ python3 train_ssd.py --dataset-type=voc --data=data/MaFran --model-dir=models/MaFran --batch-size=4 --num-epochs=30

After the training ran successfully, these files had been created in the models/MaFran folder: a labels.txt file and 30 *.pth files (one for each epoch).

Then, I did the conversion of the PyTorch model to ONNX by running

python3 onnx_export.py --model-dir=models/MaFran

This generated the ssd-mobilenet.onnx file in the models/MaFran folder.

I was then able to do the detection from my webcam by running the following

detectnet --model=models/MaFran/ssd-mobilenet.onnx --labels=models/MaFran/labels.txt --input-blob=input_0 --output-cvg=scores --output-bbox=boxes /dev/video0
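
When the same model is run often, the long command can be assembled from the model folder. The helper below is hypothetical. It only rebuilds the command shown above from flags that appear in this post, and it assumes that `detectnet` is on the PATH and that onnx_export.py wrote ssd-mobilenet.onnx next to labels.txt.

```python
import shlex
from pathlib import PurePosixPath


def detectnet_command(model_dir, source="/dev/video0"):
    """Build the detectnet argument list for a model exported with onnx_export.py."""
    model_dir = PurePosixPath(model_dir)
    return [
        "detectnet",
        "--model=" + str(model_dir / "ssd-mobilenet.onnx"),
        "--labels=" + str(model_dir / "labels.txt"),
        # Layer names exactly as used in the command from this post.
        "--input-blob=input_0",
        "--output-cvg=scores",
        "--output-bbox=boxes",
        source,
    ]


if __name__ == "__main__":
    # Print a copy-pastable command line; pass the list to subprocess.run to execute it.
    print(" ".join(shlex.quote(arg) for arg in detectnet_command("models/MaFran")))
```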

Hope this is helpful for others in the future.

dusty_nv · September 21, 2021, 8:48pm

Hi @francis.tse, OK great - that makes sense. Glad you were able to get it working!

francis.tse · September 22, 2021, 4:09pm

@dusty_nv, thanks for your great writeups and help.
