Pickle error when training SSD MobileNet

james0.0joseph · July 13, 2023, 6:01pm

Hello,
I was trying to follow this tutorial,

dusty-nv/jetson-inference/blob/master/docs/pytorch-collect-detection.md

# Collecting your own Detection Datasets

The previously used `camera-capture` tool can also label object detection datasets from live video:

<img src="https://github.com/dusty-nv/jetson-inference/raw/master/docs/images/pytorch-collection-detect.jpg" >

When the `Dataset Type` drop-down is in Detection mode, the tool creates datasets in [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) format (which is supported during training).

> **note:** if you want to label a set of images that you already have (as opposed to capturing them from camera), try using a tool like [`CVAT`](https://github.com/openvinotoolkit/cvat) and export the dataset in Pascal VOC format.  Then create a labels.txt in the dataset with the names of each of your object classes.

## Creating the Label File

Under `jetson-inference/python/training/detection/ssd/data`, create an empty directory for storing your dataset and a text file that will define the class labels (usually called `labels.txt`).  The label file contains one class label per line, for example:

[The rest of the tutorial preview is truncated.]

and I get a strange pickle error. The file does not appear to be empty:

trinhjames@trinhjames-desktop:~/jetson-inference/python/training/detection/ssd$ python3 train_ssd.py --dataset-type=voc --data=data/door --model-dir=model/new
/home/trinhjames/.local/lib/python3.7/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
warn(f"Failed to load image Python extension: {e}")
2023-07-13 13:56:47 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=4, checkpoint_folder='model/new', dataset_type='voc', datasets=['data/door'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, log_level='info', lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=30, num_workers=2, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resolution=300, resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, validation_mean_ap=False, weight_decay=0.0005)
2023-07-13 13:56:48 - model resolution 300x300
2023-07-13 13:56:48 - SSDSpec(feature_map_size=19, shrinkage=16, box_sizes=SSDBoxSizes(min=60, max=105), aspect_ratios=[2, 3])
2023-07-13 13:56:48 - SSDSpec(feature_map_size=10, shrinkage=32, box_sizes=SSDBoxSizes(min=105, max=150), aspect_ratios=[2, 3])
2023-07-13 13:56:48 - SSDSpec(feature_map_size=5, shrinkage=64, box_sizes=SSDBoxSizes(min=150, max=195), aspect_ratios=[2, 3])
2023-07-13 13:56:48 - SSDSpec(feature_map_size=3, shrinkage=100, box_sizes=SSDBoxSizes(min=195, max=240), aspect_ratios=[2, 3])
2023-07-13 13:56:48 - SSDSpec(feature_map_size=2, shrinkage=150, box_sizes=SSDBoxSizes(min=240, max=285), aspect_ratios=[2, 3])
2023-07-13 13:56:48 - SSDSpec(feature_map_size=1, shrinkage=300, box_sizes=SSDBoxSizes(min=285, max=330), aspect_ratios=[2, 3])
2023-07-13 13:56:48 - Prepare training datasets.
2023-07-13 13:56:48 - VOC Labels read from file: ('BACKGROUND', 'First', 'Second')
2023-07-13 13:56:48 - Stored labels into file model/new/labels.txt.
2023-07-13 13:56:48 - Train dataset size: 20
2023-07-13 13:56:48 - Prepare Validation datasets.
2023-07-13 13:56:48 - VOC Labels read from file: ('BACKGROUND', 'First', 'Second')
2023-07-13 13:56:48 - Validation dataset size: 5
2023-07-13 13:56:48 - Build network.
2023-07-13 13:56:48 - Init from pretrained SSD models/mobilenet-v1-ssd-mp-0_675.pth
Traceback (most recent call last):
  File "train_ssd.py", line 371, in <module>
    net.init_from_pretrained_ssd(args.pretrained_ssd)
  File "/home/trinhjames/jetson-inference/python/training/detection/ssd/vision/ssd/ssd.py", line 133, in init_from_pretrained_ssd
    state_dict = torch.load(model, map_location=lambda storage, loc: storage)
  File "/home/trinhjames/.local/lib/python3.7/site-packages/torch/serialization.py", line 795, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/trinhjames/.local/lib/python3.7/site-packages/torch/serialization.py", line 1002, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

Any help is appreciated, thanks for the in depth tutorials.

dusty_nv · July 13, 2023, 6:33pm

Hi @james0.0joseph, my bet is that your download of mobilenet-v1-ssd-mp-0_675.pth was incomplete/corrupted, so try downloading it again:

cd ~/jetson-inference/python/training/detection/ssd
wget https://nvidia.box.com/shared/static/djf5w54rjvpqocsiztzaandq1m3avr7c.pth -O models/mobilenet-v1-ssd-mp-0_675.pth

Alternatively, you can download it manually from Google Drive using the links here: https://github.com/qfgaohao/pytorch-ssd#download-models
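
For context on the traceback in the original post: `EOFError` is what Python's `pickle` raises when the stream ends before a complete object has been read, which is consistent with an empty or cut-off download. Below is a minimal sketch, using only the standard library, of a check one could run on the checkpoint file before starting training. The helper names `checkpoint_looks_usable` and `empty_stream_error` and the `min_bytes` threshold are illustrative assumptions, not part of jetson-inference or PyTorch.

```python
import io
import os
import pickle


def checkpoint_looks_usable(path, min_bytes=1024):
    """Return (ok, reason) based on a cheap size check of a checkpoint file.

    min_bytes is an arbitrary, illustrative threshold: a real SSD checkpoint
    is far larger, so a file this small is very likely a failed download.
    """
    if not os.path.isfile(path):
        return False, "file not found"
    size = os.path.getsize(path)
    if size == 0:
        return False, "file is empty (0 bytes)"
    if size < min_bytes:
        return False, "file is only %d bytes, likely truncated" % size
    return True, "size looks plausible (%d bytes)" % size


def empty_stream_error():
    """Unpickle an empty byte stream and return the resulting error message."""
    try:
        pickle.load(io.BytesIO(b""))
    except EOFError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Path taken from the training log; adjust if your checkpoint lives elsewhere.
    print(empty_stream_error())
    print(checkpoint_looks_usable("models/mobilenet-v1-ssd-mp-0_675.pth"))
```

A size check only catches empty or obviously truncated files. A download that is cut off late would still pass it, so re-downloading as described above remains the real fix.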

I noticed that you are using Python 3.7 - did you rebuild/install PyTorch for Python 3.7 with CUDA enabled? You can run python3.7 -c 'import torch; print(torch.cuda.is_available())' to confirm.
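
The one-liner above reports only a boolean. A slightly longer sketch can also show which interpreter is running and whether the installed PyTorch build was compiled with CUDA at all, which helps tell a CPU-only build apart from a CUDA build that cannot reach the GPU. It relies only on `torch.__version__`, `torch.version.cuda` (which is `None` on CPU-only builds) and `torch.cuda.is_available()`; the function name `describe_torch_build` is hypothetical.

```python
import importlib.util
import sys


def describe_torch_build():
    """Collect basic facts about the interpreter and the installed PyTorch."""
    info = {
        "python": "%d.%d" % sys.version_info[:2],
        "executable": sys.executable,
        "torch_installed": False,
    }
    # Avoid an ImportError when PyTorch is missing for this interpreter.
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # torch.version.cuda is None when the build has no CUDA support.
    info["built_with_cuda"] = torch.version.cuda is not None
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print("%s: %s" % (key, value))
```

Run it with the same interpreter that is used for training (for example `python3.7`), since each Python version has its own site-packages and may see a different PyTorch install.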

james0.0joseph · July 18, 2023, 4:03pm

I do not have CUDA enabled. I originally had Python 3.6 and followed the Transfer Learning tutorial, but I have since upgraded to 3.7 so that a servo motor controller would work. For 3.7, I used the pip install command from the PyTorch website,
pip3 install torch torchvision

because the Transfer Learning tutorial would error out. This worked for using ssd-mobilenet inside OpenCV. I have just tried to rebuild PyTorch from source, but cuda.is_available() is still False. Is there a way to enable it in 3.7, or do you think I need Python 3.8 to do so?
Thanks,
James

dusty_nv · July 18, 2023, 5:04pm

Yes, other people on the forums have re-built PyTorch for Python 3.7 with CUDA enabled (although doing so for Python 3.8 seems more common). There are general compiling instructions for PyTorch under the Build from Source section of this post:

You will probably want to explicitly call python3.7 and pip3.7 instead of python3 and pip3 in those commands, though. Also, soon after the build begins, PyTorch will print out its build configuration, which will let you know whether it detected and enabled CUDA in the build.

Presumably your servo motor controller is used at runtime. During training, perhaps you could still use Python 3.6. You could also use the jetson-inference Docker container for training; it already has PyTorch, etc. installed in it.

system · August 2, 2023, 3:00am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
