Pickle error when training SSD MobileNet

Hello,
I was trying to follow this tutorial and ran into a strange pickle error, even though the .pth file does not appear to be empty:

trinhjames@trinhjames-desktop:~/jetson-inference/python/training/detection/ssd$ python3 train_ssd.py --dataset-type=voc --data=data/door --model-dir=model/new
/home/trinhjames/.local/lib/python3.7/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
warn(f"Failed to load image Python extension: {e}")
2023-07-13 13:56:47 - Namespace(balance_data=False, base_net=None, base_net_lr=0.001, batch_size=4, checkpoint_folder='model/new', dataset_type='voc', datasets=['data/door'], debug_steps=10, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, log_level='info', lr=0.01, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb1-ssd', num_epochs=30, num_workers=2, pretrained_ssd='models/mobilenet-v1-ssd-mp-0_675.pth', resolution=300, resume=None, scheduler='cosine', t_max=100, use_cuda=True, validation_epochs=1, validation_mean_ap=False, weight_decay=0.0005)
2023-07-13 13:56:48 - model resolution 300x300
2023-07-13 13:56:48 - SSDSpec(feature_map_size=19, shrinkage=16, box_sizes=SSDBoxSizes(min=60, max=105), aspect_ratios=[2, 3])
2023-07-13 13:56:48 - SSDSpec(feature_map_size=10, shrinkage=32, box_sizes=SSDBoxSizes(min=105, max=150), aspect_ratios=[2, 3])
2023-07-13 13:56:48 - SSDSpec(feature_map_size=5, shrinkage=64, box_sizes=SSDBoxSizes(min=150, max=195), aspect_ratios=[2, 3])
2023-07-13 13:56:48 - SSDSpec(feature_map_size=3, shrinkage=100, box_sizes=SSDBoxSizes(min=195, max=240), aspect_ratios=[2, 3])
2023-07-13 13:56:48 - SSDSpec(feature_map_size=2, shrinkage=150, box_sizes=SSDBoxSizes(min=240, max=285), aspect_ratios=[2, 3])
2023-07-13 13:56:48 - SSDSpec(feature_map_size=1, shrinkage=300, box_sizes=SSDBoxSizes(min=285, max=330), aspect_ratios=[2, 3])
2023-07-13 13:56:48 - Prepare training datasets.
2023-07-13 13:56:48 - VOC Labels read from file: ('BACKGROUND', 'First', 'Second')
2023-07-13 13:56:48 - Stored labels into file model/new/labels.txt.
2023-07-13 13:56:48 - Train dataset size: 20
2023-07-13 13:56:48 - Prepare Validation datasets.
2023-07-13 13:56:48 - VOC Labels read from file: ('BACKGROUND', 'First', 'Second')
2023-07-13 13:56:48 - Validation dataset size: 5
2023-07-13 13:56:48 - Build network.
2023-07-13 13:56:48 - Init from pretrained SSD models/mobilenet-v1-ssd-mp-0_675.pth
Traceback (most recent call last):
  File "train_ssd.py", line 371, in <module>
    net.init_from_pretrained_ssd(args.pretrained_ssd)
  File "/home/trinhjames/jetson-inference/python/training/detection/ssd/vision/ssd/ssd.py", line 133, in init_from_pretrained_ssd
    state_dict = torch.load(model, map_location=lambda storage, loc: storage)
  File "/home/trinhjames/.local/lib/python3.7/site-packages/torch/serialization.py", line 795, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/trinhjames/.local/lib/python3.7/site-packages/torch/serialization.py", line 1002, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

Any help is appreciated. Thanks for the in-depth tutorials.

Hi @james0.0joseph, my bet is that your download of mobilenet-v1-ssd-mp-0_675.pth was incomplete/corrupted, so try downloading it again:

cd ~/jetson-inference/python/training/detection/ssd
wget https://nvidia.box.com/shared/static/djf5w54rjvpqocsiztzaandq1m3avr7c.pth -O models/mobilenet-v1-ssd-mp-0_675.pth

You can also download it manually from Google Drive here: https://github.com/qfgaohao/pytorch-ssd#download-models
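Before launching training again, it can help to check the file on disk, since an empty or truncated file is a common cause of `EOFError: Ran out of input`. The following is only a minimal sketch and not part of train_ssd.py: the helper names `checkpoint_status` and `try_load` and the 1 KB `min_bytes` threshold are assumptions chosen for illustration.

```python
import os


def checkpoint_status(path, min_bytes=1024):
    """Return (ok, message) describing whether the file at `path` looks usable.

    `min_bytes` is an arbitrary illustrative threshold, not a documented limit.
    """
    if not os.path.isfile(path):
        return False, f"{path} does not exist"
    size = os.path.getsize(path)
    if size < min_bytes:
        return False, f"{path} is only {size} bytes; the download is likely incomplete"
    return True, f"{path} is {size} bytes"


def try_load(path):
    """Attempt the same torch.load call that ssd.py makes; requires PyTorch."""
    import torch  # imported lazily so the size check works without PyTorch

    return torch.load(path, map_location=lambda storage, loc: storage)


if __name__ == "__main__":
    # Path taken from the training log above, relative to the ssd directory.
    ok, message = checkpoint_status("models/mobilenet-v1-ssd-mp-0_675.pth")
    print(message)
```

Run it from the ssd directory so the relative path matches the one train_ssd.py uses. If the size check passes but `try_load` still raises, the file is probably corrupted rather than empty.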

I noticed that you are using Python 3.7 - did you rebuild/install PyTorch for Python 3.7 with CUDA enabled? You can run python3.7 -c 'import torch; print(torch.cuda.is_available())' to confirm.
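For a slightly fuller picture than the one-liner, a short script can report which interpreter is running and whether the installed PyTorch was built with CUDA at all. This is a hedged sketch; the function name `cuda_report` and the dictionary keys are made up for illustration, while `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()` are standard PyTorch attributes.

```python
import sys


def cuda_report():
    """Collect interpreter and PyTorch CUDA details into a dict."""
    report = {"python": sys.version.split()[0], "torch_installed": False}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed for this interpreter.
        return report
    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # torch.version.cuda is None when the installed build has no CUDA support.
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter you train with (for example python3.7), since each Python version has its own installed packages. A `built_with_cuda` value of `None` would point to a CPU-only build.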

I do not have CUDA enabled. I originally had Python 3.6 and followed the Transfer Learning tutorial, but I have since upgraded to 3.7 to get a servo motor controller working. For 3.7, I ran the pip install command from the PyTorch website,
pip3 install torch torchvision

because the Transfer Learning tutorial would error. This worked for using ssd-mobilenet inside OpenCV. I just tried rebuilding PyTorch from source, but cuda.is_available() is still False. Is there a way to enable it in 3.7, or do you think I need Python 3.8 to do so?
Thanks,
James

Yes, other people on the forums have re-built PyTorch for Python 3.7 with CUDA enabled (although doing so for Python 3.8 seems more common). There are general compiling instructions for PyTorch under the Build from Source section of this post:

You will probably want to explicitly call python3.7 and pip3.7 instead of python3 and pip3 in those commands, though. Also, soon after the build begins, PyTorch will print out its build configuration, which will tell you whether it detected and enabled CUDA in the build.

Presumably your servo motor controller is only used at runtime, so perhaps you could still use Python 3.6 for training. You could also use the jetson-inference Docker container for training; it already has PyTorch, etc. installed in it.
