jetson-inference (dusty-nv): cat_dog classification training problem

Hi,

I want to try the dusty-nv classification training again, but now there is an error. It worked fine a couple of years ago. Is it a PyTorch version problem? Mine is 1.6, and JetPack is 4.6. How can I solve it?

Note: I also git-cloned the latest jetson-inference on another board and got the same error…

Thx

nvidia@nvidia-desktop:~/jetson-inference_2021/python/training/classification$ python3 train.py --model-dir=models/cat_dog data/cat_dog
Use GPU: 0 for training
=> dataset classes:  2 ['cat', 'dog']
=> using pre-trained model 'resnet18'
Traceback (most recent call last):
  File "train.py", line 506, in <module>
    main()
  File "train.py", line 135, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "train.py", line 199, in main_worker
    model = models.__dict__[args.arch](pretrained=True)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torchvision/models/resnet.py", line 277, in resnet18
    **kwargs)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torchvision/models/resnet.py", line 263, in _resnet
    progress=progress)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/hub.py", line 490, in load_state_dict_from_url
    raise RuntimeError('Only one file(not dir) is allowed in the zipfile')
RuntimeError: Only one file(not dir) is allowed in the zipfile
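The message suggests that this torch build's load_state_dict_from_url treats any downloaded file that is a zip archive as a manually zipped checkpoint and rejects it unless the archive holds exactly one member. Checkpoints written by torch.save() with the zipfile serialization (the default since PyTorch 1.6) are zip archives with several members, so a checkpoint in that format is a plausible, unconfirmed trigger. The stdlib-only sketch below mimics that one-member check so a cached .pth can be tested without importing torch; `would_trip_hub_zip_check` is a hypothetical helper, not a torch API.

```python
import sys
import zipfile


def would_trip_hub_zip_check(path_or_file):
    """Mimic the one-member zip check behind the
    'Only one file(not dir) is allowed in the zipfile' error.

    Returns (is_zip, member_count, would_fail). Hypothetical diagnostic
    helper; it does not call into torch.
    """
    if not zipfile.is_zipfile(path_or_file):
        # Not a zip archive: the check does not apply to this file.
        return False, 0, False
    with zipfile.ZipFile(path_or_file) as archive:
        members = archive.infolist()
    # The check accepts exactly one member; any other count raises.
    return True, len(members), len(members) != 1


if __name__ == "__main__":
    # Report on every checkpoint path given on the command line.
    for checkpoint in sys.argv[1:]:
        is_zip, count, fails = would_trip_hub_zip_check(checkpoint)
        print(f"{checkpoint}: zip={is_zip} members={count} would_fail={fails}")
```

For example, saved as check_ckpt.py (a hypothetical name): python3 check_ckpt.py ~/.cache/torch/hub/checkpoints/*.pth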

nvidia@nvidia-desktop:~/jetson-inference_2021/python/training/classification$ pip list|grep torch
torch                            1.6.0
torch2trt                        0.1.0
torchvision                      0.10.0
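One detail in this listing stands out: torchvision 0.10 was released alongside PyTorch 1.9, while PyTorch 1.6 pairs with torchvision 0.7. A mismatched pair could plausibly explain the failure (a newer torchvision may reference checkpoint files that older torch.hub code cannot load), although the thread never confirms it. A small sketch for spotting such a mismatch; the table is a partial list of official release pairings, and the helper names are hypothetical.

```python
# Partial torch -> torchvision release pairing (major.minor only).
TORCHVISION_FOR_TORCH = {
    "1.5": "0.6",
    "1.6": "0.7",
    "1.7": "0.8",
    "1.8": "0.9",
    "1.9": "0.10",
    "1.10": "0.11",
}


def major_minor(version):
    """Reduce e.g. '1.6.0' or '0.10.0a0+300a8a4' to '1.6' or '0.10'."""
    parts = version.split("+")[0].split(".")
    return ".".join(parts[:2])


def check_pair(torch_version, torchvision_version):
    """Return 'ok', a 'mismatch: ...' message, or an 'unknown ...' message."""
    expected = TORCHVISION_FOR_TORCH.get(major_minor(torch_version))
    if expected is None:
        return "unknown torch release; check the official compatibility table"
    if major_minor(torchvision_version) == expected:
        return "ok"
    return (f"mismatch: torch {torch_version} pairs with torchvision "
            f"{expected}.x, found {torchvision_version}")
```

On the board, the installed versions could be passed in as check_pair(torch.__version__, torchvision.__version__).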

nvidia@nvidia-desktop:~/jetson-inference_2021/python/training/classification$ sudo apt-cache show nvidia-jetpack
Package: nvidia-jetpack
Version: 4.6-b199
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-cuda (= 4.6-b199), nvidia-opencv (= 4.6-b199), nvidia-cudnn8 (= 4.6-b199), nvidia-tensorrt (= 4.6-b199), nvidia-visionworks (= 4.6-b199), nvidia-container (= 4.6-b199), nvidia-vpi (= 4.6-b199), nvidia-l4t-jetson-multimedia-api (>> 32.6-0), nvidia-l4t-jetson-multimedia-api (<< 32.7-0)

I installed PyTorch like this:

$ ./install-pytorch.sh

Hi @AK51, I haven’t seen this error before, and I wonder whether your torch model cache somehow contains a corrupt download. Can you try clearing it like this?

rm -rf ~/.cache/torch
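The rm above assumes the default cache location. If TORCH_HOME or XDG_CACHE_HOME is set, torch.hub stores weights elsewhere and the command would miss them. A stdlib-only sketch, assuming torch.hub's default lookup order ($TORCH_HOME, else $XDG_CACHE_HOME/torch, else ~/.cache/torch, with weights in hub/checkpoints), that locates the checkpoint folder and deletes only the .pth files; both function names are hypothetical.

```python
import os
from pathlib import Path


def torch_hub_checkpoint_dir(env=None):
    """Return the folder torch.hub is assumed to cache weights in."""
    env = os.environ if env is None else env
    torch_home = env.get("TORCH_HOME")
    if torch_home is None:
        # Fall back to $XDG_CACHE_HOME/torch, then ~/.cache/torch.
        cache_home = env.get("XDG_CACHE_HOME", "~/.cache")
        torch_home = os.path.join(cache_home, "torch")
    return Path(os.path.expanduser(torch_home)) / "hub" / "checkpoints"


def remove_cached_weights(pattern="*.pth", env=None):
    """Delete cached checkpoints matching pattern; return the removed paths."""
    removed = []
    for path in sorted(torch_hub_checkpoint_dir(env).glob(pattern)):
        path.unlink()
        removed.append(path)
    return removed
```

Unlike rm, Path.unlink() does not prompt for write-protected files; on Linux, whether a file can be deleted depends on the directory's permissions.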

Dear Dusty,

Still not working… I have tried two boards with working SD card images (I flashed them a long time ago; object detection works on them, but I never tried classification). Maybe there is a problem with the torch resnet18 checkpoint (https://download.pytorch.org/models/resnet18-f37072fd.pth)? Thx

I found this thread, but it has no solution: "Unable to load model state_dict using torch.utils.model_zoo.load_url()" (PyTorch Forums). It says:
“Maybe that’s what I trained in Google Colab, and I just noticed Colab uses torch version 1.6, where _use_new_zipfile_serialization is True by default”
Maybe “_use_new_zipfile_serialization” is not set explicitly on the Nano board? Thx
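The quoted explanation hinges on which serialization wrote the file. A stdlib-only sketch that guesses the format of a .pth without importing torch; the `checkpoint_format` name and its labels are hypothetical, and the "legacy" fallback only means "neither zip nor tar", not a verified pre-1.6 file. Running it over the cached checkpoints would show whether the ones that fail are zip archives.

```python
import tarfile
import zipfile


def checkpoint_format(path):
    """Guess which torch.save() serialization produced a .pth file.

    'zip'    : zipfile format (torch.save default since PyTorch 1.6,
               i.e. _use_new_zipfile_serialization=True)
    'tar'    : very old tar-based format
    'legacy' : anything else, assumed to be the pre-1.6 pickle stream
    """
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):
        return "tar"
    return "legacy"
```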

nvidia@nvidia-desktop:~$ cd ~/.cache/torch
nvidia@nvidia-desktop:~/.cache/torch$ ls
hub
nvidia@nvidia-desktop:~/.cache/torch$ cd hub
nvidia@nvidia-desktop:~/.cache/torch/hub$ ls
checkpoints
nvidia@nvidia-desktop:~/.cache/torch/hub$ cd checkpoints/
nvidia@nvidia-desktop:~/.cache/torch/hub/checkpoints$ ls
alexnet-owt-4df8aa71.pth  resnet18-5c106cde.pth  resnet18-f37072fd.pth
nvidia@nvidia-desktop:~/.cache/torch/hub/checkpoints$ rm resnet18-5c106cde.pth 
nvidia@nvidia-desktop:~/.cache/torch/hub/checkpoints$ rm resnet18-f37072fd.pth 
nvidia@nvidia-desktop:~/.cache/torch/hub/checkpoints$ rm alexnet-owt-4df8aa71.pth 
rm: remove write-protected regular file 'alexnet-owt-4df8aa71.pth'? y
nvidia@nvidia-desktop:~/.cache/torch/hub/checkpoints$ ls

etson-inference_2021/python/training/classification$ python3 train.py --model-dir=models/cat_dog data/cat_dog
Use GPU: 0 for training
=> dataset classes:  2 ['cat', 'dog']
=> using pre-trained model 'resnet18'
Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /home/nvidia/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████████████████████████████████| 44.7M/44.7M [00:03<00:00, 11.9MB/s]
Traceback (most recent call last):
  File "train.py", line 506, in <module>
    main()
  File "train.py", line 135, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "train.py", line 199, in main_worker
    model = models.__dict__[args.arch](pretrained=True)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torchvision/models/resnet.py", line 277, in resnet18
    **kwargs)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torchvision/models/resnet.py", line 263, in _resnet
    progress=progress)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/hub.py", line 490, in load_state_dict_from_url
    raise RuntimeError('Only one file(not dir) is allowed in the zipfile')
RuntimeError: Only one file(not dir) is allowed in the zipfile

Dear Dusty,

I tried vgg16 and it works. :>
resnet18 and alexnet still fail.

nvidia@nvidia-desktop:~/jetson-inference_2021/python/training/classification$ python3 train.py --model-dir=models/cat_dog data/cat_dog --arch=vgg16
Use GPU: 0 for training
=> dataset classes:  2 ['cat', 'dog']
=> using pre-trained model 'vgg16'
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /home/nvidia/.cache/torch/hub/checkpoints/vgg16-397923af.pth
100%|█████████████████████████████████████████████████████████████████| 528M/528M [00:46<00:00, 11.8MB/s]
=> reshaped VGG classifier layer with: Linear(in_features=4096, out_features=2, bias=True)
Epoch: [0][  0/625]  Time 68.147 (68.147)  Data  1.212 ( 1.212)  Loss 6.3846e-01 (6.3846e-01)  Acc@1  75.00 ( 75.00)  Acc@5 100.00 (100.00)
Epoch: [0][ 10/625]  Time  1.076 ( 7.507)  Data  0.001 ( 0.157)  Loss nan (nan)  Acc@1  37.50 ( 45.45)  Acc@5 100.00 (100.00)

Dear Dusty,

I can use imagenet to run inference with vgg16, but when I try imagenet.py I get this error:

python3 imagenet.py --model=$NET/vgg16.onnx --input_blob=input_0 --output_blob=output_0 --labels=$DATASET/labels.txt \
           $DATASET/test/S $DATASET/test/S_out
  File "imagenet.py", line 68, in <module>
    class_id, confidence = net.Classify(img)
Exception: jetson.inference -- imageNet.Classify() encountered an error classifying the image

Dear Dusty,

googlenet works. :>

python3 imagenet.py --model=$NET/googlenet.onnx --input_blob=input_0 --output_blob=output_0 --labels=$DATASET/labels.txt \
           $DATASET/test/S $DATASET/test/S_out

Thanks for reporting this @AK51 - I cleared my torchhub model cache and retested this on JetPack 4.6.1, and it was able to download and load the resnet18.pth without issue. What version of PyTorch and torchvision are you using?

Hi,

I finished my small project using googlenet, so never mind. :> Thx
