Cat/dog training from Hello AI World hangs on Orin?

pete26 · July 13, 2022, 10:39am

Hi,

I’m upgrading from Nano to Orin. Initially all I want to do is run through the Hello AI World tutorial before doing some train.py model re-training on my own data.

Everything works (inference on oranges, strawberries, etc.), but the cat/dog re-training appears to start, then does a strange .pth download that I never saw on the Nano, and then hangs until I break out with Ctrl-C:

Use GPU: 0 for training
=> dataset classes: 2 ['cat', 'dog']
=> using pre-trained model 'resnet18'
Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /home/pd/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth

I’ve tried googlenet rather than resnet18 and it’s the same. I’ve tried both the build from source and container and it’s the same.

Any ideas?

Thanks,

Pete

pete26 · July 13, 2022, 11:04am

Right - panic over - the mere writing of the post fixed it. I left the re-training running whilst I wrote the post and now it’s started running epochs! Maybe 20-30 minutes after I launched it. No idea what’s going on with that - maybe some initial setup thing I guess? Anyway - hopefully fixed - sorry to bother you!

dusty_nv · July 13, 2022, 5:57pm

Hi @pete26, no worries - the first time you run this, PyTorch downloads the pre-trained model that training starts from (in this case, resnet18-*.pth). Training would take much longer if the model were trained totally from scratch. Anyway, it seems there was a network connection problem or an issue with the PyTorch server when you initially ran this, which thankfully seems to have resolved itself on its own. Glad you were able to get it working!
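If anyone else hits the same apparent hang, one way to tell a slow download from a real hang is to check whether the checkpoint has landed in the PyTorch hub cache yet. The following is a minimal sketch using only the Python standard library, not part of train.py. The helper names `default_hub_dir` and `checkpoint_status` are made up for illustration, and the cache layout is assumed from the path in the log above plus the torch.hub defaults ($TORCH_HOME/hub, falling back to ~/.cache/torch/hub).

```python
import os
from pathlib import Path
from urllib.parse import urlparse


def default_hub_dir():
    """Guess the torch.hub cache directory without importing torch.

    Assumption: $TORCH_HOME/hub, where TORCH_HOME falls back to
    $XDG_CACHE_HOME/torch and XDG_CACHE_HOME falls back to ~/.cache.
    """
    cache_home = os.environ.get(
        "XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache")
    )
    torch_home = os.environ.get("TORCH_HOME", os.path.join(cache_home, "torch"))
    return Path(torch_home) / "hub"


def checkpoint_status(url, hub_dir=None):
    """Return (expected_path, size_in_bytes) for a checkpoint URL.

    The size is 0 if the file is not in the cache yet.
    Hypothetical helper, for illustration only.
    """
    hub_dir = Path(hub_dir) if hub_dir is not None else default_hub_dir()
    # The cached file keeps the last path component of the URL as its name.
    filename = os.path.basename(urlparse(url).path)
    path = hub_dir / "checkpoints" / filename
    size = path.stat().st_size if path.is_file() else 0
    return path, size


if __name__ == "__main__":
    url = "https://download.pytorch.org/models/resnet18-f37072fd.pth"
    path, size = checkpoint_status(url)
    if size:
        print(f"cached: {path} ({size} bytes)")
    else:
        print(f"not cached yet: {path}")
```

Running this from a second terminal while training appears stuck shows whether the file from the "Downloading:" line has arrived. If it never shows up, the network or the download server is the likely culprit rather than the training script.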
