Jetson Inference Train Issue

Description

I am having trouble training the classifier on my own image dataset with the jetson-inference library.

Environment

Jetson TX2

TensorRT Version: 7.1.3-1
GPU Type: NVIDIA Tegra X2
Nvidia Driver Version:
CUDA Version: 10.2
CUDNN Version:
Operating System + Version: Ubuntu 18.04 LTS
Python Version (if applicable): 3.6.9
PyTorch Version (if applicable): 1.6.0

Issue

I have been following these instructions to collect my own images into a dataset directory and then train a new model on them: https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-collect.md

I installed JetPack without problems and can run the examples, including detect.py and classify.py, on the pretrained models provided by the jetson-inference library. What I cannot do is train a new model on my own images.

I structured the image directories as described on the jetson-inference page linked above. Whenever I try to train following that guide, it fails with an error saying it cannot identify an image file. I have been able to read and write the images with OpenCV, and I followed the instructions for installing the correct version of PIL; I am on Pillow 8.0.1, the latest version.

Any help would be much appreciated! Below are the details of where I am when the error happens, and the error message itself.
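A `.JPEG` extension does not guarantee that the file holds JPEG data, so one quick check is to look at the first few bytes of a file that fails to load. The sketch below uses only the standard library; `sniff_format` and the `SIGNATURES` table are hypothetical names introduced for illustration, and the table covers only a handful of common formats.

```python
# Leading bytes ("magic numbers") of a few common image formats.
# This table is illustrative, not exhaustive.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}


def sniff_format(path):
    """Guess the image format from the file's first bytes.

    Returns a format name from SIGNATURES, or None when the header is
    unrecognised (for example an empty, truncated, or non-image file).
    """
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None
```

Calling `sniff_format("data/ground/train/asphalt/image181.JPEG")` and getting `None` or a format other than `"jpeg"` would suggest the file was saved incompletely or under a misleading extension.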

Steps To Reproduce

Command: python3 train.py --model-dir=models/ground data/ground
Run from the jetson-inference/python/training/classification directory.

ERROR MESSAGE:

Use GPU: 0 for training
=> dataset classes: 4 ['asphalt', 'concrete', 'dirt', 'gravel']
=> using pre-trained model 'resnet18'
=> reshaped ResNet fully-connected layer with: Linear(in_features=512, out_features=4, bias=True)
Epoch: [0][ 0/236] Time 7.711 ( 7.711) Data 1.076 ( 1.076) Loss 1.1834e+00 (1.1834e+00) Acc@1 50.00 ( 50.00) Acc@5 100.00 (100.00)
Traceback (most recent call last):
  File "train.py", line 506, in <module>
    main()
  File "train.py", line 135, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "train.py", line 277, in main_worker
    train(train_loader, model, criterion, optimizer, epoch, num_classes, args)
  File "train.py", line 321, in train
    for i, (images, target) in enumerate(train_loader):
  File "/home/steven/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/steven/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 971, in _next_data
    return self._process_data(data)
  File "/home/steven/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/home/steven/.local/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
PIL.UnidentifiedImageError: Caught UnidentifiedImageError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/steven/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/steven/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/steven/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.7.0a0+78ed10c-py3.6-linux-aarch64.egg/torchvision/datasets/folder.py", line 137, in __getitem__
    sample = self.loader(path)
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.7.0a0+78ed10c-py3.6-linux-aarch64.egg/torchvision/datasets/folder.py", line 173, in default_loader
    return pil_loader(path)
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.7.0a0+78ed10c-py3.6-linux-aarch64.egg/torchvision/datasets/folder.py", line 155, in pil_loader
    img = Image.open(f)
  File "/usr/local/lib/python3.6/dist-packages/Pillow-8.0.1-py3.6-linux-aarch64.egg/PIL/Image.py", line 2944, in open
    "cannot identify image file %r" % (filename if filename else fp)
PIL.UnidentifiedImageError: cannot identify image file <_io.BufferedReader name='data/ground/train/asphalt/image181.JPEG'>
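
The traceback ends in `Image.open`, so the failure is in Pillow reading one specific file rather than in the training loop itself. The first batch trained fine, which suggests that only some files are affected. A rough way to find every such file before training is to walk the dataset and try to open each image with Pillow. This is only a sketch: `find_unreadable_images` and `IMAGE_EXTENSIONS` are hypothetical names, and the extension list is an assumption about which files the loader would pick up.

```python
import sys
from pathlib import Path

from PIL import Image

# Assumed set of extensions to check; adjust to match your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".ppm", ".pgm",
                    ".tif", ".tiff", ".webp"}


def find_unreadable_images(root):
    """Return a list of (path, error) for image files Pillow cannot open."""
    bad = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip directories and non-image files such as labels.txt
        try:
            with Image.open(path) as img:
                img.verify()  # integrity check without decoding the full image
        except Exception as exc:  # includes PIL.UnidentifiedImageError
            bad.append((path, repr(exc)))
    return bad


if __name__ == "__main__":
    # Dataset root taken from the command line; defaults to the path used above.
    root = sys.argv[1] if len(sys.argv) > 1 else "data/ground"
    if not Path(root).is_dir():
        print("not a directory:", root)
    else:
        for path, error in find_unreadable_images(root):
            print(path, error)
```

Any file this reports (empty, truncated, or not actually an image) can then be re-captured or removed from the dataset before running train.py again.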

Hi @stevenbucher8,
This looks like a Jetson issue. We recommend raising it on the respective platform's forum via the link below.

Thanks!