Failing to train ResNet model with Cityscapes dataset

I want to train a ResNet model on the Cityscapes dataset, but training keeps failing.
I am using the train.py script.
I have already remapped the Cityscapes dataset with cityscapes_remap.py.

Running on Windows 10
Python 3.7
torch 1.3.0 (tested 1.6.0 too)
torchvision 0.4.2 (tested 0.7.0 too)

C:\Users\Ad\PycharmProjects\nvidiaSegnet\venvSemseg\Scripts\python.exe C:/Users/Ad/PycharmProjects/nvidiaSegnet/pytorch-segmentation/train.py D:/Cityscapes_remap/ --dataset cityscapes --model-dir newModel -a fcn_resnet50 --workers 1
pytorch-segmentation/datasets/__init__.py
Not using distributed mode
Namespace(arch='fcn_resnet50', aux_loss=False, batch_size=4, data='D:/Cityscapes_remap/', dataset='cityscapes', device='cuda', dist_url='env://', distributed=False, epochs=30, lr=0.01, model_dir='newModel', momentum=0.9, pretrained=False, print_freq=10, resolution=320, resume='', test_only=False, weight_decay=0.0001, workers=1, world_size=1)
=> training with dataset: 'cityscapes' (train=561, val=233)
=> training with resolution: 320x320, 21 classes
=> training with model: fcn_resnet50
pytorch-segmentation/datasets/__init__.py
Traceback (most recent call last):
File "C:/Users/Ad/PycharmProjects/nvidiaSegnet/pytorch-segmentation/train.py", line 332, in <module>
main(args)
File "C:/Users/Ad/PycharmProjects/nvidiaSegnet/pytorch-segmentation/train.py", line 293, in main
train_one_epoch(model, criterion, optimizer, data_loader, lr_scheduler, device, epoch, args.print_freq)
File "C:/Users/Ad/PycharmProjects/nvidiaSegnet/pytorch-segmentation/train.py", line 184, in train_one_epoch
for image, target in metric_logger.log_every(data_loader, print_freq, header):
File "C:\Users\Ad\PycharmProjects\nvidiaSegnet\pytorch-segmentation\utils.py", line 186, in log_every
for obj in iterable:
File "C:\Users\Ad\PycharmProjects\nvidiaSegnet\venvSemseg\lib\site-packages\torch\utils\data\dataloader.py", line 819, in __next__
return self._process_data(data)
File "C:\Users\Ad\PycharmProjects\nvidiaSegnet\venvSemseg\lib\site-packages\torch\utils\data\dataloader.py", line 846, in _process_data
data.reraise()
File "C:\Users\Ad\PycharmProjects\nvidiaSegnet\venvSemseg\lib\site-packages\torch\_utils.py", line 385, in reraise
raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "C:\Users\Ad\PycharmProjects\nvidiaSegnet\venvSemseg\lib\site-packages\torch\utils\data\_utils\worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "C:\Users\Ad\PycharmProjects\nvidiaSegnet\venvSemseg\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\Ad\PycharmProjects\nvidiaSegnet\venvSemseg\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\Ad\PycharmProjects\nvidiaSegnet\venvSemseg\lib\site-packages\torchvision\datasets\cityscapes.py", line 183, in __getitem__
image, target = self.transforms(image, target)
File "C:\Users\Ad\PycharmProjects\nvidiaSegnet\venvSemseg\lib\site-packages\torchvision\datasets\vision.py", line 61, in __call__
input = self.transform(input)
TypeError: __call__() missing 1 required positional argument: 'target'
Process finished with exit code 1
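
The last two frames of the traceback show torchvision's vision.py calling self.transform(input) with a single argument, while the transform it holds expects both an image and a target. One plausible reading, which this thread does not confirm, is that a joint (image, target) transform ended up in the single-input transform slot of the newer torchvision dataset class. The sketch below reproduces that mechanism in plain Python; the class names are hypothetical stand-ins and no torchvision code is used.

```python
class JointIdentity:
    """Hypothetical joint transform: it takes the image AND the target."""

    def __call__(self, image, target):
        return image, target


class SeparateTransforms:
    """Simplified stand-in for the wrapper seen in the traceback
    (torchvision/datasets/vision.py): it hands `transform` the input only."""

    def __init__(self, transform=None, target_transform=None):
        self.transform = transform
        self.target_transform = target_transform

    def __call__(self, input, target):
        if self.transform is not None:
            input = self.transform(input)  # one argument, as in the traceback
        if self.target_transform is not None:
            target = self.target_transform(target)
        return input, target


def describe_call(transforms):
    """Call transforms(image, target) and report the outcome as a string."""
    try:
        transforms("image", "mask")
        return "ok"
    except TypeError as exc:
        return "TypeError: " + str(exc)


joint = JointIdentity()

# Joint transform placed in the single-input slot: fails like the log above.
wrapped_result = describe_call(SeparateTransforms(transform=joint))

# Joint transform called directly with both arguments: works.
direct_result = describe_call(joint)

print(wrapped_result)
print(direct_result)
```

If this reading is right, the joint transform has to reach the dataset through a slot that is called with both arguments (as far as I know, newer torchvision datasets accept a transforms keyword for that purpose), or the older fork the training script was written against has to be installed.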

When I trained segmentation models using that code (it was some time ago now), I was using the v0.3.0 branch of this fork of torchvision:

https://github.com/dusty-nv/vision/tree/v0.3.0

So you may need to install the v0.3.0 branch of that torchvision fork, or see this PR: https://github.com/dusty-nv/pytorch-segmentation/pull/4

Alternatively, since it has been a while since I trained a segmentation model, you might want to try the upstream source in torchvision/references/segmentation: https://github.com/pytorch/vision/tree/master/references/segmentation

Basically what I added to my fork was support for FCN-ResNet18/FCN-ResNet34, the ability to export to ONNX (which required some model tweaks), and support for more datasets.

I am now using your modified torchvision v0.3.0, but now a float is getting in my way.

C:\Users\Ad\PycharmProjects\nvidiaSegnet\venv37\Scripts\python.exe C:/Users/Ad/PycharmProjects/nvidiaSegnet/pytorch-segmentation/train.py D:/Cityscapes_remap/ --dataset cityscapes --model-dir newModel -a fcn_resnet18 --workers 1
pytorch-segmentation/datasets/__init__.py
Not using distributed mode
Namespace(arch='fcn_resnet18', aux_loss=False, batch_size=4, data='D:/Cityscapes_remap/', dataset='cityscapes', device='cuda', dist_url='env://', distributed=False, epochs=30, lr=0.01, model_dir='newModel', momentum=0.9, pretrained=False, print_freq=10, resolution=320, resume='', test_only=False, weight_decay=0.0001, workers=1, world_size=1)
=> training with dataset: 'cityscapes' (train=561, val=233)
=> training with resolution: 320x320, 21 classes
=> training with model: fcn_resnet18
torchvision.models.segmentation.fcn_resnet18()
Traceback (most recent call last):
File "C:/Users/Ad/PycharmProjects/nvidiaSegnet/pytorch-segmentation/train.py", line 332, in <module>
main(args)
File "C:/Users/Ad/PycharmProjects/nvidiaSegnet/pytorch-segmentation/train.py", line 244, in main
pretrained=args.pretrained)
File "C:\Users\Ad\PycharmProjects\nvidiaSegnet\venv37\lib\site-packages\torchvision-0.3.0-py3.7-win-amd64.egg\torchvision\models\segmentation\segmentation.py", line 70, in fcn_resnet18
File "C:\Users\Ad\PycharmProjects\nvidiaSegnet\venv37\lib\site-packages\torchvision-0.3.0-py3.7-win-amd64.egg\torchvision\models\segmentation\segmentation.py", line 50, in _segm_resnet
File "C:\Users\Ad\PycharmProjects\nvidiaSegnet\venv37\lib\site-packages\torchvision-0.3.0-py3.7-win-amd64.egg\torchvision\models\segmentation\fcn.py", line 29, in __init__
File "C:\Users\Ad\PycharmProjects\nvidiaSegnet\venv37\lib\site-packages\torch\nn\modules\conv.py", line 327, in __init__
False, _pair(0), groups, bias, padding_mode)
File "C:\Users\Ad\PycharmProjects\nvidiaSegnet\venv37\lib\site-packages\torch\nn\modules\conv.py", line 40, in __init__
out_channels, in_channels // groups, *kernel_size))
TypeError: new() received an invalid combination of arguments - got (float, float, int, int), but expected one of:

  • (torch.device device)
  • (torch.Storage storage)
  • (Tensor other)
  • (tuple of ints size, torch.device device)
  • (object data, torch.device device)

Process finished with exit code 1

I guess the problem could be here, in lines 41 and 49:

but the file is read-only on my side.
Running "python setup.py install" installed torchvision as a .egg file.
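
The "got (float, float, int, int)" part of the error says that the convolution's two channel counts arrived as floats. In Python 3 the / operator always returns a float, and a float stays a float through later // operations, so a channel count computed with / somewhere upstream would produce exactly this pattern. Whether lines 41 and 49 of the fork do this cannot be verified from this thread; the sketch below only illustrates the arithmetic, and the numbers 2048 and 4 are illustrative assumptions.

```python
def head_channels(backbone_channels, scale):
    """Return (true-division result, floor-division result) for a channel count."""
    return backbone_channels / scale, backbone_channels // scale


# Illustrative numbers only; not taken from the fork's source.
true_div, floor_div = head_channels(2048, 4)
print(type(true_div).__name__, true_div)    # float 512.0
print(type(floor_div).__name__, floor_div)  # int 512

# A float stays a float under a later floor division, which would explain
# why both channel arguments in the error message are floats.
inter_channels = true_div // 4
print(type(inter_channels).__name__, inter_channels)  # float 128.0

# Casting (or using // from the start) restores an int channel count.
fixed = int(true_div)
print(type(fixed).__name__, fixed)  # int 512
```

If the fork does compute a channel count with /, switching to // or wrapping the value in int() would be the usual fix. To make the source editable, it may help to install torchvision from the checkout in development mode (for example "pip install -e ." run inside the source directory) instead of as a .egg.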

Maybe the bigger problem is that I want to run it on a Windows system; I guess you built it all on Linux/Ubuntu.

Yes, I ran it on an Ubuntu PC with PyTorch 1.1 and that torchvision 0.3.0 fork, although that was some time ago.

I would try this segmentation example from PyTorch, which should be more recently updated: https://github.com/pytorch/vision/tree/master/references/segmentation