Transfer learning using the Cityscapes dataset in unet-peopleSemSegNet causes poor generalization performance

OK, got it. The PeopleSemSegNet TLT model described in the model card (PeopleSemSegnet | NVIDIA NGC) was trained on a proprietary dataset with more than 5 million objects for the person class. The training dataset consists of a mix of camera heights, crowd densities, and fields of view (FOV). Approximately half of the training data consisted of images captured in an indoor office environment.

For your case, it is recommended to collect more CCTV data for training.

For the NaN issue, please try a lower batch size.

To improve accuracy, refer to "Problems encountered in training unet and inference unet" (#27 by Morganh). You can also use the following settings:

  • loss: "cross_entropy"
  • weight: 2e-06
  • crop_and_resize_prob: 0.01
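
As a rough sketch of where these settings would go in a UNet training spec file: the block nesting below is an assumption based on the usual TAO UNet spec layout (loss and the regularizer weight under training_config, crop_and_resize_prob under the spatial augmentation block of dataset_config), and the batch_size value is only a placeholder. Check the names against your own spec file and leave every other field as you already have it.

```text
training_config {
  # Placeholder value: lower this from your current setting if the loss goes to NaN.
  batch_size: 2
  loss: "cross_entropy"
  regularizer {
    # Assumed location of the "weight" setting; keep your existing regularizer type.
    type: L2
    weight: 2e-06
  }
}

dataset_config {
  augment: True
  augmentation_config {
    spatial_augmentation {
      # Assumed location of the crop-and-resize probability.
      crop_and_resize_prob: 0.01
    }
  }
}
```

Only the three values from the list above are the recommendation; everything else in the fragment is there to show context.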