UNet expects the images and corresponding masks encoded as images. Each mask image is a single-channel image, where every pixel is assigned an integer value that represents the segmentation class.
So I converted all the mask images to single-channel with cv2's BGR2GRAY. Since the Cityscapes and RailSem19 datasets have different annotations, I first converted their original single-channel masks to RGB so I could tell the classes apart more easily, recolored matching classes to the same color in both datasets (for example, trees to green in both) to normalize them, and finally converted the modified masks back to grayscale, roughly as in the sketch below.
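A minimal sketch of that recoloring step (the palette tables only show the tree class and the RailSem19 color is a placeholder; the real tables would cover every class in both datasets):

```python
import cv2
import numpy as np

# Placeholder palettes: original mask color -> shared color per class.
CITYSCAPES_TO_SHARED = {(107, 142, 35): (0, 255, 0)}  # vegetation -> green
RAILSEM19_TO_SHARED = {(70, 170, 80): (0, 255, 0)}    # hypothetical tree color -> green

def normalize_mask(path, palette):
    """Recolor an RGB mask to the shared palette, then back to grayscale."""
    rgb = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    out = rgb.copy()
    for src_color, dst_color in palette.items():
        out[np.all(rgb == src_color, axis=-1)] = dst_color
    bgr = cv2.cvtColor(out, cv2.COLOR_RGB2BGR)
    return cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
```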
P.S. I switched to this new account because of the reply restriction on my old one.
It does not match your training classes.
Please note that the pixel integer value should be equal to the value of the label_id provided in the spec.
See Data Annotation Format - NVIDIA Docs
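For example, a remap from the shared palette straight to label_ids could look like this (a minimal sketch; COLOR_TO_ID is a placeholder table that would be built from the classes in your spec):

```python
import cv2
import numpy as np

# Placeholder table: shared palette color -> label_id from the spec.
# Only one entry is shown; the real table has all 18 classes (0~17).
COLOR_TO_ID = {(0, 255, 0): 8}  # e.g. tree/vegetation -> label_id 8

def encode_mask(path, color_to_id, default_id=0):
    """Write a single-channel mask whose pixel values are label_ids."""
    rgb = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    ids = np.full(rgb.shape[:2], default_id, dtype=np.uint8)
    for color, label_id in color_to_id.items():
        ids[np.all(rgb == color, axis=-1)] = label_id
    return ids
```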
I've done what you asked and changed my mask images according to the label_ids (0~17), but even if I don't change my backbone to vanilla_unet_dynamic, is it normal that my precision is around 0.25?
I do realize that my mask images have a certain number of pixels outlining the objects, and I haven't yet found a way to fix that, but is that the only (or main) reason for the low precision?
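For what it's worth, here's a minimal sketch of how I could check for those stray outline values (assuming PNG masks and valid label_ids 0~17; the directory path is a placeholder):

```python
import cv2
import numpy as np
from pathlib import Path

VALID_IDS = set(range(18))  # label_ids 0~17 from the spec

def find_stray_values(mask_dir):
    """List mask files containing pixel values outside the valid label_ids."""
    for path in sorted(Path(mask_dir).glob("*.png")):
        mask = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
        stray = set(np.unique(mask).tolist()) - VALID_IDS
        if stray:
            print(f"{path.name}: unexpected values {sorted(stray)}")
```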