I want to train the cradio+segformer model on more than just a single class now. I have prepared my labels as 8-bit PNG masks where the pixel values (0, 1, 2, etc.) correspond to my label_id in dataset.segment.palette, with label_transform: None. I have also provided a palette for each label_id as follows:
Is this setup correct? What I’m observing is again the first class (background) is learning, and the other ones are all at 0 accuracy and loss; wondering if this is due to the palette being all zeros for the background
I’ve also attached a full example of the experiment yaml for a trial run with 2 classes
Hi, @kianmehr.ehtiatkar2
Please set num_classes to the number of classes excluding the background. The example spec yaml contains 6 classes plus one unknown class (which should be the background).
Hi @Morganh , I’m setting the number of classes to exclude the background, but I am also including the background in the palette with label_id=255. Inspecting the TensorBoard events, I see that only the first label is being learned, and even that not correctly: the metrics for iou_0 and F1_0 are changing while iou_1 and F1_1 stay at 0. Inference produces all-black masks, which would be all background (which is not even included in the labels). From the model architecture printed during training I can tell the decoder has an output dimension of 2, so it matches my 2 foreground classes, and I’m not sure where the disconnect is coming from.
To reiterate, my input images are RGB, and my masks are single-channel pixelwise images where the pixel integers correspond to the label_id of the classes included.
Example combined visualization from inference attached.
Hi @kianmehr.ehtiatkar2 ,
Every pixel in the mask must have an integer value that represents the segmentation class label_id. However, I find that some mask files do not have the correct pixel values.
Please double check the mask files. Thanks!
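A quick way to double-check is to print the unique pixel values found in each mask file (a minimal sketch; the folder path is a placeholder you should adjust to your dataset layout):

```python
# check_mask_values.py -- list the unique pixel values in every mask PNG
# pip install pillow numpy
import glob
import os

import numpy as np
from PIL import Image

def mask_value_counts(mask_dir):
    """Return {path: sorted unique pixel values} for every PNG in mask_dir."""
    values = {}
    for path in sorted(glob.glob(os.path.join(mask_dir, "*.png"))):
        arr = np.array(Image.open(path))
        values[path] = sorted(np.unique(arr).tolist())
    return values

if __name__ == "__main__":
    # placeholder path -- point this at your own mask folder
    for path, vals in mask_value_counts("data/masks/train").items():
        print(path, vals)
```

Every file should report only the label_id values you defined (plus the background value, if any); anything else indicates a broken mask.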
I’m noticing that only the first label or label_id=0 is being learned per the combined visualization image attached. I’ve also attached the mask file for this image for reference showing 2 objects with integers 0 and 1 and background at 255. Only integer 0 is being learned, and the model performance is also reflecting this.
Hi @kianmehr.ehtiatkar2 ,
Firstly, make sure each mask file has the correct pixel value for each class.
For example, one class has pixel-value=0 and another class has pixel-value=1.
Successful case1:
Then, if you are going to use a palette, please change the 1-channel mask png files to 3-channel mask png files. Also, you need to set label_transform: None.
# cat change_1_channel_to_3_channel_green.py
# pip install pillow numpy
import os, glob

import numpy as np
from PIL import Image

#in_dir = "xxx/data/masks/train"            # 1-channel 0/1 mask folder
#out_dir = "xxx/data/masks_3channel/train"  # output RGB mask folder
in_dir = "xxx/data/masks/val"               # 1-channel 0/1 mask folder
out_dir = "xxx/data/masks_3channel/val"     # output RGB mask folder

os.makedirs(out_dir, exist_ok=True)
for p in glob.glob(os.path.join(in_dir, "*.png")):
    g = np.array(Image.open(p))  # 8-bit 1-channel mask
    assert g.ndim == 2, f"Not single channel: {p}"
    rgb = np.zeros((g.shape[0], g.shape[1], 3), dtype=np.uint8)
    rgb[g == 1] = (0, 255, 0)  # paint class-1 pixels green; class 0 stays black
    Image.fromarray(rgb, mode="RGB").save(
        os.path.join(out_dir, os.path.basename(p)), format="PNG"
    )
@Morgan Huang, thank you for the response. I ran a quick experiment with the first method (removing the palette), and I saw meaningful metrics, so thank you!
If I have multiple classes, how do I extract each respective class from the segmentation the model produces at inference? The produced masks are binary black-and-white and don’t differentiate between the different classes. Do I need to specify the palette for that?
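If the grayscale output already carries the class indices, I’d expect a palette lookup like the following to separate them (a sketch only; the label_id-to-color mapping here is a placeholder, not the values from my spec):

```python
# Sketch: split/colorize a class-index mask with a palette lookup.
# PALETTE below is a placeholder mapping, not my actual spec values.
import numpy as np

PALETTE = {0: (255, 0, 0), 1: (0, 255, 0), 255: (0, 0, 0)}  # label_id -> RGB

def colorize(mask: np.ndarray) -> np.ndarray:
    """Map each label_id in a single-channel mask to its palette color."""
    rgb = np.zeros((mask.shape[0], mask.shape[1], 3), dtype=np.uint8)
    for label_id, color in PALETTE.items():
        rgb[mask == label_id] = color
    return rgb

def class_mask(mask: np.ndarray, label_id: int) -> np.ndarray:
    """Binary mask selecting the pixels of a single class."""
    return (mask == label_id).astype(np.uint8)
```

Is this post-processing expected on my side, or does the palette in the spec drive it?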
Also, a question on the 3-channel masks: do I need to keep the same formatting for the labels? In other words, do I still use pixel integers for each class, repeated across the 3 channels, i.e. class_1 being (0, 0, 0), class_2 (1, 1, 1), background (255, 255, 255), etc.?
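To make my question concrete, this is the conversion I’d apply under that assumption (a sketch of repeating the label_id across channels; whether this is the format SegFormer expects is exactly what I’m asking):

```python
# Sketch: repeat a 1-channel label_id mask across 3 channels, so a pixel
# with label_id 1 becomes (1, 1, 1) and background 255 becomes (255, 255, 255).
import numpy as np

def repeat_labels_to_rgb(mask_1ch: np.ndarray) -> np.ndarray:
    """Stack a single-channel label_id mask into an identical 3-channel mask."""
    assert mask_1ch.ndim == 2, "expected a single-channel mask"
    return np.stack([mask_1ch] * 3, axis=-1)
```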
Lastly, can I use this pattern for a binary task as well? My intention is to avoid having two preprocessing pipelines for segformer training. In other words, my masks would have 2 integer values (0 and 255) with num_classes=1.
I already tried all of the above, and it seems that no matter the num_classes, if I don’t include a palette and set label_transform to norm, all foreground classes are combined into a single foreground class. Also, when I set background to 255, my val_loss metric starts reporting only NaN. The only combination that produced proper metrics was including everything, including the background class, plus 3-channel masks.
Here are the charts for reference. It also doesn’t seem like the labels are loaded properly: despite 3-channel RGB masks with palettes (picture attached with black, red, and green), the combined visualization shows black-and-white masks.