U-Net Segmentation Training on custom data generates blank Inference Output


• Hardware - RTX 3090
• Network Type - Vanilla Unet Dynamic
• TAO Toolkit 4.0.0 on WSL2 - Ubuntu 20.04.6 LTS inside Windows 11
• Training spec file - scratch_v1_1.txt (attached)
• I’m trying to train a binary segmentation model to identify scratches on metal sheets.
Sample image: (image not preserved)

Mask: (image not preserved)

I’m referring to NVIDIA’s implementation of U-Net for Industrial Defect Detection on the DAGM2007 dataset. Because DAGM2007 is weakly labeled, I have labeled my data with similar elliptical masks.

My issue is that regardless of how many epochs I train for, the final model outputs a blank black image and doesn’t segment anything, essentially classifying every pixel as background. The overlaid inference output looks like this: (image not preserved)

The evaluation results are:

• Recall: 0.5
• Precision: 0.9808197021484375
• F1 score: 0.9903169895620691
• Mean IoU: 0.49040985107421875
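These numbers are consistent with a model that predicts background for every pixel. Mean IoU (0.49040985107421875) is exactly half of the reported precision (0.9808197021484375); recall 0.5 is the average of 1.0 for background and 0.0 for scratch; and 0.9903169895620691 is the F1 of a class with precision 0.9808 and recall 1.0. In other words, precision and F1 look like background-only values, while recall and mean IoU look like two-class averages. The sketch below reproduces that arithmetic. It assumes metrics are computed per class from pixel counts and macro-averaged, which may not match TAO's exact implementation, and the pixel counts are hypothetical, chosen so the scratch fraction (about 1.92%) matches the one implied by the reported precision.

```python
# Hypothetical sketch: pixel metrics for a binary segmentation model that
# predicts "background" for every pixel. Assumption: metrics are computed per
# class from pixel counts and then macro-averaged; TAO's exact formulas may differ.


def per_class_metrics(tp, fp, fn):
    """Return (precision, recall, f1, iou) for one class; an undefined 0/0 ratio is reported as 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou


def all_background_metrics(total_pixels, scratch_pixels):
    """Metrics when every pixel, including every scratch pixel, is predicted as background."""
    background_pixels = total_pixels - scratch_pixels
    # Background class: every true background pixel is found (no false negatives),
    # but every scratch pixel is wrongly predicted as background (false positives).
    bg = per_class_metrics(tp=background_pixels, fp=scratch_pixels, fn=0)
    # Scratch class: nothing is predicted as scratch, so every scratch pixel is missed.
    fg = per_class_metrics(tp=0, fp=0, fn=scratch_pixels)
    return {
        "background": bg,
        "scratch": fg,
        "mean_recall": (bg[1] + fg[1]) / 2,
        "mean_iou": (bg[3] + fg[3]) / 2,
    }


# Hypothetical counts: one 512x512 mask in which 5,028 pixels are scratch.
m = all_background_metrics(total_pixels=512 * 512, scratch_pixels=5028)
print(m["background"])   # precision ~0.9808, recall 1.0, F1 ~0.9903, IoU ~0.9808
print(m["mean_recall"])  # 0.5
print(m["mean_iou"])     # ~0.4904
```

If the metrics do behave this way, a near-perfect precision and F1 on a dataset dominated by background pixels says little about scratch detection; the scratch-class IoU and recall are the values to watch.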

I have tried the unet, vanilla_unet_dynamic, resnet, and vgg architectures. I have also followed other posts where Morgan suggested making sure that all images are PNGs, among other things.
I don’t understand what I’m doing wrong. I have previously trained U-Net segmentation models on TAO successfully. However, when I add more data to one of those successfully trained models and run a new training iteration, I often run into this same issue: the final model stops segmenting anything and just generates a blank output.
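Since the problem tends to appear after new data is added, one cheap check before retraining is to confirm that every image has a matching mask, that every file is a real PNG, and that each mask is single-channel and the same size as its image. The sketch below does this with only the Python standard library by reading each file's PNG IHDR header. The two-folder layout, pairing by file stem, and the grayscale-mask expectation are assumptions for illustration; check them against the dataset section of your TAO spec file.

```python
"""Hypothetical pre-training audit for a binary segmentation dataset.

Assumptions (not taken from TAO documentation): images and masks live in two
flat folders, are paired by file stem, and masks should be single-channel
(grayscale) PNGs with the same width and height as their images.
"""
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
GRAYSCALE = 0  # PNG IHDR color type 0 = grayscale without alpha


def read_png_header(path):
    """Return (width, height, bit_depth, color_type) from the IHDR chunk, or None if not a PNG."""
    with open(path, "rb") as f:
        head = f.read(26)  # 8-byte signature + 4-byte length + "IHDR" + first 10 data bytes
    if len(head) < 26 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        return None
    return struct.unpack(">IIBB", head[16:26])


def audit(image_dir, mask_dir):
    """Return a list of human-readable problems found in the image and mask folders."""
    problems = []
    images = {p.stem: p for p in Path(image_dir).iterdir() if p.is_file()}
    masks = {p.stem: p for p in Path(mask_dir).iterdir() if p.is_file()}
    for stem in sorted(images.keys() - masks.keys()):
        problems.append(f"{stem}: no mask for this image")
    for stem in sorted(masks.keys() - images.keys()):
        problems.append(f"{stem}: no image for this mask")
    for stem in sorted(images.keys() & masks.keys()):
        img_hdr = read_png_header(images[stem])
        mask_hdr = read_png_header(masks[stem])
        if img_hdr is None or mask_hdr is None:
            problems.append(f"{stem}: image or mask is not a valid PNG")
            continue
        if img_hdr[:2] != mask_hdr[:2]:
            problems.append(f"{stem}: image size {img_hdr[:2]} != mask size {mask_hdr[:2]}")
        if mask_hdr[3] != GRAYSCALE:
            problems.append(f"{stem}: mask has PNG color type {mask_hdr[3]}, not single-channel grayscale")
    return problems
```

Calling `audit("images/train", "masks/train")` (hypothetical paths) on both the original and the newly added data returns a list of problem strings; an empty list means the headers look consistent. It does not decode pixel data, so it cannot tell whether mask values are 0/1 class ids or 0/255; that would need a PNG decoder such as Pillow.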

Did you use the same TAO version when you successfully trained U-Net segmentation models?

Yes, it’s the same version.

docker image ls

nvcr.io/nvidia/tao/tao-toolkit 4.0.0-tf1.15.5

There has been no update from you for a while, so we are assuming this is no longer an issue and closing this topic. If you need further support, please open a new one. Thanks.

So, with the same nvcr.io/nvidia/tao/tao-toolkit 4.0.0-tf1.15.5 image, you previously trained U-Net models and ran inference successfully, but now inference is worse when you add more data to previously trained models and run a new training?
Could you compare the training logs for both runs?

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.