U-Net Segmentation Training on custom data generates blank Inference Output

adityasingh · April 30, 2023, 4:09pm

Please provide the following information when requesting support.

• Hardware - RTX 3090
• Network Type - Vanilla Unet Dynamic
• TAO Toolkit 4.0.0 on WSL2 - Ubuntu 20.04.6 LTS inside Windows 11
• Training spec file - scratch_v1_1.txt (1.6 KB)
• I’m trying to train a binary segmentation model to identify scratches on metal sheets.
Sample Image -

Mask -

I’m following NVIDIA’s implementation of U-Net for Industrial Defect Detection on the DAGM2007 dataset. That dataset is weakly labeled, so I have labeled my data in a similar elliptical manner.

My issue is that, regardless of how many epochs I run, the final model produces a blank black output image and doesn’t segment anything, essentially classifying every pixel as background. The inference output overlaid on the input looks like this:

The evaluation results are -

Recall: 0.5

Precision: 0.9808197021484375

F1 score: 0.9903169895620691

Mean IOU: 0.49040985107421875
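
These numbers are consistent with a prediction that is background everywhere, with roughly 1.9% of the evaluated pixels being scratch. The sketch below is not TAO's evaluation code. It is a minimal Python reconstruction under two assumptions: each metric is computed per class and then averaged over the two classes, and a class whose value is undefined (for example, precision for a class that is never predicted) is skipped in that average. The function names are hypothetical.

```python
import math


def per_class_metrics(tp, fp, fn):
    """Return (precision, recall, f1, iou) for one class; nan where undefined."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else math.nan
    recall = tp / (tp + fn) if (tp + fn) > 0 else math.nan
    if math.isnan(precision) or math.isnan(recall) or (precision + recall) == 0:
        f1 = math.nan
    else:
        f1 = 2 * precision * recall / (precision + recall)
    union = tp + fp + fn
    iou = tp / union if union > 0 else math.nan
    return precision, recall, f1, iou


def nanmean(values):
    """Average the values, skipping nan (assumption: undefined classes are ignored)."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


def all_background_report(background_fraction):
    """Class-averaged metrics when every pixel is predicted as background."""
    bg = background_fraction
    fg = 1.0 - bg
    # Background class: all true background pixels are hits, and every
    # foreground pixel is a false positive for background.
    background = per_class_metrics(tp=bg, fp=fg, fn=0.0)
    # Foreground class: never predicted, so every foreground pixel is missed.
    foreground = per_class_metrics(tp=0.0, fp=0.0, fn=fg)
    names = ("precision", "recall", "f1", "mean_iou")
    return {n: nanmean([b, f]) for n, b, f in zip(names, background, foreground)}


# Background fraction taken from the reported precision value.
report = all_background_report(0.9808197021484375)
for name, value in report.items():
    print(f"{name}: {value}")
```

Under those assumptions the sketch gives recall 0.5, precision of about 0.9808, F1 of about 0.9903, and mean IOU of about 0.4904, matching the evaluation above. The high precision and F1 therefore come entirely from the background class; the foreground class has recall 0 and IOU 0.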

I have tried unet, vanilla_unet_dynamic, resnet, and vgg. I have also followed other posts in which Morgan suggested, among other things, making sure all the images are PNG files.
I don’t understand what I’m doing wrong. I have previously trained U-Net segmentation models successfully with TAO. But often, when I add more data to those successfully trained models in a new training iteration, I run into this same issue: the final model stops segmenting anything and just generates a blank output.

Morganh · May 1, 2023, 5:03pm

Is this the same TAO version you used when you successfully trained the Unet segmentation models?

adityasingh · May 1, 2023, 5:09pm

Yes, it’s the same version.

docker image ls

nvcr.io/nvidia/tao/tao-toolkit 4.0.0-tf1.15.5

Morganh · May 6, 2023, 3:32am

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

So, with the same nvcr.io/nvidia/tao/tao-toolkit 4.0.0-tf1.15.5 image, you could previously train Unet models and run inference successfully, but inference now gets worse when you add more data to the previously trained models and run a new training?
Could you compare the training logs for both runs?
