Are the test images much different from the training images?
Also, you can run evaluation against the training images to narrow this down.
Sent you a private message with examples of both …
It does not make sense, since the results during training are different.
@Morganh I agree it doesn't make sense.
This is the test spec:
test_isbi.yaml (836 Bytes)
And full tao evaluate log:
evaluate log.txt (7.3 KB)
@Morganh The problem is with the last line of section 6 of the segformer notebook:
print('Rename a model: Note that the training is not deterministic, so you may change the model name accordingly.')
print('---------------------')
# NOTE: The following command may require `sudo`. You can run the command outside the notebook.
!find $HOST_RESULTS_DIR/isbi_experiment/ -name "iter_1000.tlt" | xargs realpath | xargs -I {} mv {} $HOST_RESULTS_DIR/isbi_experiment/isbi_model.tlt
!ls -ltrh $HOST_RESULTS_DIR/isbi_experiment/isbi_model.tlt
-name "iter_1000.tlt"
I did 20,000 iterations, but this cell kept copying the checkpoint from iteration 1000 as the final model.
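If that is the cause, the fix is just to rename the checkpoint from the final iteration instead. A minimal sketch (not from the notebook), assuming checkpoints are written as iter_<N>.tlt; 20000 is a placeholder for your final iteration:
# Hypothetical fix: rename the checkpoint from the final iteration (20000 is a placeholder).
!find $HOST_RESULTS_DIR/isbi_experiment/ -name "iter_20000.tlt" | xargs realpath | xargs -I {} mv {} $HOST_RESULTS_DIR/isbi_experiment/isbi_model.tlt
# Or pick whichever iter_*.tlt has the highest iteration number (GNU sort -V sorts the numbers naturally).
!find $HOST_RESULTS_DIR/isbi_experiment/ -name "iter_*.tlt" | sort -V | tail -1 | xargs -I {} mv {} $HOST_RESULTS_DIR/isbi_experiment/isbi_model.tlt
!ls -ltrh $HOST_RESULTS_DIR/isbi_experiment/isbi_model.tlt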
More results coming…
@Morganh After 20,000 iterations, tao evaluate on the val dataset gives:
+------------+-------+-------+
| Class | IoU | Acc |
+------------+-------+-------+
| foreground | 15.65 | 18.77 |
| background | 99.92 | 99.98 |
+------------+-------+-------+
After 88,400 iterations and about 20 hours of training, tao evaluate on the val dataset gives:
+------------+-------+-------+
| Class | IoU | Acc |
+------------+-------+-------+
| foreground | 29.22 | 40.79 |
| background | 99.92 | 99.97 |
+------------+-------+-------+
However, running tao inference on the test dataset gives an error:
raise MissingMandatoryValue("Missing mandatory value: $FULL_KEY")
omegaconf.errors.MissingMandatoryValue: Missing mandatory value: dataset_config.test_ann_dir
full_key: dataset_config.test_ann_dir
reference_type=SFDatasetConfig
object_type=SFDatasetConfig
Why is the inference looking for an annotation (masks) directory?
But the more important question is: what should I do to achieve better performance? Brute-force more training time? Which hyperparameters should I tweak?
I see the learning rate (lr) decreases over time … When restarting training from a checkpoint, should I start with the same lr it had at the checkpoint?
@Morganh Thanks for the help in all this!
Dave
Yes, that looks like an issue. You can use a dummy annotation directory or the val annotations to work around it.
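For example, in test_isbi.yaml you could point test_ann_dir at the val masks; the key name comes from the error's full_key, and the path below is just a placeholder:
dataset_config:
  # Placeholder path: any existing mask folder (e.g. the val masks) works here;
  # it is only needed to satisfy the mandatory field for inference.
  test_ann_dir: /data/isbi/masks/val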
Can you share all of the training logs and the resumed-training logs?
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
Please try to add more images for SegFormer training.
Also, please set a lower range for the resize augmentation, as below:
resize:
  img_scale:
    - 704
    - 1280
  ratio_range:
    - 0.8
    - 1.2