MAJOR ACCURACY LOSS when EXPORTING tao unet model after retraining pruned model

• Hardware RTX3090
• Network Type unet/resnet
• TLT Version

Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.22.02
published_date: 02/28/2022

• Training spec file (see below…)

• How to reproduce the issue?

After completing retraining of the pruned model, the performance numbers are

"{'foreground': {'precision': 0.9996697, 'Recall': 0.9997607, 'F1 Score': 0.999715147331038, 'iou': 0.99943054}, 'background': {'precision': 0.6680363, 'Recall': 0.59312135, 'F1 Score': 0.6283537780304688, 'iou': 0.45810193}}"

BUT AFTER EXPORTING FP32:

"{'foreground': {'precision': 0.999465, 'Recall': 0.9997112, 'F1 Score': 0.9995880571368052, 'iou': 0.9991765}, 'background': {'precision': 0.48937297, 'Recall': 0.34082818, 'F1 Score': 0.4018112926271863, 'iou': 0.25141668}}"
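To make the comparison easier to read, here is a small sketch (plain Python, with the dicts copied from the two evaluations above) that prints the per-class deltas; it also checks that the reported F1 and IoU values obey the usual Dice/Jaccard identity IoU = F1 / (2 - F1), so the numbers are internally consistent:

# Per-class metrics copied from the evaluations above (before and after export).
before_export = {
    'foreground': {'precision': 0.9996697, 'Recall': 0.9997607, 'F1 Score': 0.999715147331038, 'iou': 0.99943054},
    'background': {'precision': 0.6680363, 'Recall': 0.59312135, 'F1 Score': 0.6283537780304688, 'iou': 0.45810193},
}
after_export = {
    'foreground': {'precision': 0.999465, 'Recall': 0.9997112, 'F1 Score': 0.9995880571368052, 'iou': 0.9991765},
    'background': {'precision': 0.48937297, 'Recall': 0.34082818, 'F1 Score': 0.4018112926271863, 'iou': 0.25141668},
}

for cls in before_export:
    for metric, before in before_export[cls].items():
        after = after_export[cls][metric]
        print(f"{cls:10s} {metric:10s} {before:.4f} -> {after:.4f} (delta {after - before:+.4f})")
    # Sanity check: IoU = F1 / (2 - F1) when both come from the same confusion matrix.
    f1 = after_export[cls]['F1 Score']
    assert abs(after_export[cls]['iou'] - f1 / (2 - f1)) < 1e-5

The foreground class is essentially unchanged after export; the drop is entirely in the background class.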

These are the commands:

# Retraining using the pruned model as pretrained weights 
!tao unet train --gpus=1 --gpu_index=$GPU_INDEX \
              -e $SPECS_DIR/unet_retrain_resnet_6S300.txt \
              -r $USER_EXPERIMENT_DIR/retrain \
              -m $USER_EXPERIMENT_DIR/pruned/model_pruned.tlt \
               -n model_retrained \
              -k $KEY 

unet_retrain_resnet_6S300.txt (1.3 KB)

running evaluate before exporting:

!tao unet evaluate --gpu_index=$GPU_INDEX -e $SPECS_DIR/unet_retrain_resnet_6S300.txt \
                 -m $USER_EXPERIMENT_DIR/retrain/weights/model_retrained.tlt \
                 -o $USER_EXPERIMENT_DIR/retrain/ \
                 -k $KEY

Export to FP32

# Export in FP32 mode. 

!tao unet export --gpu_index=$GPU_INDEX -m $USER_EXPERIMENT_DIR/retrain/weights/model_retrained.tlt \
               -k $KEY \
               -e $SPECS_DIR/unet_retrain_resnet_6S.txt  \
               -o $USER_EXPERIMENT_DIR/export/tao.6S004C.etlt \
               --data_type fp32 \
               --engine_file $USER_EXPERIMENT_DIR/export/tao.fp32_6S004C.engine \
               --max_batch_size 2 \
               --batch_size 1 

And evaluate the exported model:

!tao unet evaluate --gpu_index=$GPU_INDEX -e $SPECS_DIR/unet_evaluate_resnet_6S.txt \
                 -m $USER_EXPERIMENT_DIR/export/tao.fp32_6S004C.engine  \
                 -o $USER_EXPERIMENT_DIR/export/ \
                 -k $KEY

(Need a new spec file for evaluate because the new maximum batch size is 2):
unet_evaluate_resnet_6S.txt (1.3 KB)

More Stuff:

Retrained Model Evaluate Log.txt (40.3 KB)

Exported FP32 Model Evaluate Log.txt (3.4 KB)

Many thanks for your help.

To narrow this down:

  1. Could you try to export the unpruned tlt model to a tensorrt engine and then run evaluation?

  2. Also, you can run evaluation against the pruned tlt model.
    See UNET — TAO Toolkit 3.22.05 documentation

  • -m, --model_path: The path to the model file to use for evaluation. This could be a .tlt model file or a tensorrt engine generated using the export tool.

Exported the unpruned model as follows:

!tao unet export --gpu_index=$GPU_INDEX -m $USER_EXPERIMENT_DIR/unpruned/weights/model.tlt \
               -k $KEY \
               -e $SPECS_DIR/unet_retrain_resnet_6S.txt  \
               -o $USER_EXPERIMENT_DIR/export/tao.unpruned.6S004C.etlt \
               --data_type fp32 \
               --engine_file $USER_EXPERIMENT_DIR/export/tao.unpruned.fp32_6S004C.engine \
               --max_batch_size 2 \
               --batch_size 1 

Then evaluate that…


!tao unet evaluate --gpu_index=$GPU_INDEX -e $SPECS_DIR/unet_evaluate_resnet_6S.txt \
                 -m $USER_EXPERIMENT_DIR/export/tao.unpruned.fp32_6S004C.engine  \
                 -o $USER_EXPERIMENT_DIR/export/ \
                 -k $KEY

Yields the same major drop in accuracy. Please note the background performance numbers:

"{'foreground': {'precision': 0.99951434, 'Recall': 0.9995958, 'F1 Score': 0.9995550496662846, 'iou': 0.9991106}, 'background': {'precision': 0.44661728, 'Recall': 0.40175736, 'F1 Score': 0.4230012926112531, 'iou': 0.26823184}}"

!tao unet evaluate --gpu_index=$GPU_INDEX -e $SPECS_DIR/unet_retrain_resnet_6S300.txt \
                 -m $USER_EXPERIMENT_DIR/retrain/weights/model_retrained.tlt \
                 -o $USER_EXPERIMENT_DIR/retrain/ \
                 -k $KEY

Yields the better results, without the drop in accuracy!

"{'foreground': {'precision': 0.9996697, 'Recall': 0.9997607, 'F1 Score': 0.999715147331038, 'iou': 0.99943054}, 'background': {'precision': 0.6680363, 'Recall': 0.59312135, 'F1 Score': 0.6283537780304688, 'iou': 0.45810193}}"

Actually, I cannot reproduce the issue you mention.
I trained on the fire dataset previously. See topic Problems encountered in training unet and inference unet - #27 by Morganh
I trained a 960x544 model.

And today I used it to run evaluation against the tlt model and the trt engine. There is no accuracy drop.

$ cat evaluation_result_trt/results_trt.json
"{'fire': {'precision': 0.9988102, 'Recall': 0.99939096, 'F1 Score': 0.999100481505847, 'iou': 0.9982026}, 'background': {'precision': 0.88543546, 'Recall': 0.79814464, 'F1 Score': 0.8395270786586672, 'iou': 0.72343534}}"

$ cat evaluation_result_json/results_tlt.json
"{'fire': {'precision': 0.9988102, 'Recall': 0.9993909, 'F1 Score': 0.9991004219185265, 'iou': 0.9982025}, 'background': {'precision': 0.885423, 'Recall': 0.7981348, 'F1 Score': 0.839516098048122, 'iou': 0.72341895}}"

Are your test images 1280x704? Are they .png files?

Yes, they are.
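For anyone wanting to double-check the input data, a minimal sketch (the directory path is hypothetical) that verifies every test image is a 1280x704 PNG:

from pathlib import Path
from PIL import Image  # Pillow

test_dir = Path('data/test/images')  # hypothetical location of the test images

for img_path in sorted(test_dir.iterdir()):
    with Image.open(img_path) as img:
        ok = img.format == 'PNG' and img.size == (1280, 704)
        print(f"{img_path.name}: format={img.format}, size={img.size}, mode={img.mode}, {'OK' if ok else 'UNEXPECTED'}")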

I have a TAO model trained on a color dataset for multiclass segmentation using unet/vgg, and it does not have this problem.

This one is based on the same dataset, converted to grayscale images and masks with values in {0, 255}, as in the isbi example.
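For context, this is roughly the kind of conversion that was done (a minimal sketch only; the paths and the label value mapped to foreground are hypothetical, and the actual preprocessing may differ):

import numpy as np
from pathlib import Path
from PIL import Image

src_images = Path('data/color/images')   # hypothetical paths
src_masks = Path('data/color/masks')
dst_images = Path('data/binary/images')
dst_masks = Path('data/binary/masks')
dst_images.mkdir(parents=True, exist_ok=True)
dst_masks.mkdir(parents=True, exist_ok=True)

FOREGROUND_VALUE = 1  # hypothetical label id of the class of interest in the original masks

for img_path in src_images.glob('*.png'):
    # Color image -> single-channel grayscale image.
    Image.open(img_path).convert('L').save(dst_images / img_path.name)

    # Original mask -> binary mask with values in {0, 255}, as in the isbi example.
    mask = np.array(Image.open(src_masks / img_path.name))
    binary = np.where(mask == FOREGROUND_VALUE, 255, 0).astype(np.uint8)
    Image.fromarray(binary).save(dst_masks / img_path.name)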

The exported model works, but with very poor performance… to the point that it’s not a viable model in real life operation.

I am close to the end of experimenting with this… Need to find a solution for my project…

So, there is no issue when you trained with a color dataset for multiclass.
For this topic, all the results you mentioned above are from training with a color dataset, right?

There is no accuracy drop for the tensorrt engine now. Am I correct?

Incorrect. I still have a big issue.

I trained a multiclass semantic segmentation unet model based on color images that has very poor performance.

When exported to a TensorRT engine, that model shows no significant drop in performance metrics. This is the same whether the backbone is resnet or vgg.

I can’t use that model in real life. As a solution, I decided to try to create a binary semantic segmentation model and combine both at runtime. So I preprocessed the dataset to create grayscale images and binary masks, similar to the isbi example.
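As a rough illustration of the intended runtime combination (a sketch only; the class id and array names are hypothetical, and in our pipeline this step actually happens in C++ after TensorRT inference):

import numpy as np

CLASS_A_ID = 1  # hypothetical id of ClassA in the multiclass label map

def combine_predictions(multiclass_mask: np.ndarray, binary_mask: np.ndarray) -> np.ndarray:
    # Overlay the binary ClassA prediction on top of the multiclass prediction.
    # multiclass_mask: HxW array of class ids from the multiclass model.
    # binary_mask:     HxW array in {0, 1} from the binary ClassA model.
    combined = multiclass_mask.copy()
    combined[binary_mask == 1] = CLASS_A_ID
    return combined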

When training the binary semantic segmentation model in TAO, running inference, and evaluating the pruned retrained model, the performance is good.

But when exporting to TensorRT, before or after pruning, the performance drops and the model becomes unsuitable for a real-life application…

So the major accuracy loss on export still exists for the binary semantic segmentation model based on a grayscale dataset.

Thanks…


So, let us make this clear and fix the issues one by one.
I suggest you train with color images first.
See topic Problems encountered in training unet and inference unet - #27 by Morganh; it can get good accuracy.
You can leverage it.

@Morganh Thank you so much for your kind words, and willingness to help.

Here is the thing: my team has logged many hours preparing and training the multiclass TAO UNet model, which was the reason for many of my posts here. Total training compute time is just over 1600 hours on an RTX3090 over the past year.

We tried all kinds of things: augmentation with rotation, zooming in, zooming out, etc.

We also tried each of the backbones available in tao for unet, and found vgg16 to produce the least bad performance.

These are the current performance indicators we were able to obtain with a dataset of 500 training images and 78 validation images, 700 epochs before pruning, and 300 after pruning, with vgg16:

{'Background': {'precision': 0.9980877, 'Recall': 0.9973183, 'F1 Score': 0.997702867478441, 'iou': 0.9954163}, 
'ClassA': {'precision': 0.6313939, 'Recall': 0.61600935, 'F1 Score': 0.6236067611277936, 'iou': 0.4530731}, 
'ClassB': {'precision': 0.99042, 'Recall': 0.9949356, 'F1 Score': 0.9926726129677101, 'iou': 0.9854519}, 
'ClassC': {'precision': 0.8239573, 'Recall': 0.7460519, 'F1 Score': 0.7830717437354773, 'iou': 0.64348227}, 
'ClassD': {'precision': 0.8212333, 'Recall': 0.7595217, 'F1 Score': 0.7891729101864037, 'iou': 0.65176356}, 
'ClassE': {'precision': 0.95095384, 'Recall': 0.95262665, 'F1 Score': 0.9517895381866079, 'iou': 0.9080137}}

These are the training specs:

random_seed: 42
model_config {
  model_input_width: 1280
  model_input_height: 704
  model_input_channels: 3
  num_layers: 16
  all_projections: true
  arch: "vgg"
  use_batch_norm: False
  training_precision {
    backend_floatx: FLOAT32
  }
}

training_config {
  batch_size: 4
  epochs: 700
  log_summary_steps: 10
  checkpoint_interval: 100
  loss: "cross_entropy"
  learning_rate:0.0001
  regularizer {
    type: L2
    weight: 2e-6
  }
  optimizer {
    adam {
      epsilon: 9.99999993923e-09
      beta1: 0.899999976158
      beta2: 0.999000012875
    }
  }
}

We are running the model in C++ using TensorRT and Intel RealSense cameras. You can see how we exported the models in my many posts about that here in the forum.

Inference results on Classes A, C, and D produce a number of issues that make the model useless for real life:

a) Even with a fixed video where every frame is the same, segmentation results vary, sometimes substantially, producing a “flicker” effect on the classification. We have been able to reduce this problem by post-processing the inference results and eliminating unreasonable results. For example, Class C cannot be surrounded by Class D, so we reclassify it as Class D (a minimal sketch of this rule follows after this list)…

b) ClassA is often not detected at all. For this we developed the binary unet model that only detects ClassA, the subject of this post. The initial inference results from within the tao notebook are promising, and even the poor-performing exported model improves the performance of the whole system. We are able to load and run both models with no issues.
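A minimal sketch of the post-processing rule mentioned in (a), assuming hypothetical class ids and using scipy for connected components (the real post-processing runs in C++):

import numpy as np
from scipy import ndimage

CLASS_C, CLASS_D = 3, 4  # hypothetical class ids

def relabel_enclosed_class_c(mask: np.ndarray) -> np.ndarray:
    # Reclassify ClassC regions that are completely surrounded by ClassD as ClassD.
    out = mask.copy()
    labeled, num_regions = ndimage.label(mask == CLASS_C)
    for region_id in range(1, num_regions + 1):
        region = labeled == region_id
        # Pixels directly adjacent to (but outside) the region.
        border = ndimage.binary_dilation(region) & ~region
        if border.any() and np.all(mask[border] == CLASS_D):
            out[region] = CLASS_D
    return out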

If only the binary exported model would not deteriorate in performance so much during the export…

Hi,
May I know the target of your training? What do you mean by “bad performance”? From your results, I can see that some classes can get a high F1.

Also, according to your comment, you are training with a dataset of 500 training images, right? Did you ever try adding more training images?

For further experiments, please consider the following.

  1. Add more training images, for example, 2000 images in total.
  2. Add "crop_and_resize_prob: 0.01" as mentioned in UNET Training on Multi-Class Segmentation from Satellite Imagery (DSTL) - #3 by Morganh
  3. Try the vanilla_unet_dynamic or efficientnet_b0 backbone. Refer to UNET Training on Multi-Class Segmentation from Satellite Imagery (DSTL) - #6 by Morganh

After training, please run "tao evaluate" to check whether the .tlt model meets your requirement.
This is the baseline. Then export the model and deploy it with DeepStream. See UNET - NVIDIA Docs and run it with GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream.
For the tensorrt engine, you can run "tao evaluate" first, according to UNET - NVIDIA Docs, or leverage your old topic Custom TAO unet model classifying only two classes on Deepstream! - #35 by Morganh

Thanks for all of that.

Creating and annotating 1,500 custom images is a very expensive proposition in time and money… and I have neither in my budget at this time. If I’m able to show better results I’ll get all the budget I need…

We have great control with C++ and TensorRT and love it. We have no issues with that at all. It’s the one element of my current ML pipeline that actually works very well.

But all your answers are unrelated to my original post:

How do I solve the performance drop in the binary semantic segmentation model when exporting from tao to tensorrt…???

Actually, I cannot reproduce the performance drop when exporting from tao to tensorrt. See my previous comment.
(MAJOR ACCURACY LOSS when EXPORTING tao unet model after retraining pruned model - #5 by Morganh)

So, for this topic, the original issue you are asking about is that “the .trt engine has a performance drop against the .tlt model”. But from our test results, we cannot reproduce it.

Can this issue happen in official Unet notebook?
Or just in your custom dataset?

So, to answer that I need to run the isbi notebook, and down the rabbit hole we go again…

While the training and pruning run to completion, retraining the pruned model yields an error:

Input to reshape is a tensor with 614400 values, but the requested shape has 307200

No changes whatsoever were made to any of the files, other than the isbi image preparation as per the instructions…

This is the complete retraining log: bad isbi retrain.log (57.1 KB)

It seems this is another problem. But it also seems that your spec file is missing “label_id: 0”. Please check.
Let us make things easier first. We need to check the gaps one by one.
Refer to

For the isbi notebook, could you try to export the unpruned tlt model to a tensorrt engine and then run evaluation?

I was able to replicate the drop in performance on the isbi unet tao experiment:

evaluate unpruned:

"{'foreground': {'precision': 0.6413898, 'Recall': 0.816966, 'F1 Score': 0.7186087958822599, 'iou': 0.5608036},
  'background': {'precision': 0.9504471, 'Recall': 0.88486874, 'F1 Score': 0.9164863099156786, 'iou': 0.84584653}}"

Evaluate after export:

"{'foreground': {'precision': 0.5, 'Recall': 0.5, 'F1 Score': 0.5, 'iou': 0.33333334},
  'background': {'precision': 0.5, 'Recall': 0.5, 'F1 Score': 0.5, 'iou': 0.33333334}}"

Export Command:

!tao unet export --gpu_index=$GPU_INDEX -m $USER_EXPERIMENT_DIR/isbi_experiment_unpruned/weights/model_isbi.tlt \
               -k $KEY \
               -e $SPECS_DIR/unet_train_resnet_unet_isbi.txt \
               --data_type fp32 \
               --engine_file $USER_EXPERIMENT_DIR/export/trtfp32.isbi.unpruned.engine \
               --max_batch_size 3 \
               --gen_ds_config

full export log: export.log (18.8 KB)

!tao unet evaluate --gpu_index=$GPU_INDEX -e $SPECS_DIR/unet_train_resnet_unet_isbi.txt \
                 -m $USER_EXPERIMENT_DIR/export/trtfp32.isbi.unpruned.engine \
                 -o $USER_EXPERIMENT_DIR/isbi_experiment_unpruned/ \
                 -k $KEY

Full evaluate log: evaluate.log (3.4 KB)

It does not make sense to get results of all 0.5.
Could you share your $SPECS_DIR/unet_train_resnet_unet_isbi.txt?

It’s the default file

unet_train_resnet_unet_isbi.txt (1.4 KB)

Thanks, I will try to check whether I can reproduce it.
When I check with the public “fire” color dataset, there is no such issue.

But here, I replicated the issue with NVIDIA’s official TAO UNet isbi experiment.