MAJOR ACCURACY LOSS when EXPORTING tao unet model after retraining pruned model

I need to check on my side as well.

Hi,
I still cannot reproduce the performance drop.
Please refer to my steps below.

$ tao unet run /bin/bash
Note: I am running with the latest TAO 22.05 version. It seems that you are running the 21.11 version.
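
As a quick check on your side, you can confirm which launcher version and per-task docker versions are active (a minimal sketch; this is the same command requested later in this thread):

$ tao info --verbose   # prints the launcher configuration, including which docker each task maps to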

Then, train a model.

unet train -e unet_train_resnet_unet_isbi.txt -r /workspace/demo_3.0/forum_repro/unet_isbi/isbi_experiment_unpruned -m /workspace/demo_3.0/forum_repro/unet_isbi/pretrained_model/resnet_18.hdf5 -n model_isbi -k nvidia_tlt

Run evaluation.

unet evaluate -e unet_train_resnet_unet_isbi.txt -m /workspace/demo_3.0/forum_repro/unet_isbi/isbi_experiment_unpruned/weights/model_isbi.tlt -o /workspace/demo_3.0/forum_repro/unet_isbi/isbi_experiment_evaluate -k nvidia_tlt

Result:

root@9deabc2e2957:/workspace/demo_3.0/forum_repro/unet_isbi# cat isbi_experiment_evaluate/results_tlt.json
"{'foreground': {'precision': 0.70791817, 'Recall': 0.77023894, 'F1 Score': 0.7377648243662859, 'iou': 0.5844907}, 'background': {'precision': 0.94187826, 'Recall': 0.92135996, 'F1 Score': 0.9315061321651128, 'iou': 0.8717936}}"

Export the model and generate trt engine.
unet export -m /workspace/demo_3.0/forum_repro/unet_isbi/isbi_experiment_unpruned/weights/model_isbi.tlt -e unet_train_resnet_unet_isbi.txt --engine_file export/trtfp32.isbi.unpruned.engine -k nvidia_tlt

Run evaluation against the trt engine.
unet evaluate -e unet_train_resnet_unet_isbi.txt -m /workspace/demo_3.0/forum_repro/unet_isbi/export/trtfp32.isbi.unpruned.engine -o /workspace/demo_3.0/forum_repro/unet_isbi/isbi_experiment_evaluate_engine -k nvidia_tlt

Result:

root@9deabc2e2957:/workspace/demo_3.0/forum_repro/unet_isbi# cat /workspace/demo_3.0/forum_repro/unet_isbi/isbi_experiment_evaluate_engine/results_trt.json
"{'foreground': {'precision': 0.70795435, 'Recall': 0.77022797, 'F1 Score': 0.7377794104039289, 'iou': 0.58450913}, 'background': {'precision': 0.94187653, 'Recall': 0.92137486, 'F1 Score': 0.931512872531945, 'iou': 0.8718055}}"

I did not understand what you did there. I thought the point was for you to replicate my scenario in order to reproduce my problem, but instead you went and ran it on something that I don't even have access to:

In any case,

Somehow there is a discrepancy. If I run 

tao unet run /bin/bash

I get 3.21.11:

Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3

But running

tao info

Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.22.02
published_date: 02/28/2022

This seems like a dead end… Very frustrating to waste so much time on this just to go around in an endless loop.

To look into your issue, we need to narrow down the gap between our setups. Even with the default ISBI notebook, there is a gap on your side.
When I use the latest TAO version (22.05) to try to reproduce the performance gap, I cannot reproduce it.

Can you attach the .ipynb and also the full dataset so that I can reproduce it?

BTW, /workspace/demo_3.0/forum_repro/unet_isbi is just a folder on my side. You can ignore it.

I believe you are using an old version of TAO.

Please share the result of the command below.

$ tao info --verbose

Please install the latest:
$ pip3 install --upgrade nvidia-tao

Or for 22.05 docker, you can pull
nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
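
If the launcher keeps mapping the unet task to an older docker even after upgrading, a rough sketch for pulling and entering the 22.05 container directly with docker (the --gpus flag and the mounted workspace path are assumptions; adjust them to your setup):

$ docker pull nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
$ docker run --rm -it --gpus all \
    -v /path/to/your/workspace:/workspace \
    nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3 /bin/bash
# Inside the container the task entrypoint is called directly, e.g. `unet export ...`,
# without the `tao` prefix, as in the commands earlier in this thread.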

Can you attach the .tlt model you have trained with the ISBI notebook?

NO. Not open source

I will be reinstalling a new workstation soon. It seems to be the only way to isolate the problem, but this whole thing has been very, very time-consuming and I can't freeze my project on this.

Understood. For the .tlt model you have trained with the ISBI notebook, if you have kept it, you can share it with me. It is trained on the ISBI dataset, so there is no copyright issue. Then I can use it to check again in 22.05.
Anyway, I will check whether there is an issue in 21.11.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one.
Thanks

Hi,
Indeed, there is a performance drop in the 21.11 version.

In 21.11, the .tlt model result:
"{'foreground': {'precision': 0.69601804, 'Recall': 0.7722573, 'F1 Score': 0.7321583749601505, 'iou': 0.5774841}, 'background': {'precision': 0.94207376, 'Recall': 0.91653854, 'F1 Score': 0.9291307369557293, 'iou': 0.8676416}}"

The TensorRT engine result:
"{'foreground': {'precision': 0.59434956, 'Recall': 0.6486686, 'F1 Score': 0.6203222234776844, 'iou': 0.44961384}, 'background': {'precision': 0.91104954, 'Recall': 0.89044565, 'F1 Score': 0.9006297727298221, 'iou': 0.81922334}}"

You can consider either of the solutions below.

Solution 1:
You do not need to retrain the .tlt model in 22.05. Just use your 21.11 .tlt model and run export under the 22.05 docker. It will generate a new TensorRT engine. Use that engine and run evaluation again; a sketch of this workflow follows Solution 2 below.
I confirm that there is no performance drop now.

"{'foreground': {'precision': 0.6960101, 'Recall': 0.77224094, 'F1 Score': 0.7321466222923124, 'iou': 0.57746947}, 'background': {'precision': 0.94206977, 'Recall': 0.91653717, 'F1 Score': 0.9291280902702932, 'iou': 0.867637}}"

Solution 2:
Use the 22.05 version for training, evaluation, etc.
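
For reference, a rough sketch of Solution 1 inside the 22.05 docker, reusing the export and evaluate commands from earlier in this thread (the paths and the nvidia_tlt key follow the earlier examples; substitute your own):

# Re-export the existing 21.11 .tlt model to a new TensorRT engine
unet export -m /workspace/demo_3.0/forum_repro/unet_isbi/isbi_experiment_unpruned/weights/model_isbi.tlt \
  -e unet_train_resnet_unet_isbi.txt \
  --engine_file export/trtfp32.isbi.unpruned.engine \
  -k nvidia_tlt

# Evaluate against the newly generated engine
unet evaluate -e unet_train_resnet_unet_isbi.txt \
  -m /workspace/demo_3.0/forum_repro/unet_isbi/export/trtfp32.isbi.unpruned.engine \
  -o /workspace/demo_3.0/forum_repro/unet_isbi/isbi_experiment_evaluate_engine \
  -k nvidia_tlt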
