I need to check on my side as well.
Hi,
I still cannot reproduce the performance drop.
Please refer to my steps below.
$ tao unet run /bin/bash
Note: I am running with the latest TAO 22.05 version. It seems that you are running the 21.11 version.
Then, train a model.
unet train -e unet_train_resnet_unet_isbi.txt -r /workspace/demo_3.0/forum_repro/unet_isbi/isbi_experiment_unpruned -m /workspace/demo_3.0/forum_repro/unet_isbi/pretrained_model/resnet_18.hdf5 -n model_isbi -k nvidia_tlt
Run evaluation.
unet evaluate -e unet_train_resnet_unet_isbi.txt -m /workspace/demo_3.0/forum_repro/unet_isbi/isbi_experiment_unpruned/weights/model_isbi.tlt -o /workspace/demo_3.0/forum_repro/unet_isbi/isbi_experiment_evaluate -k nvidia_tlt
Result:
root@9deabc2e2957:/workspace/demo_3.0/forum_repro/unet_isbi# cat isbi_experiment_evaluate/results_tlt.json
"{'foreground': {'precision': 0.70791817, 'Recall': 0.77023894, 'F1 Score': 0.7377648243662859, 'iou': 0.5844907}, 'background': {'precision': 0.94187826, 'Recall': 0.92135996, 'F1 Score': 0.9315061321651128, 'iou': 0.8717936}}"
Export the model and generate a TensorRT engine.
unet export -m /workspace/demo_3.0/forum_repro/unet_isbi/isbi_experiment_unpruned/weights/model_isbi.tlt -e unet_train_resnet_unet_isbi.txt --engine_file export/trtfp32.isbi.unpruned.engine -k nvidia_tlt
Run evaluation against the TensorRT engine.
unet evaluate -e unet_train_resnet_unet_isbi.txt -m /workspace/demo_3.0/forum_repro/unet_isbi/export/trtfp32.isbi.unpruned.engine -o /workspace/demo_3.0/forum_repro/unet_isbi/isbi_experiment_evaluate_engine -k nvidia_tlt
Result:
root@9deabc2e2957:/workspace/demo_3.0/forum_repro/unet_isbi# cat /workspace/demo_3.0/forum_repro/unet_isbi/isbi_experiment_evaluate_engine/results_trt.json
"{'foreground': {'precision': 0.70795435, 'Recall': 0.77022797, 'F1 Score': 0.7377794104039289, 'iou': 0.58450913}, 'background': {'precision': 0.94187653, 'Recall': 0.92137486, 'F1 Score': 0.931512872531945, 'iou': 0.8718055}}"
I did not understand what you did there. I thought the purpose was for you to replicate my scenario and reproduce my problem, but instead you ran on something that I don’t even have access to:
In any case, somehow there is a discrepancy. If I run
tao unet run /bin/bash
I get 3.21.11:
Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
But running
tao info
Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.22.02
published_date: 02/28/2022
This seems like a dead end… It is very frustrating to waste so much time on this just to go around in an endless loop.
To look into your issue, we need to check the gap between us. Even with the default ISBI notebook, there is a gap on your side.
When I use the latest TAO version (22.05) to try to reproduce the performance gap, I cannot reproduce it.
Can you attach the .ipynb and also the dataset so that I can reproduce the issue?
BTW, /workspace/demo_3.0/forum_repro/unet_isbi is just a folder on my side. You can ignore it.
I believe you are using an old version of TAO.
Please share the result of below command.
$ tao info --verbose
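To double-check which TAO containers are actually present locally, the standard docker images command can also help (this is a generic Docker command, not a TAO-specific one):
docker images nvcr.io/nvidia/tao/tao-toolkit-tf
This lists the locally pulled tao-toolkit-tf tags, which can show whether an older 3.21.11 image is still being used alongside the newer launcher.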
Please install the latest.
$ pip3 install nvidia-tao
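If an older launcher is already installed, the standard pip upgrade flag can be used instead (a generic pip option, nothing TAO-specific):
$ pip3 install --upgrade nvidia-tao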
Or, for the 22.05 docker, you can pull
nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
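For example, it can be pulled with the standard docker command using the tag above:
docker pull nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3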
Can you attach the .tlt model you have trained with the ISBI notebook?
No. It is not open source.
I will be setting up a new workstation soon. It seems to be the only way to isolate the problem, but this whole thing has been very time consuming and I can’t freeze my project over this.
Understood. For the .tlt model you trained with the ISBI notebook, if you have kept it, you can share it with me. It is trained on the ISBI dataset, so there is no copyright issue. Then I can use it to check again in 22.05.
Anyway, I will check if there is an issue in 21.11.
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one.
Thanks
Hi,
Indeed, there is a performance drop in the 21.11 version.
In 21.11, the .tlt model result:
"{'foreground': {'precision': 0.69601804, 'Recall': 0.7722573, 'F1 Score': 0.7321583749601505, 'iou': 0.5774841}, 'background': {'precision': 0.94207376, 'Recall': 0.91653854, 'F1 Score': 0.9291307369557293, 'iou': 0.8676416}}"
The TensorRT engine result:
"{'foreground': {'precision': 0.59434956, 'Recall': 0.6486686, 'F1 Score': 0.6203222234776844, 'iou': 0.44961384}, 'background': {'precision': 0.91104954, 'Recall': 0.89044565, 'F1 Score': 0.9006297727298221, 'iou': 0.81922334}}"
You can consider either of the solutions below.
Solution 1:
There is no need to retrain the .tlt model in 22.05. Just use your 21.11 .tlt model and run export under the 22.05 docker. It will generate a new TensorRT engine. Use that engine and run evaluation again; see the command sketch after the result below.
I confirm that there is no performance drop now.
"{'foreground': {'precision': 0.6960101, 'Recall': 0.77224094, 'F1 Score': 0.7321466222923124, 'iou': 0.57746947}, 'background': {'precision': 0.94206977, 'Recall': 0.91653717, 'F1 Score': 0.9291280902702932, 'iou': 0.867637}}"
Solution 2:
Use the 22.05 version for training/evaluation/etc.