Fine-tuned TAO ClassificationTF2 Accuracy Drop after Compiling to TensorRT

Setup information
• Hardware Platform (Jetson / GPU) : Jetson Orin Nano
• DeepStream Version : DeepStream 6.3
• JetPack Version (valid for Jetson only) : Jetpack 5.1.3
• TensorRT Version : TensorRT 8.5.2.2
• NVIDIA GPU Driver Version (valid for GPU only) : N/A
• Issue Type( questions, new requirements, bugs) : Bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing) : See below for configurations.
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description) : N/A


Issue:
Our fine-tuned TAO ClassificationTF2 TLT model (EfficientNet-B0 backbone) gives high inference accuracy, but the accuracy drops significantly after converting the model to a TensorRT engine and running inference in DeepStream as an SGIE.

Evaluation Method:
Results were compared on the same video.
This is how we compared the TLT and TensorRT models:

  1. We used the same PGIE (PeopleNet) and tracker to perform detection and tracking.
  2. We cropped the objects from the video frames based on the bounding boxes in the KITTI tracker output files (a cropping sketch follows this list).
  3. We ran tao model classification_tf2 inference on the crops and evaluated the results of the TLT model.
  4. We ran inference on the same video in DeepStream and with trtexec, and manually compared the results.
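For reference, here is a minimal sketch of the cropping step. Assumptions: frames have already been extracted from the video and saved with the same base name as the corresponding KITTI file, and the KITTI columns follow the standard layout where fields 5-8 are left, top, right, bottom. This is illustrative only, not the exact script we used.

# crop_from_kitti.py - illustrative sketch of step 2
import os
from PIL import Image

frames_dir = "frames"                  # extracted frames, e.g. 000123.jpg (assumed naming)
kitti_dir = "tracker_output_folder"    # KITTI dumps from kitti-track-output-dir
out_dir = "crops"
os.makedirs(out_dir, exist_ok=True)

for kitti_name in sorted(os.listdir(kitti_dir)):
    stem = os.path.splitext(kitti_name)[0]
    frame_path = os.path.join(frames_dir, stem + ".jpg")
    if not os.path.isfile(frame_path):
        continue
    frame = Image.open(frame_path)
    with open(os.path.join(kitti_dir, kitti_name)) as f:
        for idx, line in enumerate(f):
            fields = line.split()
            if len(fields) < 8:
                continue
            # standard KITTI layout: label, truncated, occluded, alpha, left, top, right, bottom, ...
            left, top, right, bottom = (int(float(v)) for v in fields[4:8])
            crop = frame.crop((left, top, right, bottom))
            crop.save(os.path.join(out_dir, f"{stem}_{idx}.jpg"))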

Accuracy Comparison:

Object   | Ground Truth Class | TLT Accuracy | TensorRT Accuracy
---------|--------------------|--------------|------------------
Object_1 | Class_1            | 96.01%       | 41.46%
Object_2 | Class_1            | 97.60%       | 9.36%
Object_3 | Class_1            | 100%         | 18.00%
Object_4 | Class_2            | 100%         | 100%
Object_5 | Class_2            | 100%         | 100%

Notes:

  • The TensorRT accuracy was obtained by running TRT inference with trtexec. We manually inspected the DeepStream output video, and the trtexec results appear to align with the DeepStream inference overlay output.
  • The accuracy drop seems to affect only Class_1, not Class_2.

TAO ClassificationTF2 Configuration

dataset:
  train_dataset_path: "/workspace/tao-experiments/data/train"
  val_dataset_path: "/workspace/tao-experiments/data/val"
  preprocess_mode: 'torch'
  num_classes: 2
  augmentation:
    enable_color_augmentation: True
train:
  checkpoint: '/workspace/tao-experiments/pretrained_classification_tf2_vefficientnet_b0'
  batch_size_per_gpu: 32
  num_epochs: 100
  optim_config:
    optimizer: 'sgd'
  lr_config:
    scheduler: 'cosine'
    learning_rate: 0.0005
    soft_start: 0.05
  reg_config:
    type: 'L2'
    scope: ['conv2d', 'dense']
    weight_decay: 0.00005
  results_dir: '/workspace/tao-experiments/results/train'
model:
  backbone: 'efficientnet-b0'
  input_width: 128
  input_height: 128
  input_channels: 3
  dropout: 0.12
evaluate:
  dataset_path: "/workspace/tao-experiments/data/test"
  checkpoint: "/workspace/tao-experiments/results/train/efficientnet-b0_100.tlt"
  top_k: 1
  batch_size: 16
  n_workers: 8
  results_dir: '/workspace/tao-experiments/results/val'
inference:
  image_dir: "/workspace/tao-experiments/data/test_images"
  checkpoint: "/workspace/tao-experiments/results/train/efficientnet-b0_100.tlt"
  results_dir: '/workspace/tao-experiments/results/inference'
  classmap: "/workspace/tao-experiments/results/train/classmap.json"
export:
  checkpoint: "/workspace/tao-experiments/results/train/efficientnet-b0_100.tlt"
  onnx_file: '/workspace/tao-experiments/results/export/efficientnet-b0.onnx'
  results_dir: '/workspace/tao-experiments/results/export'
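For context, this spec drives the standard TAO launcher actions; the invocations were along these lines (the spec path is a placeholder and additional arguments may differ):

$ tao model classification_tf2 train -e /workspace/tao-experiments/specs/spec.yaml
$ tao model classification_tf2 export -e /workspace/tao-experiments/specs/spec.yaml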

How we converted the model from TLT to a TensorRT engine:

  1. Convert from TLT to ONNX using tao model classification_tf2 export. This step was not performed on the Jetson device; we used an NVIDIA GeForce RTX 4090 GPU for model training and export.
  2. Convert from ONNX to TensorRT on the Jetson Orin Nano device. We tried two methods: (i) deploy the exported ONNX model to DeepStream directly and let DeepStream build the TRT engine implicitly; and (ii) compile the TensorRT engine with trtexec (sketched below). Both methods give the same (poor) inference results.
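A minimal trtexec invocation for method (ii) looks roughly like the following (paths are placeholders; this is a sketch, not the exact command line we ran):

$ /usr/src/tensorrt/bin/trtexec --onnx=/path/to/efficientnet-b0.onnx --saveEngine=/path/to/efficientnet-b0.onnx_b1_gpu0_fp32.engine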

DeepStream App Configuration Files:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#kitti-track-output-dir=tracker_output_folder

[tiled-display]
enable=1
rows=1
columns=1
width=1920
height=1080
gpu-id=0

[source0]
enable=1
type=2
num-sources=1
uri=file:///path/to/test/video/file.mp4
gpu-id=0

[streammux]
gpu-id=0
batch-size=1
batched-push-timeout=33333
width=1920
height=1080

[sink0]
enable=1
type=3
container=1
codec=1
enc-type=1
sync=0
bitrate=3000000
profile=0
output-file=/path/to/inference/overlay/video.mp4
source-id=0

[osd]
enable=1
gpu-id=0
border-width=3
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Arial

[primary-gie]
enable=1
plugin-type=0
gpu-id=0
batch-size=1
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
gie-unique-id=1
config-file=/opt/nvidia/deepstream/deepstream-6.3/samples/configs/tao_pretrained_models/nvinfer/config_infer_primary_peoplenet.txt

[secondary-gie0]
enable=1
plugin-type=0
gpu-id=0
batch-size=1
gie-unique-id=3
operate-on-gie-id=1
operate-on-class-ids=0
config-file=/path/to/config_infer_secondary_classificationtf2.txt

[tracker]
enable=1
tracker-width=480
tracker-height=288
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml
gpu-id=0
display-tracking-id=1

[tests]
file-loop=0
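For reference, this application config is run with the stock deepstream-app binary (the config file name is a placeholder):

$ deepstream-app -c deepstream_app_config.txt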

config_infer_secondary_classificationtf2.txt
(we followed this guide)

[property]
gpu-id=0
# preprocessing_mode == 'torch'
net-scale-factor=0.017507
offsets=123.675;116.280;103.53
model-color-format=0

# model config
onnx-file=/path/to/efficientnet-b0.onnx
model-engine-file=/path/to/efficientnet-b0.onnx_b1_gpu0_fp32.engine
labelfile-path=/path/to/labels.txt
classifier-threshold=0.5
operate-on-class-ids=0
batch-size=1

network-mode=0
network-type=1
process-mode=2

secondary-reinfer-interval=0
gie-unique-id=3

We would like to know: (i) what causes the degradation in model accuracy, and (ii) how we can minimize this performance gap between the TLT model and the TensorRT engine.

In TAO, there is a tao-deploy docker that lets users generate a TensorRT engine and also run evaluation or inference against it. To narrow down the issue, please use the tao-deploy docker to check the result.
You can use the TAO launcher (i.e., tao deploy xxx) or run against the docker image (nvcr.io/nvidia/tao/tao-toolkit:5.3.0-deploy) directly.
See the end of tao_tutorials/notebooks/tao_launcher_starter_kit/classification_tf2/tao_voc/classification.ipynb at main · NVIDIA/tao_tutorials · GitHub or Classification (TF2) with TAO Deploy - NVIDIA Docs.
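For example, a minimal flow inside the deploy docker might look like this (mount paths and spec names are placeholders; a sketch rather than an exact recipe):

$ docker run --runtime=nvidia -it --rm -v /local/experiments:/workspace nvcr.io/nvidia/tao/tao-toolkit:5.3.0-deploy /bin/bash
# classification_tf2 gen_trt_engine -e /workspace/specs/gen_trt_engine.yaml
# classification_tf2 evaluate -e /workspace/specs/evaluate.yaml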

Thank you for the fast response.
I have tried TAO deploy as suggested. I used classification_tf2 gen_trt_engine to generate the TensorRT engine file, and used classification_tf2 inference to run inference on the test images.

However, the results are quite strange: ALL images are classified as the same class ("non-staff") with the same confidence (0.938832).
Here are the first few rows of result.csv:
[screenshot of result.csv omitted: every row is predicted as non-staff with confidence 0.938832]

Accuracy:

Class     | TLT Accuracy | TensorRT Accuracy
----------|--------------|------------------
non-staff | 97.79%       | 100%
staff     | 97.57%       | 0%

Configuration Files

gen_trt_engine config file:

gen_trt_engine:
  onnx_file: '/path/to/efficientnet-b0.onnx'
  trt_engine: '/path/to/efficientnet-b0_batch64.engine'
  results_dir: '/path/to/gen_trt_results'
  tensorrt:
    max_workspace_size: 4
    max_batch_size: 64
    data_type: "fp32"

Inference config file:

dataset:
  augmentation:
    enable_color_augmentation: true
  num_classes: 2
  preprocess_mode: torch
inference:
  trt_engine: /path/to/efficientnet-b0_batch64.engine
  classmap: /path/to/classmap.json
  image_dir: /path/to/test/images
  results_dir: /path/to/inference-results
model:
  backbone: efficientnet-b0
  input_channels: 3
  input_height: 128
  input_width: 128

So, your result shows that tao deploy inference or tao deploy evaluation also gets incorrect results.
To narrow down, could you run the default notebook to see if it works?
tao_tutorials/notebooks/tao_launcher_starter_kit/classification_tf2/tao_voc/classification.ipynb at main · NVIDIA/tao_tutorials · GitHub.

Please refer to the default spec file: tao_tutorials/notebooks/tao_launcher_starter_kit/classification_tf2/tao_voc/specs/spec.yaml at main · NVIDIA/tao_tutorials · GitHub. For example, it sets enable_center_crop: True.

I have tried the default notebook and default specs (e.g., enable_center_crop: True) as suggested. Here are the results:

TAO TLT Model Evaluation Results
tao model classification_tf2 evaluate

Evaluation Loss: 0.1649947166442871
Evaluation Top 1 accuracy: 0.9825870394706726

Confusion Matrix:

[[589  17]
 [  4 596]]

Classification Report:

              precision    recall  f1-score   support

   non-staff       0.99      0.97      0.98       606
       staff       0.97      0.99      0.98       600

    accuracy                           0.98      1206
   macro avg       0.98      0.98      0.98      1206
weighted avg       0.98      0.98      0.98      1206

TAO Deploy TensorRT Engine Evaluation Results
tao deploy classification_tf2 evaluate

Top 1 scores: 0.495

Confusion Matrix:

[[  0 606]
 [  0 594]]

Classification Report:

              precision    recall  f1-score   support

   non-staff       0.00      0.00      0.00       606
       staff       0.49      1.00      0.66       594

    accuracy                           0.49      1200
   macro avg       0.25      0.50      0.33      1200
weighted avg       0.25      0.49      0.33      1200

From your result, there is still an accuracy drop when running "tao deploy" evaluation against the TensorRT engine. Will check further. BTW, did you ever run with the default dataset?

Comparing the tao-tf2 branch (tao_tensorflow2_backend/nvidia_tao_tf2/cv/classification/inferencer/keras_inferencer.py at main · NVIDIA/tao_tensorflow2_backend · GitHub) with the tao-deploy branch (tao_deploy/nvidia_tao_deploy/cv/classification_tf1/dataloader.py at main · NVIDIA/tao_deploy · GitHub), could you please set the same interpolation_method explicitly in the spec YAML file and retry?

From your result, there is still an accuracy drop when running "tao deploy" evaluation against the TensorRT engine. Will check further.

Thanks! I am happy to share the TLT model and TensorRT engine with you if that helps.

BTW, did you ever run with the default dataset?

No, I did not.

I have specified the same interpolation method in the spec YAML file for both tao model classification_tf2 evaluate and tao deploy classification_tf2 evaluate:

model:
  ...
  resize_interpolation_method: bilinear

It does not help.

OK, thanks for the info. I will check further and update you when I have more. Thanks.

I can reproduce the accuracy drop. Will check further.

Thank you.
May I know when we will get an update? Our production relies on this.
If it cannot be resolved soon, what alternatives could we use? Will reverting to an older version of the TAO toolkit resolve the accuracy drop issue? Is there another solution provided by NVIDIA that we can use instead?

Got it. We are still working on it. Once there is a fix or workaround, I will update you.
Sorry for the inconvenience.

Hi, I found the root cause. Please change the code as follows.

$ docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:5.0.0-deploy /bin/bash

Then inside the docker,
# mv /usr/local/lib/python3.10/dist-packages/nvidia_tao_deploy/inferencer/preprocess_input.py /usr/local/lib/python3.10/dist-packages/nvidia_tao_deploy/inferencer/preprocess_input.py.bak
# vim /usr/local/lib/python3.10/dist-packages/nvidia_tao_deploy/inferencer/preprocess_input.py
(Copy the content from tao_deploy/nvidia_tao_deploy/inferencer/preprocess_input.py at main · NVIDIA/tao_deploy · GitHub)

Modify tao_deploy/nvidia_tao_deploy/inferencer/preprocess_input.py at main · NVIDIA/tao_deploy · GitHub
to

override_mean = True

Then run evaluation to confirm there is no gap now.

# classification_tf2 evaluate -e xxx.yaml

You can also run docker commit to generate a new tao-deploy docker.
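For example (the container ID and target tag are placeholders):

$ docker commit <container-id> nvcr.io/nvidia/tao/tao-toolkit:5.0.0-deploy-patched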

Thank you for the fast action.
I was able to close the gap by following the method you suggested. However, this approach only addresses the gap for TAO deploy. I am interested in utilizing this model in DeepStream.
Could you please advise on how I can apply this fix to ensure there is no performance gap when using the TRT engine in DeepStream?

Please set:

net-scale-factor=0.0175070028011204
offsets=2.165178571428571;2.035714285714286;1.8125

According to tao_tensorflow2_backend/nvidia_tao_tf2/cv/classification/utils/preprocess_input.py at main · NVIDIA/tao_tensorflow2_backend · GitHub, the torch-mode preprocessing is (x/255 - mean)/std = x/(255*std) - mean/std.
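These values can be reproduced from the torch-mode ImageNet statistics. A minimal check, assuming mean = (0.485, 0.456, 0.406) and a single shared std of 0.224 (nvinfer's net-scale-factor is a scalar, so one std value has to be chosen):

# reproduce the suggested nvinfer preprocessing parameters
mean = [0.485, 0.456, 0.406]
std = 0.224

net_scale_factor = 1.0 / (255.0 * std)   # -> 0.0175070028011204...
offsets = [m / std for m in mean]        # -> approx. [2.165178571, 2.035714286, 1.8125]

print(f"net-scale-factor={net_scale_factor}")
print("offsets=" + ";".join(str(o) for o in offsets))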

I specified the values that you suggested, but it didn’t make a difference. The resulting inference overlay video looks similar to the previous one.

Please delete the engine and generate it again, just to make sure you are running with a newly built engine rather than the existing one.

I deleted the old engine, regenerated a new engine, and specified the net-scale-factor and offsets you suggested. It didn’t make a difference.

When you run tao model classification_tf2 inference, you are using the cropped images, right?

Currently, your pipeline is PeopleNet → classification.
To narrow down, could you use the cropped images and run the classification directly? In other words, make it work as the primary TRT engine (see the config sketch at the end of this post).
A similar topic is shared in
Issue with image classification tutorial and testing with deepstream-app - #21 by Morganh.

In addition, two modifications can be considered:
1) Generate an .avi file from the cropped images:
$ gst-launch-1.0 multifilesrc location="/tmp/%d.jpg" caps="image/jpeg,framerate=30/1" ! jpegdec ! x264enc ! avimux ! filesink location="out.avi"
Refer to Issue with image classification tutorial and testing with deepstream-app - #24 by Morganh.
2) Set scaling-filter=5. Refer to Issue with image classification tutorial and testing with deepstream-app - #32 by Morganh.
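A rough sketch of the nvinfer property changes implied by the above, assuming the classifier engine is run as PGIE directly on the cropped-image video (treat this as a starting point, not a verified config):

[property]
# run the classifier as the primary GIE on the cropped-frame video instead of as SGIE
process-mode=1
# classifier network
network-type=1
# per suggestion (2)
scaling-filter=5

In the deepstream-app config, point [primary-gie] at this file, set [source0] uri=file:///path/to/out.avi, and drop the operate-on-gie-id / operate-on-class-ids settings.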