Setup information
• Hardware Platform (Jetson / GPU) : Jetson Orin Nano
• DeepStream Version : DeepStream 6.3
• JetPack Version (valid for Jetson only) : Jetpack 5.1.3
• TensorRT Version : TensorRT 8.5.2.2
• NVIDIA GPU Driver Version (valid for GPU only) : N/A
• Issue Type( questions, new requirements, bugs) : Bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing) : See below for configurations.
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description) : N/A
Issue:
Our fine-tuned TAO ClassificationTF2 TLT model (EfficientNet-B0 backbone) gives high inference accuracy, but accuracy drops significantly after converting the model to a TensorRT engine and running inference in DeepStream as an SGIE.
Evaluation Method:
Results were compared on the same video.
This is how we compared the TLT and TensorRT models:
- We used the same PGIE (PeopleNet) and tracker to perform detection and tracking.
- We cropped the objects from the video frames based on the bounding boxes in the KITTI tracker output files (a cropping sketch follows this list).
- We ran `tao model classification_tf2 inference` on the crops and evaluated the results of the TLT model.
- We ran inference on the same video in DeepStream and with TRTEXEC, and manually compared the results.
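For reference, this is roughly how the cropping step can be scripted. It is a minimal sketch only: it assumes one KITTI file per frame with the standard KITTI column layout (bounding box in the 5th-8th fields), and the file naming and paths are placeholders to be adapted to the actual tracker dump.

```python
# Sketch: crop tracked objects from the test video using per-frame KITTI dumps.
# Assumes the standard KITTI column order, i.e. the bbox is in fields 4..7
# (left, top, right, bottom, 0-indexed); adjust naming/parsing to your dump.
import os
import cv2

VIDEO = "/path/to/test/video/file.mp4"
KITTI_DIR = "tracker_output_folder"   # kitti-track-output-dir
OUT_DIR = "/path/to/crops"

cap = cv2.VideoCapture(VIDEO)
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    kitti_file = os.path.join(KITTI_DIR, f"{frame_idx:06d}.txt")  # hypothetical naming
    if os.path.isfile(kitti_file):
        with open(kitti_file) as f:
            for obj_idx, line in enumerate(f):
                fields = line.split()
                left, top, right, bottom = (int(float(v)) for v in fields[4:8])
                crop = frame[max(top, 0):bottom, max(left, 0):right]
                if crop.size:
                    cv2.imwrite(os.path.join(OUT_DIR, f"{frame_idx:06d}_{obj_idx}.png"), crop)
    frame_idx += 1
cap.release()
```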
Accuracy Comparison:
| Object | Ground Truth Class | TLT Accuracy | TensorRT Accuracy |
|---|---|---|---|
| Object_1 | Class_1 | 96.01% | 41.46% |
| Object_2 | Class_1 | 97.60% | 9.36% |
| Object_3 | Class_1 | 100% | 18.00% |
| Object_4 | Class_2 | 100% | 100% |
| Object_5 | Class_2 | 100% | 100% |
Notes:
- The TensorRT accuracy was obtained by running TRT inference with TRTEXEC. We also manually inspected the DeepStream output video, and the TRTEXEC results appear to align with the DeepStream inference overlay (a scripted version of this check is sketched after these notes).
- The accuracy drop seems to affect only Class_1, not Class_2.
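The manual TRTEXEC/DeepStream comparison above can also be scripted so every crop is checked the same way. Below is a minimal sketch using Polygraphy to run the standalone engine on one cropped image with 'torch'-style preprocessing; the input tensor name ("input_1"), the NCHW layout, and the ImageNet mean/std constants are assumptions here and must be matched to the exported ONNX.

```python
# Sketch: classify a cropped image with the standalone TensorRT engine.
# Input name, layout, and normalization constants are assumptions.
import cv2
import numpy as np
from polygraphy.backend.common import BytesFromPath
from polygraphy.backend.trt import EngineFromBytes, TrtRunner

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(path):
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (128, 128)).astype(np.float32) / 255.0   # model input is 128x128
    img = (img - MEAN) / STD                                       # 'torch' normalization
    return np.ascontiguousarray(img.transpose(2, 0, 1)[None], dtype=np.float32)  # NCHW + batch

engine = EngineFromBytes(BytesFromPath("/path/to/efficientnet-b0.onnx_b1_gpu0_fp32.engine"))
with TrtRunner(engine) as runner:
    out = runner.infer(feed_dict={"input_1": preprocess("/path/to/crop.png")})
    probs = next(iter(out.values())).squeeze()
    print(probs, probs.argmax())
```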
TAO ClassificationTF2 Configuration
dataset:
  train_dataset_path: "/workspace/tao-experiments/data/train"
  val_dataset_path: "/workspace/tao-experiments/data/val"
  preprocess_mode: 'torch'
  num_classes: 2
  augmentation:
    enable_color_augmentation: True
train:
  checkpoint: '/workspace/tao-experiments/pretrained_classification_tf2_vefficientnet_b0'
  batch_size_per_gpu: 32
  num_epochs: 100
  optim_config:
    optimizer: 'sgd'
  lr_config:
    scheduler: 'cosine'
    learning_rate: 0.0005
    soft_start: 0.05
  reg_config:
    type: 'L2'
    scope: ['conv2d', 'dense']
    weight_decay: 0.00005
  results_dir: '/workspace/tao-experiments/results/train'
model:
  backbone: 'efficientnet-b0'
  input_width: 128
  input_height: 128
  input_channels: 3
  dropout: 0.12
evaluate:
  dataset_path: "/workspace/tao-experiments/data/test"
  checkpoint: "/workspace/tao-experiments/results/train/efficientnet-b0_100.tlt"
  top_k: 1
  batch_size: 16
  n_workers: 8
  results_dir: '/workspace/tao-experiments/results/val'
inference:
  image_dir: "/workspace/tao-experiments/data/test_images"
  checkpoint: "/workspace/tao-experiments/results/train/efficientnet-b0_100.tlt"
  results_dir: '/workspace/tao-experiments/results/inference'
  classmap: "/workspace/tao-experiments/results/train/classmap.json"
export:
  checkpoint: "/workspace/tao-experiments/results/train/efficientnet-b0_100.tlt"
  onnx_file: '/workspace/tao-experiments/results/export/efficientnet-b0.onnx'
  results_dir: '/workspace/tao-experiments/results/export'
How we converted the model from TLT to a TensorRT engine:
- Convert from TLT to ONNX using `tao model classification_tf2 export`. This step was not performed on the Jetson device; we used an NVIDIA GeForce RTX 4090 GPU for model training and export.
- Convert from ONNX to TensorRT. This was done on the Jetson Orin Nano. We tried two methods: (i) deploy the exported model to DeepStream directly and let DeepStream handle the TRT conversion implicitly; and (ii) compile the TensorRT engine with TRTEXEC (a Python-API sketch of this step follows this list). Both methods give the same (bad) inference results.
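For reference, the ONNX-to-TensorRT step can also be expressed with the TensorRT Python API instead of TRTEXEC. The sketch below builds a plain FP32 engine (matching network-mode=0 in the nvinfer config further down); paths are placeholders and this is only an illustration of the conversion, not the exact invocation used.

```python
# Sketch: build an FP32 TensorRT engine from the exported ONNX (TensorRT 8.5 API).
import tensorrt as trt

ONNX_PATH = "/path/to/efficientnet-b0.onnx"
ENGINE_PATH = "/path/to/efficientnet-b0.onnx_b1_gpu0_fp32.engine"

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open(ONNX_PATH, "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB workspace
# No FP16/INT8 flags: network-mode=0 (FP32) is what the SGIE config uses.

serialized = builder.build_serialized_network(network, config)
with open(ENGINE_PATH, "wb") as f:
    f.write(serialized)
```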
DeepStream App Configuration Files:
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#kitti-track-output-dir=tracker_output_folder
[tiled-display]
enable=1
rows=1
columns=1
width=1920
height=1080
gpu-id=0
[source0]
enable=1
type=2
num-sources=1
uri=file:///path/to/test/video/file.mp4
gpu-id=0
[streammux]
gpu-id=0
batch-size=1
batched-push-timeout=33333
width=1920
height=1080
[sink0]
enable=1
type=3
container=1
codec=1
enc-type=1
sync=0
bitrate=3000000
profile=0
output-file=/path/to/inference/overlay/video.mp4
source-id=0
[osd]
enable=1
gpu-id=0
border-width=3
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Arial
[primary-gie]
enable=1
plugin-type=0
gpu-id=0
batch-size=1
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
gie-unique-id=1
config-file=/opt/nvidia/deepstream/deepstream-6.3/samples/configs/tao_pretrained_models/nvinfer/config_infer_primary_peoplenet.txt
[secondary-gie0]
enable=1
plugin-type=0
gpu-id=0
batch-size=1
gie-unique-id=3
operate-on-gie-id=1
operate-on-class-ids=0
config-file=/path/to/config_infer_secondary_classificationtf2.txt
[tracker]
enable=1
tracker-width=480
tracker-height=288
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
ll-config-file=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml
gpu-id=0
display-tracking-id=1
[tests]
file-loop=0
config_infer_secondary_classificationtf2.txt (we followed this guide):
[property]
gpu-id=0
# preprocessing_mode == 'torch'
net-scale-factor=0.017507
offsets=123.675;116.280;103.53
model-color-format=0
# model config
onnx-file=/path/to/efficientnet-b0.onnx
model-engine-file=/path/to/efficientnet-b0.onnx_b1_gpu0_fp32.engine
labelfile-path=/path/to/labels.txt
classifier-threshold=0.5
operate-on-class-ids=0
batch-size=1
network-mode=0
network-type=1
process-mode=2
secondary-reinfer-interval=0
gie-unique-id=3
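As a sanity check on the preprocessing values above: assuming the 'torch' preprocess_mode corresponds to the standard ImageNet mean/std (an assumption here), the offsets are mean * 255 and net-scale-factor approximates 1 / (std * 255). Because nvinfer only takes a single scalar scale, the per-channel std can only be matched approximately:

```python
# Sketch: relate the nvinfer parameters to 'torch'-mode (ImageNet) normalization.
# nvinfer applies y = net-scale-factor * (x - offsets) per pixel.
mean = [0.485, 0.456, 0.406]   # assumed ImageNet mean (RGB)
std = [0.229, 0.224, 0.225]    # assumed ImageNet std (RGB)

offsets = [m * 255 for m in mean]                 # -> 123.675, 116.28, 103.53
per_channel_scale = [1 / (s * 255) for s in std]  # -> ~0.01712, 0.01751, 0.01743

print(offsets)            # matches offsets=123.675;116.280;103.53
print(per_channel_scale)  # 0.017507 in the config matches the G channel
```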
We would like to know: (i) what causes the degradation in model accuracy, and (ii) how we can minimize this accuracy gap between the TLT model and the TensorRT engine.