I have fine-tuned the classification_tf2 network from the TAO toolkit. I can generate the TensorRT engine normally, and when I evaluate it the results are satisfactory. However, when I use the model in deepstream the results differ from the ones I got running inference in tao-deploy. I am aware the engine must be generated for the specific GPU, which is why I generate a new engine in deepstream.
These are my config files in deepstream 6.3:
The deepstream app:
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
[tiled-display]
enable=1
rows=1
columns=1
width=640
height=360
gpu-id=0
nvbuf-memory-type=0
[source0]
enable=1
type=3
uri=file:///root/top/Downloads/vid10_36.mkv
num-sources=1
gpu-id=0
cudadec-memtype=0
[sink0]
enable=1
type=2
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0
[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0
[streammux]
gpu-id=0
live-source=0
batch-size=1
batched-push-timeout=40000
width=2688
height=1520
enable-padding=1
nvbuf-memory-type=0
[primary-gie]
enable=1
gpu-id=0
batch-size=1
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config.txt
#The config file:
[property]
gpu-id=0
net-scale-factor=0.017507
offsets=123.675;116.280;103.53
model-color-format=0
batch-size=1
onnx-file=/root/top/experiments_final3/efficientnet-b0_013.onnx
labelfile-path=/root/top/experiments_final3/labels.txt
model-engine-file=/root/top/experiments_final3/efficientnet-b0_013.onnx_b1_gpu0_int8.engine
int8-calib-file=/root/top/experiments_final3/cal.bin
infer-dims=3;256;256
uff-input-blob-name=input_1
output-blob-names=Identity:0
process-mode=1
network-mode=1
network-type=1
num-detected-classes=3
interval=0
gie-unique-id=1
classifier-async-mode=1
classifier-threshold=0.2
##The training spec file:
dataset:
  train_dataset_path: "/home/data/train"
  val_dataset_path: "/home/data/val"
  preprocess_mode: 'torch'
  num_classes: 3
  augmentation:
    enable_center_crop: False
    enable_random_crop: False
    disable_horizontal_flip: True
    enable_color_augmentation: False
    mixup_alpha: 0
train:
  qat: False
  checkpoint: ''
  batch_size_per_gpu: 32
  num_epochs: 200
  optim_config:
    optimizer: 'sgd'
  lr_config:
    scheduler: 'cosine'
    learning_rate: 0.0005
    soft_start: 0.05
  reg_config:
    type: 'L2'
    scope: ['conv2d', 'dense']
    weight_decay: 0.00005
  results_dir: '/home/experiments_final_3/train'
model:
  backbone: 'efficientnet-b0'
  input_width: 256
  input_height: 256
  input_channels: 3
evaluate:
  dataset_path: "/home/data/test"
  checkpoint: "/home/experiments_final_3/train/efficientnet-b0_010.tlt"
  top_k: 1
  batch_size: 16
  n_workers: 8
  results_dir: '/home/machukamendosk/experiments_final_3/evaluation'
export:
  checkpoint: "/home/experiments_final_3/train/efficientnet-b0_013.tlt"
  onnx_file: '/home/experiments_final_3/export/efficientnet-b0_013.onnx'
  results_dir: '/home/experiments_final_3/export'
inference:
  checkpoint: ''
  trt_engine: '/home/experiments_final_3/export/efficientnet-b0_013.int8.engine'
  image_dir: '/home/data/inference1'
  classmap: '/home/experiments_final_3/train/classmap.json'
  results_dir: '/home/experiments_final_3/inference1'
gen_trt_engine:
  onnx_file: '/home/experiments_final_3/export/efficientnet-b0_013.onnx'
  trt_engine: '/home/experiments_final_3/export/efficientnet-b0_013.int8.engine'
  results_dir: '/home/experiments_final_3/export'
  tensorrt:
    data_type: "int8"
    max_workspace_size: 4
    max_batch_size: 16
    calibration:
      cal_image_dir: '/home/data/val'
      cal_data_file: '/home/experiments_final_3/export/calib.tensorfile'
      cal_cache_file: '/home/experiments_final_3/export/cal.bin'
      cal_batches: 20
I see the preprocessing is pretty important, but I cannot figure out what is wrong.
I would like to add that I am using tao-toolkit:5.3.0-deploy for engine generation and inference, and tao-toolkit:5.0.0-tf2.11.0 for training.
Morganh (September 4, 2024, 8:00am):
Please set to below and retry.
net-scale-factor=0.0175070028011204
offsets=123.675;116.28;103.53
Because in deepstream, y = net-scale-factor * (x - mean).
In tao-tf2 torch mode, y = (x/255 - torch_mean)/std = (x - 255*torch_mean) * (1/(255*std)).
So, net-scale-factor = 1/(255*std), and mean (the offsets) = 255 * torch_mean.
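As a quick sanity check, the values above can be derived with a few lines of Python (a sketch, assuming the standard ImageNet mean/std that tao-tf2 torch mode uses):
# nvinfer computes y = net-scale-factor * (x - offset) per channel, so:
#   offsets = 255 * torch_mean
#   net-scale-factor = 1 / (255 * std)  (nvinfer takes a single scalar, so the
#   per-channel stds 0.229/0.224/0.225 are approximated by one value)
torch_mean = [0.485, 0.456, 0.406]
torch_std = [0.229, 0.224, 0.225]
print([255 * m for m in torch_mean])     # [123.675, 116.28, 103.53] -> offsets
print(1.0 / (255 * torch_std[1]))        # ~0.0175070028011204 -> net-scale-factor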
I retried but the results haven't changed. In tao-deploy inference and in deepstream, the same images are assigned to different classes. It is quite strange.
Morganh (September 4, 2024, 8:26am):
For the same frame, if tao-deploy infers class A while deepstream infers class B, you can check the labels.txt file to verify the class order.
The labels.txt file is correct. I mean the frames do not always get different classes: some frames are classified properly, but a significant number of them are not classified as they should be. (Classification in tao-deploy inference is better than in deepstream.)
I have ensured that the preprocessing is the same, the labels are in the correct order, and the sizes are correct.
Morganh (September 4, 2024, 8:34am):
Please run fp32 mode to check if it is the same as tao-deploy.
More info can be found in Fine-tuned TAO ClassificationTF2 Accuracy Drop after Compiling to TensorRT - #31 by Morganh as well.
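For reference, switching nvinfer to FP32 only requires changing network-mode in the [property] group shown earlier (0 = FP32, 1 = INT8, 2 = FP16); a minimal sketch, with the engine file name assumed from nvinfer's usual auto-generated naming:
network-mode=0
model-engine-file=/root/top/experiments_final3/efficientnet-b0_013.onnx_b1_gpu0_fp32.engine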
I have read that topic during the last days, and I see that preprocessing plays a big role. I ran fp32 mode as well, and the results are still not satisfactory. The classification performance in deepstream is still poor compared to the classification performance in tao-deploy (inference). I have tried different preprocessing parameters, but I cannot figure out why the accuracy drops.
Morganh (September 4, 2024, 9:17am):
To narrow down, please use below trtexec command to generate fp32 engine inside tao-deploy docker and deepstream docker.
trtexec --onnx=/path/to/model.onnx \
  --maxShapes=input_1:1x3x256x256 \
  --minShapes=input_1:1x3x256x256 \
  --optShapes=input_1:1x3x256x256 \
  --saveEngine=fp32.engine
In tao-deploy docker, run evaluation against it.
In the deepstream docker, do not let deepstream generate an engine; make sure deepstream uses the engine you generated via trtexec.
And compare again.
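To make deepstream pick up a pre-built engine, the [property] group can point model-engine-file directly at it, for example (path assumed; if the engine was built with a different TensorRT version than the one in the deepstream container, deserialization fails and nvinfer falls back to rebuilding from onnx-file):
model-engine-file=/root/top/experiments_final3/fp32.engine
network-mode=0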
As a further experiment, please generate an mp4 file instead.
I have followed your instructions. First, when I use trtexec in the tao-deploy docker and run inference, the results are quite good. After that I use trtexec in the deepstream docker, but during inference in deepstream the results are not satisfactory. I tried to use the engine generated in tao-deploy, but deepstream shows "deserialize backend context from engine from file :/root/top/Tao_deploy_here/efficientnet-b0_013_1.fp32.engine failed, try rebuild" and rebuilds a new engine. I tried to run inference in tao-deploy using the engine generated in deepstream, but I get this error: AttributeError: 'NoneType' object has no attribute 'create_execution_context'
Morganh (September 4, 2024, 10:20am):
cristian.machuca.mendoza:
First when I use trtexec in tao-deploy docker, when i run inference the results are quite good. After that i use trtexec in deepstream, during inference in deepstream the results are not satisfactory.
More experiments here.
exp1:
Please generate an .avi file and an .mp4 file and retry.
$ gst-launch-1.0 multifilesrc location="/tmp/%d.jpg" caps="image/jpeg,framerate=30/1" ! jpegdec ! x264enc ! avimux ! filesink location="out.avi"
$ apt-get install ffmpeg
$ ffmpeg -framerate 2 -pattern_type glob -i '*.jpg' -c:v libx264 -pix_fmt yuv420p -vf "crop=trunc(iw/2)*2:trunc(ih/2)*2" out.mp4
exp2:
Please refer to the config_as_primary_gie.txt in Fine-tuned TAO ClassificationTF2 Accuracy Drop after Compiling to TensorRT - #31 by Morganh .
config.txt (940 Bytes)
deepstream_app.txt (2.4 KB)
These are my config files in deepstream. I have converted the video as well. The results continue to be unsatisfactory. I have tried different parameters, for example maintain-aspect-ratio=0 and 1, but it does not work out. I would like to add that I have followed all the recommendations given here: Fine-tuned TAO ClassificationTF2 Accuracy Drop after Compiling to TensorRT - TAO Toolkit - NVIDIA Developer Forums. In the spec file I have not used augmentation, as recommended.
Morganh (September 5, 2024, 5:06am):
How did you train the model? Did the training use center_crop?
Here is the training spec file. I did not use center crop:
dataset:
  train_dataset_path: "/home/data/train"
  val_dataset_path: "/home/data/val"
  preprocess_mode: 'torch'
  num_classes: 3
  augmentation:
    enable_center_crop: False
    enable_random_crop: False
    disable_horizontal_flip: True
    enable_color_augmentation: False
    mixup_alpha: 0
train:
  qat: False
  checkpoint: ''
  batch_size_per_gpu: 32
  num_epochs: 200
  optim_config:
    optimizer: 'sgd'
  lr_config:
    scheduler: 'cosine'
    learning_rate: 0.0005
    soft_start: 0.05
  reg_config:
    type: 'L2'
    scope: ['conv2d', 'dense']
    weight_decay: 0.00005
  results_dir: '/home/experiments_final_3/train'
model:
  backbone: 'efficientnet-b0'
  input_width: 256
  input_height: 256
  input_channels: 3
evaluate:
  dataset_path: "/home/data/test"
  checkpoint: "/home/train/efficientnet-b0_010.tlt"
  top_k: 1
  batch_size: 16
  n_workers: 8
  results_dir: '/home/experiments_final_3/evaluation'
export:
  checkpoint: "/home/train/efficientnet-b0_013.tlt"
  onnx_file: '/home/export/efficientnet-b0_013.onnx'
  results_dir: '/home/experiments_final_3/export'
inference:
  checkpoint: ''
  trt_engine: '/home/export/efficientnet-b0_013.int8.engine'
  image_dir: '/home/data/inference1'
  classmap: '/home/experiments_final_3/train/classmap.json'
  results_dir: '/home/experiments_final_3/inference1'
gen_trt_engine:
  onnx_file: '/home/experiments_final_3/export/efficientnet-b0_013.onnx'
  trt_engine: '/home/export/efficientnet-b0_013.int8.engine'
  results_dir: '/home/experiments_final_3/export'
  tensorrt:
    data_type: "int8"
    max_workspace_size: 4
    max_batch_size: 16
    calibration:
      cal_image_dir: '/home/data/val'
      cal_data_file: '/home/experiments_final_3/export/calib.tensorfile'
      cal_cache_file: '/home/experiments_final_3/export/cal.bin'
      cal_batches: 20
@Morganh I have the same issue even when using classification_tf1. In deepstream the performance is reduced.
Morganh (September 5, 2024, 2:17pm):
Classification_tf2 should have no issue. May I know which deepstream docker you are running?
I tried 7.0-samples-multiarch, 7.0-triton-multiarch, and 6.3-samples. In all three I have the same issue. I also tried with other data in case my dataset was problematic (I trained using the cats-and-dogs dataset) and also got errors. Inference in tao-deploy 5.5.0 is more accurate than in deepstream. These are all my files:
config_file_deepstream.txt (1.1 KB)
deepstream_app.txt (2.5 KB)
engine.txt (279 Bytes)
evaluate_engine.txt (274 Bytes)
export.txt (273 Bytes)
inference.txt (275 Bytes)
labels.txt (7 Bytes)
train.txt (272 Bytes)
train_spec.txt (2.2 KB)
Morganh (September 5, 2024, 2:39pm):
I suggest you check whether the preprocessing is the same between tao-deploy and deepstream.
In tao-deploy, you can add debug code in tao_deploy/nvidia_tao_deploy/cv/classification_tf2/scripts/inference.py at 31c7e0ed3fe48942c254b3b85517e7418eea17b3 · NVIDIA/tao_deploy · GitHub and tao_deploy/nvidia_tao_deploy/cv/classification_tf1/dataloader.py at 31c7e0ed3fe48942c254b3b85517e7418eea17b3 · NVIDIA/tao_deploy · GitHub to save images after preprocessing. Similar to Fine-tuned TAO ClassificationTF2 Accuracy Drop after Compiling to TensorRT - #35 by Morganh . This will help you understand the preprocessing based on your training spec file.
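For example, a small sketch of such debug code (the helper function and its insertion point are assumptions, not part of the tao-deploy sources; it assumes a CHW float32 array that was normalized with the torch-mode mean/std):
import numpy as np
from PIL import Image

def save_preprocessed(chw, path, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    # Undo torch-mode normalization and save the image that actually goes into the engine.
    hwc = np.asarray(chw, dtype=np.float32).transpose(1, 2, 0)
    hwc = (hwc * np.array(std) + np.array(mean)) * 255.0
    Image.fromarray(np.clip(hwc, 0, 255).astype(np.uint8)).save(path)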
Also, you can leverage the tao-deploy code to write a standalone inference script that runs inference against the TensorRT engine.
Check if your standalone inference can get the expected result.
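A minimal standalone sketch of that idea follows. It assumes TensorRT 8.x with pycuda, a single input binding named input_1 (1x3x256x256, RGB, NCHW), a single 3-class output, and torch-mode preprocessing; the paths are placeholders, and the code is not taken from the tao-deploy sources:
import numpy as np
import pycuda.autoinit  # noqa: F401  creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt
from PIL import Image

ENGINE_PATH = "/path/to/fp32.engine"   # placeholder
IMAGE_PATH = "/path/to/frame.jpg"      # placeholder
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(path):
    # Resize to the network input size, apply torch-mode normalization, HWC -> NCHW.
    img = Image.open(path).convert("RGB").resize((256, 256))
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - MEAN) / STD
    return np.ascontiguousarray(x.transpose(2, 0, 1)[None])

logger = trt.Logger(trt.Logger.WARNING)
with open(ENGINE_PATH, "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
context.set_binding_shape(0, (1, 3, 256, 256))  # needed if the batch dim is dynamic

inp = preprocess(IMAGE_PATH)
out = np.empty((1, 3), dtype=np.float32)        # 3 classes per the spec file
d_in, d_out = cuda.mem_alloc(inp.nbytes), cuda.mem_alloc(out.nbytes)
cuda.memcpy_htod(d_in, inp)
context.execute_v2([int(d_in), int(d_out)])     # binding order assumed: input, output
cuda.memcpy_dtoh(out, d_out)
print("predicted class:", int(out.argmax()), out)

You can then compare the predicted class per frame with what deepstream reports for the same frame.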
Thanks @Morganh for the answer. I am looking into it. I have a couple of questions. First, is it possible to save the preprocessed images from deepstream? Second, how can I disable the preprocessing in deepstream so I can feed the preprocessed images saved from tao-deploy to the TRT engine?