TLT ResNet18 performance drop between .tlt inference and .engine

• Jetson NX
• Classification ResNet18
• TLT Version 3.0
• Training spec file:
model_config {
  arch: "resnet"
  n_layers: 18
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,144,256"
}
train_config {
  train_dataset_path: "/workspace/tlt-experiments/data/train"
  val_dataset_path: "/workspace/tlt-experiments/data/val"
  pretrained_model_path: "/workspace/tlt-experiments/results/resnet_022_PRUNED.tlt"
  optimizer {
    sgd {
      lr: 0.01
      decay: 0.0
      momentum: 0.9
      nesterov: False
    }
  }
  batch_size_per_gpu: 64
  n_epochs: 200
  n_workers: 16
  preprocess_mode: "caffe"
  enable_random_crop: True
  enable_center_crop: True
  label_smoothing: 0.0
  mixup_alpha: 0.1
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }
  lr_config {
    step {
      learning_rate: 0.006
      step_size: 10
      gamma: 0.1
    }
  }
}
eval_config {
  eval_dataset_path: "/workspace/tlt-experiments/data/test"
  model_path: "/workspace/tlt-experiments/results/weights/resnet_074.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
  enable_center_crop: True
}

Hello,

I trained a classification model with TLT to be used as a secondary model in DeepStream. I followed the documentation and got 97% accuracy on my test set. I then used tlt-converter to make an engine file (for the NX), and when I run this model I only get about 20% accuracy (basically I think the model is just guessing). Also, interestingly, my confidences are 0.8 and above with the .tlt, but they drop to around 0.4 with the engine.

For the training data I cropped the objects of interest out of the images and trained the model on the cropouts. I did not resize or pad the images, since the dataloader should take care of this, correct?

I think the problem is that, when the engine is used as a secondary classifier, the images it sees are very different from the ones the model was trained on. To debug this, can you give me more details on the following:

  • I am training with preprocess mode "caffe". What does this do? Do I need to set anything special in DeepStream to emulate it? Does it work in RGB or BGR, and would this make a difference?
  • There is a DeepStream parameter called "offsets" (array of mean values of colour components to be subtracted from each pixel). How do I find out what the mean subtraction values should be? Does this preprocessing of the cropouts affect what the images at training time should look like? What are the default mean subtraction values at training time?
  • In what order are the PGIE cropouts preprocessed by DeepStream before the secondary classifier: crop → mean subtraction → normalization → bilinear resize → padding (bottom right only)? What colour is the padding, and is it consistent between the TLT training dataloader and DeepStream?

For TLT classification model inference, there are 3 methods.

1st is: tlt classification inference. You already mentioned that it is running well.

2nd is: standalone Python inference. I made some modifications based on one customer's code; see Inferring resnet18 classification etlt model with python. That end user can get the same result as tlt inference.

3rd is: run inference with DeepStream. Please see the solution (comments 21, 24, 32) in Issue with image classification tutorial and testing with deepstream-app.

  Main changes:
  • Set the offsets to 103.939;116.779;123.68 (see the sketch after this list)
  • Generate the avi file with GStreamer
  • Set scaling-filter=5
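For reference, preprocess_mode: "caffe" in the classification spec follows the Keras "caffe" convention: the image is handled in BGR channel order and the ImageNet channel means are subtracted, with no further scaling. A minimal Python sketch of that preprocessing (the direct bilinear resize to the 256x144 network input is an assumption for illustration; the training dataloader may also crop):

import cv2
import numpy as np

# ImageNet channel means in B, G, R order; these are the values behind
# the recommended offsets=103.939;116.779;123.68.
BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_preprocess(image_path, width=256, height=144):
    """Approximation of "caffe" preprocess mode: BGR channel order,
    ImageNet mean subtraction, no extra scaling."""
    bgr = cv2.imread(image_path)                              # OpenCV loads images as BGR
    bgr = cv2.resize(bgr, (width, height), interpolation=cv2.INTER_LINEAR)
    x = bgr.astype(np.float32) - BGR_MEANS                    # per-channel mean subtraction
    return x.transpose(2, 0, 1)                               # HWC -> CHW for the network

The engine will only see equivalent inputs if DeepStream is configured to feed BGR and subtract the same means, which is what the settings above achieve.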

OK, I will try the suggestions above, thank you. I trained the TLT model on RGB images, so I am using model-color-format=0. Is this correct? And should I still set offsets=103.939;116.779;123.68 with RGB?

Thank you

Please set the BGR configuration:
model-color-format=1

Also please set the offsets as below:
offsets=103.939;116.779;123.68

Reference: comment 21 of Issue with image classification tutorial and testing with deepstream-app - #21 by Morganh
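For background, nvinfer's pre-processing is documented as y = net-scale-factor * (x - mean), where the mean values come from offsets (here the ImageNet means in B, G, R order). With net-scale-factor=1.0, model-color-format=1 (BGR) and offsets=103.939;116.779;123.68, this reproduces the training-time "caffe" preprocessing. A minimal NumPy sketch of the formula (for checking values offline, not actual DeepStream code):

import numpy as np

def nvinfer_preprocess(bgr_pixels, net_scale_factor=1.0,
                       offsets=(103.939, 116.779, 123.68)):
    """y = net-scale-factor * (x - mean), applied per channel.
    bgr_pixels: H x W x 3 array already in BGR order (model-color-format=1)
    and already resized to the 256x144 network input."""
    x = bgr_pixels.astype(np.float32) - np.asarray(offsets, dtype=np.float32)
    return net_scale_factor * x

With these values the result matches the sketch above, which is why net-scale-factor must stay at 1.0 for a "caffe"-trained model.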

Those parameters helped a bit, but model performance is still not quite what it should be. The ResNet I trained has input dimensions 3,144,256 (c,h,w). Do I need to set maintain-aspect-ratio=1?

Not needed.

Did you ever set

  • scaling-filter=5

and, if possible, please

  • generate the avi file with GStreamer

Please share your latest config file.

This is my latest config file for the secondary model:
config.txt (793 Bytes)

Sorry, what do you mean by avi file? Should I convert the mp4 inference video (with tracker bboxes) to avi, or did you mean something else?

See Issue with image classification tutorial and testing with deepstream-app - #24 by Morganh

gst-launch-1.0 multifilesrc location="/tmp/%d.jpg" caps="image/jpeg,framerate=30/1" ! jpegdec ! x264enc ! avimux ! filesink location="out.avi"

The avi file is better than the mp4 file for inference.

Also, in the other topic mentioned above, the end user can run inference well with a TLT classification model. So please refer to the config file https://forums.developer.nvidia.com/uploads/short-url/rk4x7xqir6N1nl3QpfxBcTTE6FA.txt in Issue with image classification tutorial and testing with deepstream-app - #21 by Morganh to narrow down, for example process-mode=1, etc.

Sorry, can you describe the process for making the avi file? I can run the command you sent, but do I need a folder of images (cropouts)?

Yes, the command below will generate an avi file from jpg files.
gst-launch-1.0 multifilesrc location="/tmp/%d.jpg" caps="image/jpeg,framerate=30/1" ! jpegdec ! x264enc ! avimux ! filesink location="out.avi"

OK, I will try this. My cropouts are all different sizes, though. Am I supposed to resize and pad them to the same size to make the .avi? If so, how? Bilinear interpolation and then pad bottom right?

You can generate the jpg files via ffmpeg:
$ ffmpeg -i xxx.mp4 folder/%d.jpg

There is no need to resize or pad.
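If ffmpeg is not handy, the same numbered jpg sequence can also be produced with OpenCV; a minimal sketch (assuming cv2 is installed, and matching the %d.jpg naming used by the ffmpeg and multifilesrc commands above):

import cv2

def mp4_to_jpgs(video_path, out_dir):
    """Dump every frame of the video as out_dir/1.jpg, 2.jpg, ...
    so the gst-launch multifilesrc pipeline above can pick them up."""
    cap = cv2.VideoCapture(video_path)
    index = 1
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(f"{out_dir}/{index}.jpg", frame)
        index += 1
    cap.release()

# Example: mp4_to_jpgs("xxx.mp4", "folder")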

What is this .avi file for? I already have a video to run inference on. I thought the idea was to make an .avi video consisting of cropouts so that I can run classification as a primary model?

Also, after primary inference, what are the cropouts that are sent to the secondary model supposed to look like? Resized and padded bottom right?

If you already have an avi file, please use it directly. I thought you only had an mp4 file.
Can you follow Issue with image classification tutorial and testing with deepstream-app - #21 by Morganh to run inference with only one GIE? In that case, there is no 2nd GIE.

Hello, I ran the classification ResNet18 TLT model on the avi file (as the only GIE) and I get bad performance. What do you think could be the issue?

Also, what are the processing steps between the primary and secondary GIE normally? Can you describe what happens to the bbox cropouts, please?

Thanks

Firstly, please make sure tlt classification inference runs well. Please double check, and try running more test images. If that is good, it means your tlt model can run inference well against the test images.

Then you can export the tlt model to an etlt model and run inference with the etlt model in DeepStream. As we synced above, pay attention to the config file. You can use the primary GIE only; it will work on the whole test image (process-mode=1), so in this case no bboxes are cropped.

When I try to run the .etlt model I get this error:
Linking elements in the Pipeline

linking recording pipeline
Opening in BLOCKING MODE
Opening in BLOCKING MODE
Opening in BLOCKING MODE
Opening in BLOCKING MODE
Starting pipeline

Opening in BLOCKING MODE
Opening in BLOCKING MODE
Opening in BLOCKING MODE
Opening in BLOCKING MODE
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvdcf.so
gstnvtracker: Batch processing is ON
gstnvtracker: Past frame output is ON
[NvDCF][Warning] minTrackingConfidenceDuringInactive is deprecated
[NvDCF] Initialized
0:00:01.951649466 21660 0x558f83a950 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1716> [UID = 1]: Trying to create engine from model files
ERROR: Uff input blob name is empty
ERROR: Failed to create network using custom network creation function
ERROR: Failed to get cuda engine from custom library API
0:00:03.594655486 21660 0x558f83a950 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1736> [UID = 1]: build engine file failed
Segmentation fault (core dumped)

Can you share the full command and config files?

I can confirm that tlt classification inference runs well.
I managed to run DeepStream with the .etlt model, but the performance is not good on my test video. What could be the reason for this? My config is as follows:
[property]
gpu-id=0
net-scale-factor=1.0
model-color-format=1
offsets=103.939;116.779;123.68
num-detected-classes=13
output-blob-names=predictions/Softmax
#model-engine-file=path/to/engine
tlt-encoded-model=path/to/etlt
tlt-model-key=mykey
labelfile-path=path/to/labels
network-mode=2
process-mode=1
gie-unique-id=1
operate-on-gie-id=1
classifier-async-mode=0
classifier-threshold=0.1
interval=0
batch-size=16
scaling-filter=5
network-type=1
workspace-size=4096
infer-dims=3;144;256
maintain-aspect-ratio=0
enable-dla=1
use-dla-core=0
uff-input-blob-name=input_1

[class-attrs-all]