Tlt resnet18 performance drop between .tlt inference and .engine

OK, I will try the suggestions above, thank you. I trained the TLT model with RGB images, so I am using model-color-format=0. Is this correct? Should I also set offsets to 103.939;116.779;123.68 with RGB?

Thank you

Please use the BGR configuration:
model-color-format=1

Also set the offsets as below:
offsets=103.939;116.779;123.68

Reference: comment 21 of Issue with image classification tutorial and testing with deepstream-app - #21 by Morganh
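
To make the effect of these two settings concrete, here is a minimal sketch of the per-pixel arithmetic. It assumes nvinfer applies y = net-scale-factor * (x - offset) per channel, and that the three offsets are listed in the same B, G, R order as the model input. The function name preprocess_pixel is hypothetical, not part of DeepStream.

```python
# Sketch only: assumed preprocessing formula y = scale * (x - offset),
# applied per channel after reordering the pixel to B, G, R.

OFFSETS_BGR = (103.939, 116.779, 123.68)  # assumed order: B, G, R
NET_SCALE_FACTOR = 1.0                    # net-scale-factor from the config in this thread


def preprocess_pixel(rgb, offsets=OFFSETS_BGR, scale=NET_SCALE_FACTOR):
    """Reorder one RGB pixel to BGR, then subtract the offsets and scale."""
    r, g, b = rgb
    bgr = (b, g, r)
    return tuple(scale * (value - offset) for value, offset in zip(bgr, offsets))
```

If model-color-format were left at 0 (RGB) with the same offsets, the red channel would have the blue mean subtracted and vice versa, so the network would see inputs shifted relative to what it saw in training.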

Those parameters helped a bit, but model performance is still not what it should be. The ResNet I trained has input dimensions 3,144,256 (c,h,w). Do I need to set maintain-aspect-ratio=1?

Not needed.

Have you tried the following?

  • Set "scaling-filter=5"

And if possible, please:

  • Generate an avi file with gstreamer

Please share your latest config file.

This is my latest config file for the secondary model:
config.txt (793 Bytes)

Sorry, what do you mean by avi file? Should I convert the mp4 inference video (with tracker bboxes) to avi, or did you mean something else?

See Issue with image classification tutorial and testing with deepstream-app - #24 by Morganh

gst-launch-1.0 multifilesrc location="/tmp/%d.jpg" caps="image/jpeg,framerate=30/1" ! jpegdec ! x264enc ! avimux ! filesink location="out.avi"

The avi file is better than mp4 file for inference.

Also, in the other topic mentioned above, the end user was able to run inference well with a TLT classification model. Please compare your settings against the config file https://forums.developer.nvidia.com/uploads/short-url/rk4x7xqir6N1nl3QpfxBcTTE6FA.txt from Issue with image classification tutorial and testing with deepstream-app - #21 by Morganh to narrow down the problem.
For example, check process-mode=1 etc.

Sorry, can you describe the process to make the avi file? I can run the command you sent, but do I need a folder of images (cropouts)?

Yes, the command below generates an avi file from jpg files.
gst-launch-1.0 multifilesrc location="/tmp/%d.jpg" caps="image/jpeg,framerate=30/1" ! jpegdec ! x264enc ! avimux ! filesink location="out.avi"

OK, I will try this. My cropouts are all different sizes though. Am I supposed to resize and pad them to the same size to make the .avi? If so, how? Bilinear interpolation and then pad bottom right?

You can generate jpg files via ffmpeg.
$ ffmpeg -i xxx.mp4 folder/%d.jpg

Resizing or padding is not needed.
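
The two steps above (dump frames with ffmpeg, then mux them into an avi with gstreamer) can be sketched as a pair of command builders. The function names are hypothetical. The `-start_number 0` flag is an addition that does not appear in this thread; it is included on the assumption that multifilesrc starts reading at index 0 while ffmpeg numbers frames from 1 by default.

```python
import shlex


def ffmpeg_extract_cmd(video, out_dir):
    """Build an ffmpeg command that writes every frame of `video` as a JPEG."""
    # -start_number 0 is an assumption added here, not taken from the thread.
    return ["ffmpeg", "-i", video, "-start_number", "0", f"{out_dir}/%d.jpg"]


def gst_avi_cmd(frame_dir, out_file="out.avi", fps=30):
    """Build the gst-launch-1.0 pipeline quoted in the thread as an argument list."""
    return [
        "gst-launch-1.0", "multifilesrc", f"location={frame_dir}/%d.jpg",
        f"caps=image/jpeg,framerate={fps}/1",
        "!", "jpegdec", "!", "x264enc", "!", "avimux",
        "!", "filesink", f"location={out_file}",
    ]


if __name__ == "__main__":
    # Only print the commands; to execute one, pass the list to
    # subprocess.run(cmd, check=True).
    for cmd in (ffmpeg_extract_cmd("xxx.mp4", "/tmp"), gst_avi_cmd("/tmp")):
        print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting altogether, which sidesteps the curly-quote problem that can creep into commands copied from a web page.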

What is this .avi file for? I already have a video to run inference on. I thought the idea was to make an .avi video consisting of cropouts so that I can run classification as a primary model?

Also, what are the cropouts that are sent to the secondary model after primary inference supposed to look like? Resized and padded bottom right?

If you already have an avi file, please use it directly. I thought you only had an mp4 file.
Can you follow Issue with image classification tutorial and testing with deepstream-app - #21 by Morganh and run inference with only one GIE? In that case, there is no 2nd GIE.

Hello, I ran the classification resnet18 TLT model on the avi file (as the only GIE) and I get bad performance. What do you think could be the issue?

Also, what are the processing steps between the primary and secondary GIE normally? Can you describe what happens with the bbox cropouts please?

Thanks

First, please make sure tlt classification inference runs well. Double check it and try more test images. If the results are good, your tlt model can run inference correctly against the test images.

Then you can export this tlt model to an etlt model and run inference with the etlt model in deepstream. As discussed above, pay attention to the config file. You can use the primary GIE only. With process-mode=1 it runs inference on the whole test image and does not crop bboxes.

When I try to run the .etlt model, I get this error:
Linking elements in the Pipeline

linking recording pipeline
Opening in BLOCKING MODE
Opening in BLOCKING MODE
Opening in BLOCKING MODE
Opening in BLOCKING MODE
Starting pipeline

Opening in BLOCKING MODE
Opening in BLOCKING MODE
Opening in BLOCKING MODE
Opening in BLOCKING MODE
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream/lib/libnvds_nvdcf.so
gstnvtracker: Batch processing is ON
gstnvtracker: Past frame output is ON
[NvDCF][Warning] minTrackingConfidenceDuringInactive is deprecated
[NvDCF] Initialized
0:00:01.951649466 21660 0x558f83a950 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1716> [UID = 1]: Trying to create engine from model files
ERROR: Uff input blob name is empty
ERROR: Failed to create network using custom network creation function
ERROR: Failed to get cuda engine from custom library API
0:00:03.594655486 21660 0x558f83a950 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger: NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1736> [UID = 1]: build engine file failed
Segmentation fault (core dumped)

Can you share the full command, config files?

I can confirm that tlt classification inference runs well.
I managed to run deepstream with the .etlt model, but the performance is not good on my test video. What could be the reason for this? My config is as follows:
[property]
gpu-id=0
net-scale-factor=1.0
model-color-format=1
offsets=103.939;116.779;123.68
num-detected-classes=13
output-blob-names=predictions/Softmax
#model-engine-file=path/to/engine
tlt-encoded-model=path/to/etlt
tlt-model-key=mykey
labelfile-path=path/to/labels
network-mode=2
process-mode=1
gie-unique-id=1
operate-on-gie-id=1
classifier-async-mode=0
classifier-threshold=0.1
interval=0
batch-size=16
scaling-filter=5
network-type=1
workspace-size=4096
infer-dims=3;144;256
maintain-aspect-ratio=0
enable-dla=1
use-dla-core=0
uff-input-blob-name=input_1

[class-attrs-all]
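
As a quick sanity check of a config like the one above, here is a minimal sketch that parses the file text and flags the settings discussed in this thread. The function name check_nvinfer_config and the warning strings are hypothetical, and the rule that an etlt model needs a non-empty uff-input-blob-name is an assumption drawn from the "Uff input blob name is empty" error in the earlier log. It does not validate anything else nvinfer may require.

```python
import configparser


def check_nvinfer_config(text, expected_dims=(3, 144, 256)):
    """Return a list of warnings for the settings discussed in this thread."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    prop = parser["property"]
    warnings = []

    # The thread recommends BGR input (model-color-format=1).
    if prop.get("model-color-format") != "1":
        warnings.append("model-color-format is not 1 (BGR)")

    # One offset per channel, separated by semicolons.
    offsets = [v for v in prop.get("offsets", "").split(";") if v]
    if len(offsets) != 3:
        warnings.append("offsets should list three per-channel values")

    # infer-dims should match the (c, h, w) the model was trained with.
    dims = tuple(int(v) for v in prop.get("infer-dims", "").split(";") if v)
    if dims != tuple(expected_dims):
        warnings.append(f"infer-dims {dims} differs from {tuple(expected_dims)}")

    # Assumption: an etlt model needs a non-empty input blob name.
    if "tlt-encoded-model" in prop and not prop.get("uff-input-blob-name"):
        warnings.append("uff-input-blob-name is empty")

    return warnings
```

Calling check_nvinfer_config(open("config.txt").read()) returns an empty list when none of these checks trip; config.txt here stands for whatever file holds the config.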

If possible, can you share your tlt and etlt models along with the avi video file and test images?

Yes, but I can't post them here. Can I email you?