Fine-tuned TAO ClassificationTF2 Accuracy Drop after Compiling to TensorRT

I encountered the following error when generating the AVI file using the command you shared.

ERROR: ld.so: object '/usr/local/lib/python3.8/dist-packages/scikit_learn.libs/libgomp-d22c30c5.so.1.0.0' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
Redistribute latency...
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
ERROR: from element /GstPipeline:pipeline0/GstX264Enc:x264enc0: Can not initialize x264 encoder.
Additional debug info:
gstx264enc.c(1898): gst_x264_enc_init_encoder (): /GstPipeline:pipeline0/GstX264Enc:x264enc0
Execution ended after 0:00:00.001302127
Setting pipeline to NULL ...
Freeing pipeline ...

I have tried this before. It did not help.

Could you share the model files, DeepStream configuration files, and some test images?
You can send them via private message. I will try to reproduce the issue.

I have shared the files with you privately. Thanks

Received. Thanks.

@Morganh
Any update? What is the root cause?

I am still working on the DeepStream inference. At this point there is no issue with the TAO Deploy inference or the TAO-tf2 inference. I am also going to put together a standalone inference script for reference.
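
For reference, such a standalone check could look like the sketch below. This is only a minimal illustration, assuming onnxruntime and Pillow are installed, that the model was trained with the torch-mode normalization discussed later in this thread, and the 1x3x128x128 input layout reported by the engine log further down; it is not the exact TAO script.

# Minimal standalone inference sketch for the exported ONNX classifier.
import numpy as np
import onnxruntime as ort
from PIL import Image

# Torch-mode ImageNet normalization (assumed to match the training config).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(path, size=(128, 128)):
    """Resize to the network input size and apply torch-mode normalization."""
    img = Image.open(path).convert("RGB").resize(size, Image.BILINEAR)
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - MEAN) / STD
    # HWC -> CHW, then add a batch dimension: 1x3x128x128.
    return np.expand_dims(x.transpose(2, 0, 1), axis=0)

sess = ort.InferenceSession("model.onnx")
input_name = sess.get_inputs()[0].name  # 'input_1' per the engine log below
scores = sess.run(None, {input_name: preprocess("test.jpg")})[0]
print("predicted class:", int(np.argmax(scores)), "scores:", scores)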

To run inference with DeepStream, please follow the steps below.
$ xhost +
$ export DISPLAY=:0
$ docker run --runtime=nvidia -it --rm --net=host --gpus all --name ds6.3 -d -v /localhome/local-morganh:/localhome/local-morganh -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY nvcr.io/nvidia/deepstream:6.3-samples
$ docker exec -it ds6.3 /bin/bash
$ cd /opt/nvidia/deepstream/deepstream-6.3/samples/configs/deepstream-app
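
(Note: the xhost +, the DISPLAY export, and the /tmp/.X11-unix mount are what allow the EglSink window (sink0, type=2) to render from inside the container; if DISPLAY is not set correctly you will see EGL errors like the ones further down this thread.)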
$ vim ds_classification_as_primary_gie.txt

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=1
rows=1
columns=1
#width=1280
#height=720
width=640
height=360
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=3
#uri=file:///localhome/local-morganh/TAO_ClassificationTF2_resources/sample_images/staff/out.mp4
uri=file:///localhome/local-morganh/TAO_ClassificationTF2_resources/sample_images/non-staff/out_nonstaff.mp4
num-sources=1
#drop-frame-interval=2
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
#type=2
type=2
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0

[sink1]
enable=0
type=3
#1=mp4 2=mkv
container=1
#1=h264 2=h265
codec=1
sync=0
#iframeinterval=10
bitrate=2000000
#bitrate=200
output-file=out.mp4
source-id=0

[sink2]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
type=4
#1=h264 2=h265
codec=1
sync=0
bitrate=4000000
# set below properties in case of RTSPStreaming
rtsp-port=8554
udp-port=5400

[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=1
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1920
height=1080
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
model-engine-file=/opt/nvidia/deepstream/deepstream-6.3/samples/configs/deepstream-app/model.onnx_b1_gpu0_fp32.engine
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_as_primary_gie.txt

[tests]
file-loop=1

$ vim config_as_primary_gie.txt

[property]
gpu-id=0
net-scale-factor=0.0175070028011204
offsets=123.675;116.28;103.53
batch-size=1
model-color-format=0

# model config
onnx-file=model.onnx
#tlt-model-key=yourkey
#tlt-encoded-model=your_unpruned_or_pruned_model.etlt
labelfile-path=labels.txt
#int8-calib-file=cal.bin
#model-engine-file=your_classification.engine
#input-dims=3;128;128;0
infer-dims=3;128;128
uff-input-blob-name=Input_1
output-blob-names=Identity:0

# process-mode: 1 - run inference on the whole frame, 2 - run inference on crops from the primary detector
process-mode=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0

# network-type=1 defines the model as a classifier
network-type=1
num-detected-classes=2
interval=0
gie-unique-id=1
#threshold=0.05
classifier-async-mode=1
classifier-threshold=0.00002
operate-on-gie-id=1
#operate-on-class-ids=-1

scaling-filter=1
scaling-compute-hw=1
maintain-aspect-ratio=1

$ deepstream-app -c ds_classification_as_primary_gie.txt

** WARN: <parse_tracker:1604>: Unknown key 'enable-batch-process' for group [tracker]
libEGL warning: DRI3: Screen seems not DRI3 capable
libEGL warning: DRI2: failed to authenticate
0:00:02.940466597  6060 0x5593757ac830 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1988> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-6.3/samples/configs/deepstream-app/model.onnx_b1_gpu0_fp32.engine
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT input_1         3x128x128
1   OUTPUT kFLOAT Identity:0      2

0:00:03.038633010  6060 0x5593757ac830 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2091> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-6.3/samples/configs/deepstream-app/model.onnx_b1_gpu0_fp32.engine
0:00:03.041324084  6060 0x5593757ac830 WARN                 nvinfer gstnvinfer.cpp:1047:gst_nvinfer_start:<primary_gie> warning: NvInfer asynchronous mode is applicable for secondaryclassifiers only. Turning off asynchronous mode
0:00:03.041538580  6060 0x5593757ac830 INFO                 nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-6.3/samples/configs/deepstream-app/config_as_primary_gie.txt sucessfully

Runtime commands:
        h: Print this help
        q: Quit

        p: Pause
        r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
      To go back to the tiled display, right-click anywhere on the window.

** INFO: <bus_callback:239>: Pipeline ready

WARNING from primary_gie: NvInfer asynchronous mode is applicable for secondaryclassifiers only. Turning off asynchronous mode
Debug info: gstnvinfer.cpp(1047): gst_nvinfer_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
** INFO: <bus_callback:225>: Pipeline running


**PERF:  FPS 0 (Avg)
**PERF:  305.10 (304.07)
**PERF:  297.63 (298.81)
**PERF:  299.63 (299.16)
**PERF:  300.00 (299.41)
**PERF:  295.01 (298.36)
**PERF:  298.34 (298.36)
**PERF:  300.27 (298.69)
**PERF:  298.44 (298.64)
**PERF:  297.81 (298.54)

Note: correct the previous settings to

net-scale-factor=0.0175070028011204
offsets=123.675;116.28;103.53

because in DeepStream the preprocessing is y = net-scale-factor * (x - mean), while in TAO-tf2 torch mode it is y = (x/255 - torch_mean) / std = (x - 255 * torch_mean) * (1 / (255 * std)).
So net-scale-factor = 1 / (255 * std) and mean = 255 * torch_mean.
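
As a sanity check, these values follow from the standard ImageNet torch-mode constants (assuming that is what the training spec used). Note that net-scale-factor in this DeepStream version is a single scalar, so one std value (0.224 here) has to stand in for all three channels:

# Derive the DeepStream preprocessing parameters from torch-mode constants.
torch_mean = [0.485, 0.456, 0.406]  # per-channel ImageNet mean (assumed)
std = 0.224                         # single std, since net-scale-factor is scalar

offsets = [255.0 * m for m in torch_mean]
net_scale_factor = 1.0 / (255.0 * std)

print(offsets)           # approximately [123.675, 116.28, 103.53]
print(net_scale_factor)  # approximately 0.0175070028011204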

Also, to generate the mp4 file, please use the commands below.
$ apt-get install ffmpeg
$ ffmpeg -framerate 2 -pattern_type glob -i '*.jpg' -c:v libx264 -pix_fmt yuv420p -vf "crop=trunc(iw/2)*2:trunc(ih/2)*2" out.mp4
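
(The crop=trunc(iw/2)*2:trunc(ih/2)*2 filter rounds the width and height down to even numbers, which libx264 requires when encoding yuv420p; yuv420p itself keeps the output playable in most players.)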

Thank you for the response.

  1. We are running on a Jetson Orin Nano and are encountering an issue. By the way, we used the nvcr.io/nvidia/deepstream-l4t:6.3-samples image.
root@OrinNano:/opt/nvidia/deepstream/deepstream-6.3/samples/configs/deepstream-app# deepstream-app -c /deepstream_slt/ds_classification_as_primary_gie.txt
No protocol specified
No EGL Display
nvbufsurftransform: Could not get EGL display connection
** ERROR: <create_encode_file_bin:387>: Failed to create 'sink_sub_bin_encoder1'
** ERROR: <create_encode_file_bin:506>: create_encode_file_bin failed
** ERROR: <create_sink_bin:831>: create_sink_bin failed
** ERROR: <create_processing_instance:956>: create_processing_instance failed
** ERROR: <create_pipeline:1576>: create_pipeline failed
** ERROR: <main:697>: Failed to create pipeline
Quitting
nvstreammux: Successfully handled EOS for source_id=0
App run failed

Would you mind sharing the instructions for Jetson Orin Nano?

  2. Would you mind sharing the output video files with us?

Please run $ export DISPLAY=:0 first. If that does not work, please search the DeepStream forum for topics with a similar error log.

Do you mean the test mp4 file?

The output video(s) from DeepStream, i.e., the out.mp4 specified in [sink1].

I shared the output videos and test videos with you via private message.
Also, note that in TAO Deploy there is a center_crop preprocessing step, see tao_deploy/nvidia_tao_deploy/cv/classification_tf1/dataloader.py at 31c7e0ed3fe48942c254b3b85517e7418eea17b3 · NVIDIA/tao_deploy · GitHub,
but this center_crop is not supported in DeepStream (similar topic: How to set true center crop for classification model in deepstream pipeline?). I have already synced with the DeepStream team about adding this feature.

So, for your staff video, you need to apply the same center_crop to the test images and then generate the test video file.
You can add the lines below to save the center_crop images.

    def _load_gt_image(self, image_path):
        """Load GT image from file."""
        self.image_path = image_path  # morganh: remember the path so the cropped image can be saved later
        img = Image.open(image_path)

After the crop in tao_deploy/nvidia_tao_deploy/cv/classification_tf1/dataloader.py at 31c7e0ed3fe48942c254b3b85517e7418eea17b3 · NVIDIA/tao_deploy · GitHub, add:

            image = image.crop(
                (left_corner,
                 top_corner,
                 left_corner + self.width,
                 top_corner + self.height))
            # morganh: save the center-cropped image so a test video can be built from it
            tmp_name = "your_staff_crop_folder/" + str(self.image_path).split("/")[-1]
            image.save(tmp_name)
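
If patching the dataloader is inconvenient, a standalone sketch along these lines could also produce the cropped images. The folder names are hypothetical, and the plain center crop here is only an illustration; check it against the dataloader's actual crop logic and your config before relying on it.

# Hypothetical helper: center-crop a folder of images to the network input size.
import os
from PIL import Image

SRC = "sample_images/staff"      # input folder (adjust to your data)
DST = "your_staff_crop_folder"   # output folder for cropped images
W, H = 128, 128                  # from infer-dims=3;128;128

os.makedirs(DST, exist_ok=True)
for name in sorted(os.listdir(SRC)):
    img = Image.open(os.path.join(SRC, name)).convert("RGB")
    left = (img.width - W) // 2
    top = (img.height - H) // 2
    img.crop((left, top, left + W, top + H)).save(os.path.join(DST, name))

The ffmpeg command above can then be run inside your_staff_crop_folder to build the test video.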

Great news! Based on your findings, I managed to find a workaround.

I re-trained a model without cropping, i.e.,

dataset:
  augmentation:
    enable_center_crop: False
    enable_random_crop: False

and used it in DeepStream. Since no crop is applied during training anymore, DeepStream's plain resize matches the training preprocessing, and the results are as accurate as tao model classification_tf2 evaluate.

Thank you very much for looking into this matter.

Hello, I am seeing an accuracy drop when comparing TAO Deploy evaluation against the TF 1.15.5 backend evaluation, with detectnet_v2 retrained on the KITTI dataset using the sample notebook and sample spec files. How can I resolve this?

Could you create a new forum topic? Your issue is about the tao-tf1 docker and detectnet_v2, which is different from this thread.