"Scratch TensorRT API network + unsupported-layer plugin" not working in DeepStream SDK 5.0

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
Jetson Xavier NX
• DeepStream Version
• JetPack Version (valid for Jetson only)
Output of "cat /etc/nv_tegra_release":

R32 (release), REVISION: 4.2, GCID: 20074772, BOARD: t186ref, EABI: aarch64, DATE: Thu Apr 9 01:26:40 UTC 2020

• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)

I have an engine file (e.g. yolov4.engine) built with the TensorRT API to generate the network from scratch, with all unsupported layers added as plugins (in shared-library form, e.g. libcustomlayerplugin.so). I used the following command to verify that it works in general:

/usr/src/tensorrt/bin/trtexec --loadEngine=./yolov4.engine --plugins=./liblayerplugin.so

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=./yolov4.engine --plugins=./liblayerplugin.so
[08/21/2020-16:18:32] [I] === Model Options ===
[08/21/2020-16:18:32] [I] Format: *
[08/21/2020-16:18:32] [I] Model:
[08/21/2020-16:18:32] [I] Output:
[08/21/2020-16:18:32] [I] === Build Options ===
[08/21/2020-16:18:32] [I] Max batch: 1
[08/21/2020-16:18:32] [I] Workspace: 16 MB
[08/21/2020-16:18:32] [I] minTiming: 1
[08/21/2020-16:18:32] [I] avgTiming: 8
[08/21/2020-16:18:32] [I] Precision: FP32

After some other messages I get the following:

[08/21/2020-16:18:41] [I] GPU Compute
[08/21/2020-16:18:41] [I] min: 68.1166 ms
[08/21/2020-16:18:41] [I] max: 85.4661 ms
[08/21/2020-16:18:41] [I] mean: 73.632 ms
[08/21/2020-16:18:41] [I] median: 74.3506 ms
[08/21/2020-16:18:41] [I] percentile: 85.4661 ms at 99%
[08/21/2020-16:18:41] [I] total compute time: 2.57712 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=./yolov4.engine --plugins=./liblayerplugin.so

For integration into the SDK, I have set the following fields in config_infer_primary.txt (adapted from config_infer_primary_yoloV3.txt) so that the custom plugin (libcustomlayerplugin.so) is registered with the TensorRT runtime when Gst-nvinfer dlopens it.
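The exact config snippet is not preserved above; a hypothetical sketch of the kind of [property] entries involved (all values and the parser function name are placeholders, not the poster's actual config) might look like:

```ini
[property]
# placeholder path -- substitute your own engine
model-engine-file=./yolov4.engine
# shared library containing the TensorRT layer plugins and the custom
# bbox parser; Gst-nvinfer dlopens this and registers it with TensorRT
custom-lib-path=./libcustomlayerplugin.so
# hypothetical parser entry point exported by the library above
parse-bbox-func-name=NvDsInferParseCustomYoloV4
num-detected-classes=80
# 0 = FP32
network-mode=0
```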

The other libraries are provided on the command line itself:

LD_PRELOAD="libnvds_infercustomparser_yolov3_tlt.so libnvinfer_plugin.so.7.1.3" ./deepstream-app -c deepstream_app_source1_detection_models.txt

With that arrangement, I get the errors below. What is the best way to integrate a "scratch TensorRT API network + unsupported-layer plugin" with DeepStream? It seems custom-lib-path can take only one file, but multiple shared libraries need to be registered with TensorRT via Gst-nvinfer.
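One detail worth noting: LD_PRELOAD itself accepts a colon- (or space-) separated list, so several shared objects can be preloaded in a single run. A small sketch (library names are from this thread, paths are placeholders; the list is printed rather than preloaded so the sketch stands alone):

```shell
# LD_PRELOAD accepts multiple entries separated by ':' (or spaces);
# library names below are from the thread, paths are placeholders.
PRELOAD_LIBS="./libnvds_infercustomparser_yolov3_tlt.so:./libnvinfer_plugin.so.7.1.3"
# The app would then be launched as (commented out here):
# LD_PRELOAD="$PRELOAD_LIBS" ./deepstream-app -c deepstream_app_source1_detection_models.txt
echo "$PRELOAD_LIBS" | tr ':' '\n'   # prints one library per line
```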


The errors are:

Unknown or legacy key specified 'is-classifier' for group [property]

Using winsys: x11
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so
gstnvtracker: Optional NvMOT_RemoveStreams not implemented
gstnvtracker: Batch processing is OFF
gstnvtracker: Past frame output is OFF
0:00:05.998213168 25695 0x55bd52e920 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1701> [UID = 1]: deserialized trt engine from :/home/mgsaeed/nvme/iva-20.08/samples/models/tlt_pretrained_models/yolov4/yolov4.engine
INFO: [Implicit Engine Info]: layers num: 2
0 INPUT kFLOAT data 3x608x608
1 OUTPUT kFLOAT prob 7001x1x1

0:00:05.998445747 25695 0x55bd52e920 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1805> [UID = 1]: Use deserialized engine model: /home/mgsaeed/nvme/iva-20.08/samples/models/tlt_pretrained_models/yolov4/yolov4.engine
0:00:06.177443429 25695 0x55bd52e920 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initResource() <nvdsinfer_context_impl.cpp:683> [UID = 1]: Detect-postprocessor failed to init resource because dlsym failed to get func NvDsInferParseCustomYOLOV3TLT pointer
ERROR: Infer Context failed to initialize post-processing resource, nvinfer error:NVDSINFER_CUSTOM_LIB_FAILED
ERROR: Infer Context prepare postprocessing resource failed., nvinfer error:NVDSINFER_CUSTOM_LIB_FAILED
0:00:06.246481893 25695 0x55bd52e920 WARN nvinfer gstnvinfer.cpp:809:gst_nvinfer_start:<primary_gie> error: Failed to create NvDsInferContext instance
0:00:06.246593734 25695 0x55bd52e920 WARN nvinfer gstnvinfer.cpp:809:gst_nvinfer_start:<primary_gie> error: Config file path: /home/mgsaeed/nvme/iva-20.08/samples/configs/tlt_pretrained_models/config_infer_primary_yolov4.txt, NvDsInfer Error: NVDSINFER_CUSTOM_LIB_FAILED
** ERROR: main:655: Failed to set pipeline to PAUSED
ERROR from primary_gie: Failed to create NvDsInferContext instance
Debug info: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(809): gst_nvinfer_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie:
Config file path: /home/mgsaeed/nvme/iva-20.08/samples/configs/tlt_pretrained_models/config_infer_primary_yolov4.txt, NvDsInfer Error: NVDSINFER_CUSTOM_LIB_FAILED
App run failed

For yolov4, you should follow the steps given here:
https://forums.developer.nvidia.com/t/deepstream-sdk-faq/80236/8. Convert your yolov4 model to ONNX and then use the ONNX file to generate the TensorRT engine. Also update the cpp files as mentioned in the given steps. I've tested this and it works well.

Many thanks for the link. I managed to get it working and was able to run the deepstream-app demo.

A few observations and questions:

Q1: YOLOv4 is supposed to perform better than YOLOv3, yet I am getting ~35 frames per second (YOLOv4) versus ~40 frames per second (YOLOv3) as reported by deepstream-app. The configurations in the following files are the same (apart from the necessary v3/v4 differences):


Why is YOLOv4 slower than YOLOv3 on the Jetson Xavier NX?

Q2: Are these frames-per-second numbers (reported by deepstream-app) for inference only (i.e. feeding the network its input and reading its output), with no pre/post-processing?

Q3: There is a yolov3-calibration.table.trt7.0 file for INT8 processing. How do I get the equivalent for a YOLOv4 configuration?

Thank you.


YOLOv4 outperforms YOLOv3 in precision and recall.

You can refer to YOLOv3 and YOLOv4 papers here: https://arxiv.org/abs/1804.02767 and https://arxiv.org/abs/2004.10934


The FPS of YOLOv4 on DeepStream includes preprocessing and postprocessing.

We have not provided full INT8 support. You have to collect your own sample images and do the calibration yourself.
Here is the documentation on how to calibrate:
C++: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#optimizing_int8_c
Python: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#optimizing_int8_python
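As a rough, hypothetical sketch of the final build step once a calibration cache exists (the cache itself must first be produced with a calibrator as described in the links above; all file names are placeholders, and the command is printed rather than executed so the sketch stands alone):

```shell
# Hypothetical INT8 build with trtexec from an ONNX model plus an existing
# calibration cache; file names are placeholders, not from this thread.
TRTEXEC=/usr/src/tensorrt/bin/trtexec
BUILD_CMD="$TRTEXEC --onnx=yolov4.onnx --int8 --calib=yolov4-calibration.table --saveEngine=yolov4_int8.engine"
echo "$BUILD_CMD"
```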