DeepStream SDK FAQ

1. Reading xxxx code is the best way to get the answer.
2. Users of DeepStream 4.0 need a C/C++ background.
3. More resources:
https://docs.nvidia.com/metropolis/

nvinfer / streamMux / DeMux

  1. Source code diagram

  2. How to get original NV12 frame buffer
    https://devtalk.nvidia.com/default/topic/1060956/deepstream-sdk/access-frame-pointer-in-deepstream-app/post/5375214/#5375214

  3. How to get detection confidence
    https://devtalk.nvidia.com/default/topic/1060849/deepstream-sdk/deepstream-v4-zero-confidence-problem/?offset=2#5372609
    https://devtalk.nvidia.com/default/topic/1058661/deepstream-sdk/nvinfer-is-not-populating-confidence-field-in-nvdsobjectmeta-ds-4-0-/post/5373361/#5373361

  4. The nvinfer config property “model-color-format” is defined in nvdsinfer_context.h and parsed in gstnvinfer_property_parser.cpp.
    nvinfer supports not only BGR/RGB but also gray and other formats:

/**
 * Enum for color formats.
 */
typedef enum
{
    /** 24-bit interleaved R-G-B */
    NvDsInferFormat_RGB,
    /** 24-bit interleaved B-G-R */
    NvDsInferFormat_BGR,
    /** 8-bit Luma */
    NvDsInferFormat_GRAY,
    /** 32-bit interleaved R-G-B-A */
    NvDsInferFormat_RGBA,
    /** 32-bit interleaved B-G-R-x */
    NvDsInferFormat_BGRx,
    NvDsInferFormat_Unknown = 0xFFFFFFFF,
} NvDsInferFormat;
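
As a rough illustration of what these formats mean for buffer sizes, the sketch below maps each enum value to the bytes per pixel implied by the bit depths in the comments above. The enum is copied locally so the snippet builds on its own; `bytesPerPixel` and `packedFrameBytes` are hypothetical helpers, not SDK functions, and tightly packed rows with no pitch padding are an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Local copy of the enum quoted above, so the sketch compiles on its own.
typedef enum
{
    NvDsInferFormat_RGB,
    NvDsInferFormat_BGR,
    NvDsInferFormat_GRAY,
    NvDsInferFormat_RGBA,
    NvDsInferFormat_BGRx,
    NvDsInferFormat_Unknown = 0xFFFFFFFF,
} NvDsInferFormat;

// Hypothetical helper (not part of the SDK): bytes per pixel implied by the
// bit depths in the enum comments. Returns 0 for an unknown format.
std::size_t bytesPerPixel(NvDsInferFormat format)
{
    switch (format) {
        case NvDsInferFormat_RGB:
        case NvDsInferFormat_BGR:
            return 3;  // 24-bit interleaved
        case NvDsInferFormat_GRAY:
            return 1;  // 8-bit luma
        case NvDsInferFormat_RGBA:
        case NvDsInferFormat_BGRx:
            return 4;  // 32-bit interleaved
        default:
            return 0;
    }
}

// Hypothetical helper: size in bytes of one tightly packed frame
// (assumes no row padding).
std::size_t packedFrameBytes(NvDsInferFormat format, std::size_t width, std::size_t height)
{
    assert(format != NvDsInferFormat_Unknown);
    return bytesPerPixel(format) * width * height;
}
```
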
  5. How to run a different algorithm on each stream
diff --git a/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp b/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp
old mode 100644
new mode 100755
index c6867c87..cc70840c
--- a/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp
+++ b/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp
@@ -601,8 +601,10 @@ gst_nvinfer_sink_event (GstBaseTransform * trans, GstEvent * event)
     /* New source added in the pipeline. Create a source info instance for it. */
     guint source_id;
     gst_nvevent_parse_pad_added (event, &source_id);
-    nvinfer->source_info->emplace (source_id, GstNvInferSourceInfo ());
-  }
+    if (!nvinfer->process_full_frame && /* source_id is what you want for this SGIE */) {
+        nvinfer->source_info->emplace (source_id, GstNvInferSourceInfo ());
+      }
+    }
 
   if ((GstNvEventType) GST_EVENT_TYPE (event) == GST_NVEVENT_PAD_DELETED) {
     /* Source removed from the pipeline. Remove the related structure. */
@@ -1409,6 +1411,8 @@ gst_nvinfer_process_objects (GstNvInfer * nvinfer, GstBuffer * inbuf,
 
     /* Find the source info instance. */
     auto iter = nvinfer->source_info->find (frame_meta->pad_index);
+
+    /* If the source_id is not found, the object will be ignored */
     if (iter == nvinfer->source_info->end ()) {
       GST_WARNING_OBJECT
           (nvinfer,
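
The idea behind the patch can be modeled outside the plugin. The sketch below is a simplified stand-in, not the plugin code: `PerSourceFilter`, `SourceInfo` and the set of wanted source IDs are hypothetical names. It registers a source only when it is wanted, so the later lookup by pad index fails for every other source and their objects are skipped.

```cpp
#include <cassert>
#include <set>
#include <unordered_map>
#include <utility>

// Stand-in for GstNvInferSourceInfo; the real struct holds per-source state.
struct SourceInfo {};

// Hypothetical model of the patched behavior, not the plugin code itself.
class PerSourceFilter {
public:
    explicit PerSourceFilter(std::set<unsigned> wanted) : wanted_(std::move(wanted)) {}

    // Mirrors the patched pad-added handler: register only wanted sources.
    void onPadAdded(unsigned sourceId) {
        if (wanted_.count(sourceId) != 0)
            sourceInfo_.emplace(sourceId, SourceInfo());
    }

    // Mirrors the lookup in gst_nvinfer_process_objects: objects whose
    // source was never registered are ignored.
    bool shouldProcess(unsigned padIndex) const {
        return sourceInfo_.find(padIndex) != sourceInfo_.end();
    }

private:
    std::set<unsigned> wanted_;
    std::unordered_map<unsigned, SourceInfo> sourceInfo_;
};

// Usage example: an SGIE that should only run on sources 0 and 2.
void perSourceFilterExample() {
    PerSourceFilter sgie(std::set<unsigned>{0u, 2u});
    for (unsigned id = 0; id < 3; ++id)
        sgie.onPadAdded(id);
    assert(sgie.shouldProcess(0));
    assert(!sgie.shouldProcess(1));
    assert(sgie.shouldProcess(2));
}
```
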
  6. How to get/update source_id
    https://devtalk.nvidia.com/default/topic/1062520/deepstream-sdk/getting-source-stream-id-from-nvosd-plugin/post/5380861/#5380861
    https://forums.developer.nvidia.com/t/how-to-set-source-id-in-streammux/120797

  7. FP16 model issue
    If any weight in the model is outside the FP16 range, the UFF parser fails with output like the following:

NvDsInferContext[UID 1]:initialize(): Trying to create engine from model files
ERROR nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger: NvDsInferContext[UID 1]:log(): UffParser: Parser error: bn_conv1/moving_variance: Weight 110542.968750 at index 8 is outside of [-65504.000000, 65504.000000]. Please try running the parser in a higher precision mode and setting the builder to fp16 mode instead.
NvDsInferCudaEngineGetFromTltModel: Failed to parse UFF model

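To fix this issue, apply the following patch to the nvinfer source code, then build a new libnvds_infer.so and replace the installed library with it. The patch makes the UFF parser read the weights as FP32, as the error message suggests.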

--- a/src/utils/nvdsinfer/nvdsinfer_context_impl.cpp
+++ b/src/utils/nvdsinfer/nvdsinfer_context_impl.cpp
@@ -1851,7 +1851,7 @@ NvDsInferContextImpl::generateTRTModel(
         }
 
         if (!uffParser->parse(initParams.uffFilePath,
-                    *network, modelDataType))
+                    *network,DataType::kFLOAT))
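
To check up front whether a model would hit this error, you can scan its weights against the interval printed in the log. This is only a sketch: `fitsInFp16` and `firstOutOfRange` are hypothetical helpers, and how the weights are loaded into a `std::vector<float>` is left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest finite magnitude representable in IEEE 754 half precision,
// matching the [-65504, 65504] interval printed by the UFF parser.
constexpr float kFp16Max = 65504.0f;

// True if the weight can be stored as FP16 without overflowing.
bool fitsInFp16(float weight)
{
    return std::isfinite(weight) && std::fabs(weight) <= kFp16Max;
}

// Returns the index of the first weight outside the FP16 range,
// or weights.size() if every weight fits.
std::size_t firstOutOfRange(const std::vector<float>& weights)
{
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (!fitsInFp16(weights[i]))
            return i;
    }
    return weights.size();
}

// Usage example with the value reported in the log above.
void fp16RangeExample()
{
    assert(!fitsInFp16(110542.96875f));
    assert(fitsInFp16(-65504.0f));
}
```
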

  8. Here’s a simple example of a CUDA kernel for cropping an image: https://github.com/dusty-nv/jetson-video/blob/master/cuda/cudaCrop.cu

  9. How to deploy an mrcnn model (https://github.com/matterport/Mask_RCNN) with a resnet50 backbone and a changed classNum in the h5 model?
    https://devtalk.nvidia.com/default/topic/1031938/deepstream-sdk/converting-mask-rcnn-to-tensor-rt/post/5416100/#5416100

  10. How to disable object detection for different sources?
    https://devtalk.nvidia.com/default/topic/1068016/deepstream-sdk/can-deepstream-select-to-enable-or-disable-object-detection-for-different-sources/post/5410165/#5410165

metadata / msgconv / msgbroker / Codec/ ds-app
1. Sample of adding metadata
https://devtalk.nvidia.com/default/topic/1061083/deepstream-sdk/attaching-custom-type-metadata-to-gstreamer-buffer-on-src-pad-causing-sudden-crash/post/5374690/#5374690

2. Sample of customizing gst-dsexample:
https://devtalk.nvidia.com/default/topic/1061422/deepstream-sdk/how-to-crop-the-image-and-save/post/5375174/#5375174

3. Sample config file of running single RTSP source:
https://devtalk.nvidia.com/default/topic/1058086/deepstream-sdk/how-to-run-rtp-camera-in-deepstream-on-nano/post/5366807/#5366807

5. Sample of accessing NvBufSurface
https://devtalk.nvidia.com/default/topic/1061205/deepstream-sdk/rtsp-camera-access-frame-issue/post/5377678/#5377678

6. Return GST_PAD_PROBE_DROP from the attached probe callback to drop the buffer.
Refer to https://gstreamer.freedesktop.org/documentation/application-development/advanced/pipeline-manipulation.html?gi-language=c for an example.

static GstPadProbeReturn
event_probe_cb (GstPad * pad, GstPadProbeInfo * info, gpointer user_data)
{
    return GST_PAD_PROBE_DROP;
}

7. Add dsexample to the ds-test1 app
https://devtalk.nvidia.com/default/topic/1065406/deepstream-sdk/enable-dsexample-in-test-app/?offset=3#5398407

8. Optical flow
Optical flow is supported only on Jetson AGX Xavier and on Turing GPUs (T4, RTX 2080, etc.). It does not work on Jetson Nano or on GTX GPUs.

9. How can we set “drop-frame-interval” to more than 30?
a. Find and download the “L4T source” from https://developer.nvidia.com/embedded/downloads#?search=source
gst-nvvideo4linux2_src.tbz2 is inside public_sources.tbz2.
b. Apply the patches “0001-gstv4l2dec-Fix-high-CPU-usage-in-drop-frame.patch” and “0002-gst-v4l2dec-Increase-Drop-Frame-Interval.patch”.
c. Build a new libgstnvvideo4linux2.so and replace /usr/lib/$(ARCH)/gstreamer-1.0/libgstnvvideo4linux2.so with it.

0001-gstv4l2dec-Fix-high-CPU-usage-in-drop-frame.patch

From 5d8d5a0977473eae89c0f310171d2c7060e24eb6 Mon Sep 17 00:00:00 2001
From: vpagar <vpagar@nvidia.com>
Date: Thu, 5 Dec 2019 16:04:02 +0530
Subject: [PATCH 1/2] gstv4l2dec: Fix high CPU usage in drop-frame

In case of drop-frame-interval, in LL v4l2 implementation a
thread in low level v4l2 lib which sends buffer to block and
a callback thread spins between themselves causing high CPU percentage
usage over the perid.
This CL drops frame at the gstreamer level and LL v4l2 does not handle
dropping frames.

Unit-Test:
gst-launch-1.0 multifilesrc location= sample_720p.h264 \
! h264parse ! nvv4l2decoder drop-frame-interval=3 ! fakesink
and check CPU percentage usage in htop, it should stay stable.

Bug 200562189

Change-Id: I9af22745501d6a9892c341cb640dac16f8641763
---
 gst-v4l2/gstv4l2videodec.c | 23 ++++++++++++++++++++++-
 gst-v4l2/gstv4l2videodec.h |  1 +
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/gst-v4l2/gstv4l2videodec.c b/gst-v4l2/gstv4l2videodec.c
index 5531f9d..f8c62f2 100644
--- a/gst-v4l2/gstv4l2videodec.c
+++ b/gst-v4l2/gstv4l2videodec.c
@@ -593,6 +593,9 @@ gst_v4l2_video_dec_start (GstVideoDecoder * decoder)
   gst_v4l2_object_unlock (self->v4l2output);
   g_atomic_int_set (&self->active, TRUE);
   self->output_flow = GST_FLOW_OK;
+#if USE_V4L2_TARGET_NV
+  self->decoded_picture_cnt = 0;
+#endif
 
   return TRUE;
 }
@@ -704,6 +707,11 @@ gst_v4l2_video_dec_set_format (GstVideoDecoder * decoder,
     }
   }
 
+#if 0
+  /* *
+   * TODO: From low level library remove support of drop frame interval after
+   * analyzing high CPU utilization in initial implementation.
+   * */
   if (self->drop_frame_interval != 0) {
     if (!set_v4l2_video_mpeg_class (self->v4l2output,
         V4L2_CID_MPEG_VIDEODEC_DROP_FRAME_INTERVAL,
@@ -712,6 +720,7 @@ gst_v4l2_video_dec_set_format (GstVideoDecoder * decoder,
       return FALSE;
     }
   }
+#endif
 #ifndef USE_V4L2_TARGET_NV_CODECSDK
   if (self->disable_dpb != DEFAULT_DISABLE_DPB) {
     if (!set_v4l2_video_mpeg_class (self->v4l2output,
@@ -1141,10 +1150,21 @@ gst_v4l2_video_dec_loop (GstVideoDecoder * decoder)
       gst_caps_unref(reference);
     }
 
-    ret = gst_video_decoder_finish_frame (decoder, frame);
+#if USE_V4L2_TARGET_NV
+    if ((self->drop_frame_interval == 0) ||
+        (self->decoded_picture_cnt % self->drop_frame_interval == 0))
+        ret = gst_video_decoder_finish_frame (decoder, frame);
+    else
+        ret = gst_video_decoder_drop_frame (GST_VIDEO_DECODER (self), frame);
 
     if (ret != GST_FLOW_OK)
       goto beach;
+
+    self->decoded_picture_cnt += 1;
+#else
+    ret = gst_video_decoder_finish_frame (decoder, frame);
+#endif
+
   } else {
     GST_WARNING_OBJECT (decoder, "Decoder is producing too many buffers");
     gst_buffer_unref (buffer);
@@ -1696,6 +1716,7 @@ gst_v4l2_video_dec_init (GstV4l2VideoDec * self)
   self->skip_frames = DEFAULT_SKIP_FRAME_TYPE;
   self->nvbuf_api_version_new = DEFAULT_NVBUF_API_VERSION_NEW;
   self->drop_frame_interval = 0;
+  self->decoded_picture_cnt = 0;
   self->num_extra_surfaces = DEFAULT_NUM_EXTRA_SURFACES;
 #ifndef USE_V4L2_TARGET_NV_CODECSDK
   self->disable_dpb = DEFAULT_DISABLE_DPB;
diff --git a/gst-v4l2/gstv4l2videodec.h b/gst-v4l2/gstv4l2videodec.h
index 50d07c5..5015c30 100644
--- a/gst-v4l2/gstv4l2videodec.h
+++ b/gst-v4l2/gstv4l2videodec.h
@@ -71,6 +71,7 @@ struct _GstV4l2VideoDec
   GstFlowReturn output_flow;
   guint64 frame_num;
 #ifdef USE_V4L2_TARGET_NV
+  guint64 decoded_picture_cnt;
   guint32 skip_frames;
   guint32 drop_frame_interval;
   gboolean nvbuf_api_version_new;
-- 
2.17.1

0002-gst-v4l2dec-Increase-Drop-Frame-Interval.patch

From 52665605036144ac20628c95e52fdd82edae71b9 Mon Sep 17 00:00:00 2001
From: vpagar <vpagar@nvidia.com>
Date: Wed, 11 Dec 2019 11:56:25 +0530
Subject: [PATCH 2/2] gst-v4l2dec: Increase Drop Frame Interval

Bug 200575866

Change-Id: If5576683c0fad95832595838d032d3145b88ea36
---
 gst-v4l2/gstv4l2videodec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gst-v4l2/gstv4l2videodec.c b/gst-v4l2/gstv4l2videodec.c
index f8c62f2..00d7740 100644
--- a/gst-v4l2/gstv4l2videodec.c
+++ b/gst-v4l2/gstv4l2videodec.c
@@ -1807,7 +1807,7 @@ gst_v4l2_video_dec_class_init (GstV4l2VideoDecClass * klass)
           "Drop frames interval",
           "Interval to drop the frames,ex: value of 5 means every 5th frame will be given by decoder, rest all dropped",
           0,
-          30, 30,
+          G_MAXUINT, G_MAXUINT,
           G_PARAM_READWRITE | G_PARAM_STATIC_STRINGS | GST_PARAM_MUTABLE_READY));
 
   g_object_class_install_property (gobject_class, PROP_NUM_EXTRA_SURFACES,
-- 
2.17.1
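
To see which frames reach downstream after the first patch, the keep/drop predicate from gst_v4l2_video_dec_loop can be run on its own. The sketch below copies only that condition; `keepFrame` and `keptFrames` are hypothetical names used for illustration, and the rest of the decoder is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same predicate as the patched gst_v4l2_video_dec_loop: a frame is pushed
// downstream when the interval is 0 (feature off) or the running count of
// decoded pictures is a multiple of the interval; otherwise it is dropped.
bool keepFrame(std::uint64_t decodedPictureCnt, std::uint32_t dropFrameInterval)
{
    return dropFrameInterval == 0 || decodedPictureCnt % dropFrameInterval == 0;
}

// Indices of the frames that survive out of the first `total` decoded frames.
std::vector<std::uint64_t> keptFrames(std::uint64_t total, std::uint32_t interval)
{
    std::vector<std::uint64_t> kept;
    for (std::uint64_t n = 0; n < total; ++n) {
        if (keepFrame(n, interval))
            kept.push_back(n);
    }
    return kept;
}

// Usage example: with drop-frame-interval=3, frames 0, 3 and 6 of the
// first nine are kept.
void dropFrameIntervalExample()
{
    const std::vector<std::uint64_t> expected = {0, 3, 6};
    assert(keptFrames(9, 3) == expected);
}
```

With the second patch applied, the same predicate holds for intervals above 30, for example one frame out of every 100.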

10. Using deepstream-app options
Refer to https://devtalk.nvidia.com/default/topic/1069070/deepstream-sdk/labels-disappear-with-multiple-sources/post/5415523/#5415523

Thank you very much ChrisDing. This is really helpful.

nvtracker standalone user sample

https://devtalk.nvidia.com/default/topic/1066252/deepstream-sdk/klt-nvmot-usage/

get nvtracker history

https://devtalk.nvidia.com/default/topic/1061798/deepstream-sdk/how-to-obtain-previous-states-of-tracked-object-/

Fix for a memory accumulation bug in GstBaseParse
A memory accumulation bug was found in GStreamer’s Base Parse class which potentially affects all codec parsers provided by GStreamer. This bug is seen only with long duration seekable streams (mostly containerized files e.g. mp4). This does not affect live sources like RTSP. We have filed an issue on GStreamer’s gitlab project (https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/468).

Temporary fix

  1. Check the exact gstreamer version installed on the system.

$ gst-inspect-1.0 --version

gst-inspect-1.0 version 1.14.5

GStreamer 1.14.5

https://launchpad.net/distros/ubuntu/+source/gstreamer1.0

  2. Clone the GStreamer repo and check out the tag corresponding to the installed version

$ git clone git@gitlab.freedesktop.org:gstreamer/gstreamer.git

$ cd gstreamer

$ git checkout 1.14.5

  3. Make sure build dependencies are installed

$ sudo apt install libbison-dev build-essential flex debhelper

  4. Run autogen.sh and the configure script

$ ./autogen.sh --noconfigure

$ ./configure --prefix=$(pwd)/out # Don’t want to overwrite system libs

  5. Save the following patch to a file, e.g. patch.txt
diff --git a/libs/gst/base/gstbaseparse.c b/libs/gst/base/gstbaseparse.c
index 41adf130e..ffc662a45 100644
--- a/libs/gst/base/gstbaseparse.c
+++ b/libs/gst/base/gstbaseparse.c
@@ -1906,6 +1906,9 @@ gst_base_parse_add_index_entry (GstBaseParse * parse, guint64 offset,
   GST_LOG_OBJECT (parse, "Adding key=%d index entry %" GST_TIME_FORMAT
       " @ offset 0x%08" G_GINT64_MODIFIER "x", key, GST_TIME_ARGS (ts), offset);
 
+  if (!key)
+    goto exit;
+
   if (G_LIKELY (!force)) {
 
     if (!parse->priv->upstream_seekable) {
  6. Apply the patch

$ cat patch.txt | patch -p1

  7. Build the sources

$ make -j$(nproc) && make install

  8. Back up the distribution-provided library and copy the newly built library over it. Adjust the library name for the installed version. For Jetson, replace x86_64-linux-gnu with aarch64-linux-gnu.

$ sudo cp /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0.1405.0 ${HOME}/libgstbase-1.0.so.0.1405.0.backup

$ sudo cp out/lib/libgstbase-1.0.so.0.1405.0 /usr/lib/x86_64-linux-gnu/
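
When adjusting the library name for another GStreamer version, the numeric suffix appears to be built from the minor and micro version numbers. The sketch below encodes that pattern as an assumption inferred from the 1.14.5 example above (suffix 0.1405.0); `gstBaseLibName` is a hypothetical helper, so check the actual file name in the library directory before copying.

```cpp
#include <cassert>
#include <string>

// Assumption inferred from the 1.14.5 -> "0.1405.0" example above:
// suffix = "0." + (minor * 100 + micro) + ".0".
std::string gstBaseLibName(unsigned minor, unsigned micro)
{
    assert(micro < 100);  // keeps minor and micro from overlapping
    return "libgstbase-1.0.so.0." + std::to_string(minor * 100 + micro) + ".0";
}
```

For GStreamer 1.14.5 this yields libgstbase-1.0.so.0.1405.0, the name used in the commands above.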

[DS5.0 xx_All_App] For DS 5.0 DP: how to integrate nvdsanalytics plugin in C deepstream-app

  1. Create the analytics bin in /opt/nvidia/deepstream/deepstream-5.0/sources/apps/apps-common/src.
  2. Refer to deepstream_dsexample.c and create deepstream_nvdsanalytics.c along the same lines.
  3. Modify deepstream_app.h to add the nvdsanalytics bin instance and its config to the structures.
  4. Update deepstream_config_file_parser.c to parse the nvdsanalytics config from the configuration file.
  5. Update deepstream_app.c to add the nvdsanalytics bin to the pipeline, ideally after the tracker.
  6. Create a new cpp file with a process_meta function declared extern "C" that parses the nvdsanalytics meta. Refer to the probe in the sample nvdsanalytics test app when writing this function.
  7. Add the probe in deepstream_app_main.c after the nvdsanalytics bin.
  8. Modify the Makefile to compile the cpp file and deepstream_app_main.c using g++ with the -fpermissive flag, and link deepstream-app using g++.

These are rough steps; additional modifications in header files are also required.

For DS 5.0 GA we will be adding support for meta access.

DeepStream 5.0 Manual for YoloV4

  • The original Yolo implementation via CUDA kernel in DeepStream is based on old Yolo models (v2, v3), so it may not suit newer Yolo models like YoloV4. Location: /opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/kernels.cu

  • We are trying to embed the Yolo layer into the TensorRT engine while converting DarkNet or PyTorch models into an engine, before deploying to DeepStream. With this new solution, the old Yolo CUDA kernel in DeepStream is no longer used.

You can try the following steps to make DeepStream work with YoloV4:

  1. Go to https://github.com/Tianxiaomo/pytorch-YOLOv4 to generate a TensorRT engine according to this workflow: DarkNet or PyTorch --> ONNX --> TensorRT.
  2. Add the following C++ functions to objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp and rebuild libnvdsinfer_custom_impl_Yolo.so
  3. Here are configuration files for reference (you will have to adapt them slightly to suit your environment):
    config_infer_primary_yoloV4.txt (3.4 KB)
    deepstream_app_config_yoloV4.txt (3.8 KB)
static NvDsInferParseObjectInfo convertBBoxYoloV4(const float& bx1, const float& by1, const float& bx2,
                                     const float& by2, const uint& netW, const uint& netH)
{
    NvDsInferParseObjectInfo b;
    // Restore coordinates to network input resolution

    float x1 = bx1 * netW;
    float y1 = by1 * netH;
    float x2 = bx2 * netW;
    float y2 = by2 * netH;

    x1 = clamp(x1, 0, netW);
    y1 = clamp(y1, 0, netH);
    x2 = clamp(x2, 0, netW);
    y2 = clamp(y2, 0, netH);

    b.left = x1;
    b.width = clamp(x2 - x1, 0, netW);
    b.top = y1;
    b.height = clamp(y2 - y1, 0, netH);

    return b;
}

static void addBBoxProposalYoloV4(const float bx, const float by, const float bw, const float bh,
                     const uint& netW, const uint& netH, const int maxIndex,
                     const float maxProb, std::vector<NvDsInferParseObjectInfo>& binfo)
{
    NvDsInferParseObjectInfo bbi = convertBBoxYoloV4(bx, by, bw, bh, netW, netH);
    if (bbi.width < 1 || bbi.height < 1) return;

    bbi.detectionConfidence = maxProb;
    bbi.classId = maxIndex;
    binfo.push_back(bbi);
}

static std::vector<NvDsInferParseObjectInfo>
decodeYoloV4Tensor(
    const float* boxes, const float* scores,
    const uint num_bboxes, NvDsInferParseDetectionParams const& detectionParams,
    const uint& netW, const uint& netH)
{
    std::vector<NvDsInferParseObjectInfo> binfo;

    uint bbox_location = 0;
    uint score_location = 0;
    for (uint b = 0; b < num_bboxes; ++b)
    {
        float bx1 = boxes[bbox_location];
        float by1 = boxes[bbox_location + 1];
        float bx2 = boxes[bbox_location + 2];
        float by2 = boxes[bbox_location + 3];

        float maxProb = 0.0f;
        int maxIndex = -1;

        for (uint c = 0; c < detectionParams.numClassesConfigured; ++c)
        {
            float prob = scores[score_location + c];
            if (prob > maxProb)
            {
                maxProb = prob;
                maxIndex = c;
            }
        }

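        // maxIndex stays -1 when no class score is above 0; skip such boxes
        // instead of indexing the threshold vector with -1.
        if (maxIndex >= 0 && maxProb > detectionParams.perClassPreclusterThreshold[maxIndex])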
        {
            addBBoxProposalYoloV4(bx1, by1, bx2, by2, netW, netH, maxIndex, maxProb, binfo);
        }

        bbox_location += 4;
        score_location += detectionParams.numClassesConfigured;
    }

    return binfo;
}

extern "C" bool NvDsInferParseCustomYoloV4(
    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo,
    NvDsInferParseDetectionParams const& detectionParams,
    std::vector<NvDsInferParseObjectInfo>& objectList)
{
    if (NUM_CLASSES_YOLO != detectionParams.numClassesConfigured)
    {
        std::cerr << "WARNING: Num classes mismatch. Configured:"
                  << detectionParams.numClassesConfigured
                  << ", detected by network: " << NUM_CLASSES_YOLO << std::endl;
    }

    std::vector<NvDsInferParseObjectInfo> objects;

    const NvDsInferLayerInfo &boxes = outputLayersInfo[0]; // num_boxes x 4
    const NvDsInferLayerInfo &scores = outputLayersInfo[1]; // num_boxes x num_classes

    // 3 dimensional: [num_boxes, 1, 4]
    assert(boxes.inferDims.numDims == 3);
    // 2 dimensional: [num_boxes, num_classes]
    assert(scores.inferDims.numDims == 2);

    // The second dimension should be num_classes
    assert(detectionParams.numClassesConfigured == scores.inferDims.d[1]);
    
    uint num_bboxes = boxes.inferDims.d[0];

    // std::cout << "Network Info: " << networkInfo.height << "  " << networkInfo.width << std::endl;

    std::vector<NvDsInferParseObjectInfo> outObjs =
        decodeYoloV4Tensor(
            (const float*)(boxes.buffer), (const float*)(scores.buffer), num_bboxes, detectionParams,
            networkInfo.width, networkInfo.height);

    objects.insert(objects.end(), outObjs.begin(), outObjs.end());

    objectList = objects;

    return true;
}
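
To try the decoding logic without DeepStream, the sketch below reimplements it on plain vectors. It is a simplified illustration, not the code above: `Box` is a stand-in for NvDsInferParseObjectInfo, `clampTo` and `decode` are hypothetical names, and a single threshold replaces the per-class thresholds. The tensor layout (normalized x1, y1, x2, y2 per box, and one score per class per box) is taken from the parser above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for NvDsInferParseObjectInfo so the sketch builds
// without the DeepStream headers.
struct Box {
    float left, top, width, height;
    int classId;
    float confidence;
};

float clampTo(float value, float lo, float hi)
{
    return std::min(hi, std::max(lo, value));
}

// boxes:  numBoxes * 4 floats, normalized (x1, y1, x2, y2) per box.
// scores: numBoxes * numClasses floats, one probability per class.
std::vector<Box> decode(const std::vector<float>& boxes, const std::vector<float>& scores,
                        std::size_t numClasses, float threshold, float netW, float netH)
{
    assert(numClasses > 0 && boxes.size() % 4 == 0);
    const std::size_t numBoxes = boxes.size() / 4;
    assert(scores.size() == numBoxes * numClasses);

    std::vector<Box> out;
    for (std::size_t b = 0; b < numBoxes; ++b) {
        // Pick the best-scoring class for this box.
        const float* s = &scores[b * numClasses];
        const std::size_t best =
            static_cast<std::size_t>(std::max_element(s, s + numClasses) - s);
        if (s[best] <= threshold)
            continue;

        // Scale normalized corners to the network resolution and clamp.
        const float x1 = clampTo(boxes[b * 4 + 0] * netW, 0.0f, netW);
        const float y1 = clampTo(boxes[b * 4 + 1] * netH, 0.0f, netH);
        const float x2 = clampTo(boxes[b * 4 + 2] * netW, 0.0f, netW);
        const float y2 = clampTo(boxes[b * 4 + 3] * netH, 0.0f, netH);

        // Drop degenerate boxes smaller than one pixel.
        const float w = x2 - x1;
        const float h = y2 - y1;
        if (w < 1.0f || h < 1.0f)
            continue;

        out.push_back({x1, y1, w, h, static_cast<int>(best), s[best]});
    }
    return out;
}
```

For a 608x608 network, a box with normalized corners (0.25, 0.5) and (0.75, 1.0) becomes left 152, top 304, width 304, height 304.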

1. [DS5.0GA_Jetson_GPU_Plugin] Measure of the FPS of pipeline

2. [DS5.0GA_Jetson_GPU_Plugin] Dump the Inference Input

3. [DS5.0GA_Jetson_App] Rotate camera input image with NvBufSurfTransform() API