DeepStream SDK FAQ

Fix for a memory accumulation bug in GstBaseParse
A memory accumulation bug was found in GStreamer’s Base Parse class which potentially affects all codec parsers provided by GStreamer. This bug is seen only with long duration seekable streams (mostly containerized files e.g. mp4). This does not affect live sources like RTSP. We have filed an issue on GStreamer’s gitlab project (gstbaseparse: High memory usage in association index for long duration files (#468) · Issues · GStreamer / gstreamer · GitLab).

Temporary fix

  1. Check the exact GStreamer version installed on the system.

$ gst-inspect-1.0 --version

gst-inspect-1.0 version 1.14.5

GStreamer 1.14.5

https://launchpad.net/distros/ubuntu/+source/gstreamer1.0

  2. Clone the GStreamer repo and check out the tag corresponding to the installed version

$ git clone git@gitlab.freedesktop.org:gstreamer/gstreamer.git

$ cd gstreamer

$ git checkout 1.14.5

  3. Make sure build dependencies are installed

$ sudo apt install libbison-dev build-essential flex debhelper

  4. Run autogen.sh and the configure script

$ ./autogen.sh --noconfigure

$ ./configure --prefix=$(pwd)/out # Install into ./out so the system libraries are not overwritten

  5. Save the following patch to a file (named patch.txt in the next step)
diff --git a/libs/gst/base/gstbaseparse.c b/libs/gst/base/gstbaseparse.c
index 41adf130e..ffc662a45 100644
--- a/libs/gst/base/gstbaseparse.c
+++ b/libs/gst/base/gstbaseparse.c
@@ -1906,6 +1906,9 @@ gst_base_parse_add_index_entry (GstBaseParse * parse, guint64 offset,
   GST_LOG_OBJECT (parse, "Adding key=%d index entry %" GST_TIME_FORMAT
       " @ offset 0x%08" G_GINT64_MODIFIER "x", key, GST_TIME_ARGS (ts), offset);
 
+  if (!key)
+    goto exit;
+
   if (G_LIKELY (!force)) {
 
     if (!parse->priv->upstream_seekable) {
  6. Apply the patch

$ cat patch.txt | patch -p1

  7. Build the sources

$ make -j$(nproc) && make install

  8. Back up the distribution-provided library and copy in the newly built library. Adjust the library name to match your version. On Jetson, replace x86_64-linux-gnu with aarch64-linux-gnu

$ sudo cp /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0.1405.0 ${HOME}/libgstbase-1.0.so.0.1405.0.backup

$ sudo cp out/lib/libgstbase-1.0.so.0.1405.0 /usr/lib/x86_64-linux-gnu/
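
The library file name used in the last step encodes the installed GStreamer version: 1.14.5 corresponds to libgstbase-1.0.so.0.1405.0. The following is a minimal sketch of that mapping, assuming the pattern seen in this example (minor * 100 + micro) also holds for your version. The function name is hypothetical, and you should confirm the real file name under /usr/lib before copying anything.

```python
def gstbase_library_name(version: str) -> str:
    """Map a GStreamer version string such as "1.14.5" to a libgstbase file name.

    Assumption: the file name follows the pattern seen in the steps above,
    libgstbase-1.0.so.0.<minor * 100 + micro>.0. Verify on your system.
    """
    major, minor, micro = (int(part) for part in version.split("."))
    if major != 1:
        raise ValueError("only GStreamer 1.x is handled in this sketch")
    return f"libgstbase-1.0.so.0.{minor * 100 + micro}.0"


print(gstbase_library_name("1.14.5"))  # libgstbase-1.0.so.0.1405.0
```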

[DS5.0 xx_All_App] For DS 5.0 DP: how to integrate nvdsanalytics plugin in C deepstream-app

  1. Create the analytics bin in /opt/nvidia/deepstream/deepstream-5.0/sources/apps/apps-common/src
  2. Refer to deepstream_dsexample.c and create deepstream_nvdsanalytics.c along the same lines
  3. Modify deepstream_app.h to add the nvdsanalytics bin instance and its config to the structures
  4. Update deepstream_config_file_parser.c to parse the nvdsanalytics config from the configuration file
  5. Update deepstream_app.c to add the nvdsanalytics bin to the pipeline; the ideal location is after the tracker
  6. Create a new cpp file with a process_meta function declared with extern "C"; this function parses the nvdsanalytics meta. Refer to the probe call in the sample nvdsanalytics test app when writing it
  7. Add the probe in deepstream_app_main.c after the nvdsanalytics bin
  8. Modify the Makefile to compile the cpp file and deepstream_app_main.c using g++ with the -fpermissive flag, and link deepstream-app using g++

These are rough steps; additional modifications to the header files are required.

For DS 5.0 GA, we plan to add support for meta access.

DeepStream 5.0 Manual for YoloV4

  • The original Yolo implementation via CUDA kernel in DeepStream is based on older Yolo models (v2, v3), so it may not suit newer Yolo models like YoloV4. Location: /opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/kernels.cu

  • We are trying to embed the Yolo layer into the TensorRT engine when converting DarkNet or PyTorch models into an engine, before deploying to DeepStream. With this new solution, the old Yolo CUDA kernel in DeepStream is no longer used.

You can try the following steps to make DeepStream work with YoloV4:

  1. Go to https://github.com/Tianxiaomo/pytorch-YOLOv4 and generate a TensorRT engine according to this workflow: DarkNet or PyTorch → ONNX → TensorRT.
  2. Add the C++ functions listed after step 3 to objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp and rebuild libnvdsinfer_custom_impl_Yolo.so
  3. Here are configuration files for reference (you will need to adjust them slightly to suit your environment):
    config_infer_primary_yoloV4.txt (3.4 KB)
    deepstream_app_config_yoloV4.txt (3.8 KB)
static NvDsInferParseObjectInfo convertBBoxYoloV4(const float& bx1, const float& by1, const float& bx2,
                                     const float& by2, const uint& netW, const uint& netH)
{
    NvDsInferParseObjectInfo b;
    // Restore coordinates to network input resolution

    float x1 = bx1 * netW;
    float y1 = by1 * netH;
    float x2 = bx2 * netW;
    float y2 = by2 * netH;

    x1 = clamp(x1, 0, netW);
    y1 = clamp(y1, 0, netH);
    x2 = clamp(x2, 0, netW);
    y2 = clamp(y2, 0, netH);

    b.left = x1;
    b.width = clamp(x2 - x1, 0, netW);
    b.top = y1;
    b.height = clamp(y2 - y1, 0, netH);

    return b;
}

static void addBBoxProposalYoloV4(const float bx, const float by, const float bw, const float bh,
                     const uint& netW, const uint& netH, const int maxIndex,
                     const float maxProb, std::vector<NvDsInferParseObjectInfo>& binfo)
{
    NvDsInferParseObjectInfo bbi = convertBBoxYoloV4(bx, by, bw, bh, netW, netH);
    if (bbi.width < 1 || bbi.height < 1) return;

    bbi.detectionConfidence = maxProb;
    bbi.classId = maxIndex;
    binfo.push_back(bbi);
}

static std::vector<NvDsInferParseObjectInfo>
decodeYoloV4Tensor(
    const float* boxes, const float* scores,
    const uint num_bboxes, NvDsInferParseDetectionParams const& detectionParams,
    const uint& netW, const uint& netH)
{
    std::vector<NvDsInferParseObjectInfo> binfo;

    uint bbox_location = 0;
    uint score_location = 0;
    for (uint b = 0; b < num_bboxes; ++b)
    {
        float bx1 = boxes[bbox_location];
        float by1 = boxes[bbox_location + 1];
        float bx2 = boxes[bbox_location + 2];
        float by2 = boxes[bbox_location + 3];

        float maxProb = 0.0f;
        int maxIndex = -1;

        for (uint c = 0; c < detectionParams.numClassesConfigured; ++c)
        {
            float prob = scores[score_location + c];
            if (prob > maxProb)
            {
                maxProb = prob;
                maxIndex = c;
            }
        }

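        // maxIndex stays -1 when no class scores above zero; skip such boxes
        // so the threshold table is never indexed out of range.
        if (maxIndex >= 0 && maxProb > detectionParams.perClassPreclusterThreshold[maxIndex])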
        {
            addBBoxProposalYoloV4(bx1, by1, bx2, by2, netW, netH, maxIndex, maxProb, binfo);
        }

        bbox_location += 4;
        score_location += detectionParams.numClassesConfigured;
    }

    return binfo;
}

extern "C" bool NvDsInferParseCustomYoloV4(
    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo,
    NvDsInferParseDetectionParams const& detectionParams,
    std::vector<NvDsInferParseObjectInfo>& objectList)
{
    if (NUM_CLASSES_YOLO != detectionParams.numClassesConfigured)
    {
        std::cerr << "WARNING: Num classes mismatch. Configured:"
                  << detectionParams.numClassesConfigured
                  << ", detected by network: " << NUM_CLASSES_YOLO << std::endl;
    }

    std::vector<NvDsInferParseObjectInfo> objects;

    const NvDsInferLayerInfo &boxes = outputLayersInfo[0]; // num_boxes x 4
    const NvDsInferLayerInfo &scores = outputLayersInfo[1]; // num_boxes x num_classes

    // 3 dimensional: [num_boxes, 1, 4]
    assert(boxes.inferDims.numDims == 3);
    // 2 dimensional: [num_boxes, num_classes]
    assert(scores.inferDims.numDims == 2);

    // The second dimension should be num_classes
    assert(detectionParams.numClassesConfigured == scores.inferDims.d[1]);
    
    uint num_bboxes = boxes.inferDims.d[0];

    // std::cout << "Network Info: " << networkInfo.height << "  " << networkInfo.width << std::endl;

    std::vector<NvDsInferParseObjectInfo> outObjs =
        decodeYoloV4Tensor(
            (const float*)(boxes.buffer), (const float*)(scores.buffer), num_bboxes, detectionParams,
            networkInfo.width, networkInfo.height);

    objects.insert(objects.end(), outObjs.begin(), outObjs.end());

    objectList = objects;

    return true;
}
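
To make the decoding logic easier to follow, here is a minimal pure-Python sketch of the same steps: pick the best class per box, apply the per-class threshold, scale the normalized corners to the network resolution, clamp, and drop boxes smaller than one pixel. It is an illustration only, not part of the DeepStream API. The function names and the dictionary layout are hypothetical, and it assumes the same tensor layout as the C++ code (flat boxes as x1, y1, x2, y2 in 0..1 and flat scores as num_boxes x num_classes).

```python
def clamp(value, low, high):
    """Limit value to the closed range [low, high]."""
    return max(low, min(value, high))


def decode_yolov4(boxes, scores, num_classes, thresholds, net_w, net_h):
    """Decode flat box/score lists into a list of detection dictionaries."""
    detections = []
    for b in range(len(boxes) // 4):
        x1, y1, x2, y2 = boxes[4 * b:4 * b + 4]
        class_scores = scores[b * num_classes:(b + 1) * num_classes]

        # Best class for this box (plain argmax over the class scores).
        max_index = max(range(num_classes), key=class_scores.__getitem__)
        max_prob = class_scores[max_index]
        if max_prob <= thresholds[max_index]:
            continue

        # Restore coordinates to network input resolution and clamp them.
        left = clamp(x1 * net_w, 0, net_w)
        top = clamp(y1 * net_h, 0, net_h)
        right = clamp(x2 * net_w, 0, net_w)
        bottom = clamp(y2 * net_h, 0, net_h)
        width = clamp(right - left, 0, net_w)
        height = clamp(bottom - top, 0, net_h)
        if width < 1 or height < 1:
            continue

        detections.append({"class_id": max_index, "confidence": max_prob,
                           "left": left, "top": top,
                           "width": width, "height": height})
    return detections
```

Working through a couple of hand-made boxes with this sketch is a quick way to check what your own engine's output layout should produce before debugging the C++ parser.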

1. [DS5.0GA_Jetson_dGPU_Plugin] Measure of the FPS of pipeline

2. [DS5.0GA_Jetson_dGPU_Plugin] Dump the Inference Input

3. [DS5_Jetson_dGPU_Plugin] Dump the Inference outputs

  • Apply the attached dump_dsinfer_raw_TRT_infer_outputs.txt (1.8 KB) patch in /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/
  • Build libnvds_infer.so and replace /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_infer.so

4. [DS5.0GA_Jetson_App] Rotate camera input image with NvBufSurfTransform() API

5. [DS5.0GA_App] Generate GStreamer Pipeline Graph
Use one of the methods below, according to your application type, to generate the GStreamer pipeline graph.

5.1 deepstream-app
 Run "export GST_DEBUG_DUMP_DOT_DIR=/tmp/" before the deepstream-app command, e.g.
 $ sudo apt-get install graphviz
 $ export GST_DEBUG_DUMP_DOT_DIR=/tmp/
 $ deepstream-app -c deepstream_app_config_yoloV2.txt
 $ cd /tmp/
 $ dot -Tpng 0.03.47.898178403-ds-app-playing.dot > ~/0.03.47.898178403-ds-app-playing.png  # the png file contains the graph

5.2 GStreamer command line
 Run "export GST_DEBUG_DUMP_DOT_DIR=/tmp/" before the gst-launch-1.0 command, for example,
  $ sudo apt-get install graphviz
  $ export GST_DEBUG_DUMP_DOT_DIR=/tmp/
  $ gst-launch-1.0 ....
  $ cd /tmp/
  $ dot -Tpng 0.03.47.898178403-ds-app-playing.dot > ~/0.03.47.898178403-ds-app-playing.png  # substitute the .dot file name actually generated in /tmp; the png file contains the graph

 5.3 DeepStream application
  For example,
  5.3.1 Add "g_setenv("GST_DEBUG_DUMP_DOT_DIR", "/tmp", TRUE);" before gst_init()
  5.3.2 Add "GST_DEBUG_BIN_TO_DOT_FILE_WITH_TS(GST_BIN(gst_objs.pipeline), GST_DEBUG_GRAPH_SHOW_ALL, "demo-app-pipeline");" at the point where you want to export the dot file, e.g. when switching to PLAYING
   Note: you need to include the header file - #include <gio/gio.h>

 5.4 Python DeepStream
  Refer to https://forums.developer.nvidia.com/t/python-deepstream-program-not-generating-dot-file/163837/8?u=mchi
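
Because the generated .dot file names carry a timestamp, it can be handy to pick the newest file in the dump directory automatically. The sketch below uses only the Python standard library to find the most recent .dot file and to build the matching "dot -Tpng" command. The helper names and the /home/user output directory are hypothetical; the sketch only builds the command and does not run Graphviz.

```python
import glob
import os


def latest_dot_file(dump_dir):
    """Return the most recently modified .dot file in dump_dir, or None."""
    dot_files = glob.glob(os.path.join(dump_dir, "*.dot"))
    if not dot_files:
        return None
    return max(dot_files, key=os.path.getmtime)


def dot_to_png_command(dot_path, out_dir):
    """Build the Graphviz command that renders dot_path to a PNG in out_dir."""
    base_name = os.path.splitext(os.path.basename(dot_path))[0]
    png_path = os.path.join(out_dir, base_name + ".png")
    return ["dot", "-Tpng", dot_path, "-o", png_path]


# Example with the file name used above; nothing is executed here.
print(dot_to_png_command("/tmp/0.03.47.898178403-ds-app-playing.dot", "/home/user"))
```

The returned list can be passed to subprocess.run() once graphviz is installed.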

6. [DS 5.0.1_All_Plugin] Tracker FAQ topic Deepstream Tracker FAQ

7. [DS 5.0GA_All_App] Enable Latency measurement for deepstream sample apps

  1. If you are using deepstream-app, set the following environment variables to check the component latency directly:
      export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
      export NVDS_ENABLE_LATENCY_MEASUREMENT=1
  2. If you are using other deepstream sample apps such as deepstream-test3, apply the following patch and set the same environment variables:
      export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
      export NVDS_ENABLE_LATENCY_MEASUREMENT=1
diff --git a/apps/deepstream/sample_apps/deepstream-test3/deepstream_test3_app.c b/apps/deepstream/sample_apps/deepstream-test3/deepstream_test3_app.c
index 426bd69..c7c2472 100644
--- a/apps/deepstream/sample_apps/deepstream-test3/deepstream_test3_app.c
+++ b/apps/deepstream/sample_apps/deepstream-test3/deepstream_test3_app.c
@@ -26,6 +26,7 @@
 #include <math.h>
 #include <string.h>
 #include <sys/time.h>
+#include <stdlib.h>

 #include "gstnvdsmeta.h"
 //#include "gstnvstreammeta.h"
@@ -73,6 +74,41 @@ gchar pgie_classes_str[4][32] = { "Vehicle", "TwoWheeler", "Person",

 //static guint probe_counter = 0;

+typedef struct {
+  GMutex *lock;
+  int num_sources;
+}LatencyCtx;
+
+static GstPadProbeReturn
+latency_measurement_buf_prob(GstPad * pad, GstPadProbeInfo * info, gpointer u_data)
+{
+  LatencyCtx *ctx = (LatencyCtx *) u_data;
+  static int batch_num = 0;
+  guint i = 0, num_sources_in_batch = 0;
+  if(nvds_enable_latency_measurement)
+  {
+    GstBuffer *buf = (GstBuffer *) info->data;
+    NvDsFrameLatencyInfo *latency_info = NULL;
+    g_mutex_lock (ctx->lock);
+    latency_info = (NvDsFrameLatencyInfo *)
+      calloc(1, ctx->num_sources * sizeof(NvDsFrameLatencyInfo));
+    g_print("\n************BATCH-NUM = %d**************\n",batch_num);
+    num_sources_in_batch = nvds_measure_buffer_latency(buf, latency_info);
+
+    for(i = 0; i < num_sources_in_batch; i++)
+    {
+      g_print("Source id = %d Frame_num = %d Frame latency = %lf (ms) \n",
+          latency_info[i].source_id,
+          latency_info[i].frame_num,
+          latency_info[i].latency);
+    }
+    g_mutex_unlock (ctx->lock);
+    batch_num++;
+  }
+
+  return GST_PAD_PROBE_OK;
+}
+
 /* tiler_sink_pad_buffer_probe  will extract metadata received on OSD sink pad
  * and update params for drawing rectangle, object information etc. */

@@ -107,9 +143,9 @@ tiler_src_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info,
                 num_rects++;
             }
         }
-          g_print ("Frame Number = %d Number of objects = %d "
-            "Vehicle Count = %d Person Count = %d\n",
-            frame_meta->frame_num, num_rects, vehicle_count, person_count);
+          // g_print ("Frame Number = %d Number of objects = %d "
+          //   "Vehicle Count = %d Person Count = %d\n",
+          //   frame_meta->frame_num, num_rects, vehicle_count, person_count);
 #if 0
         display_meta = nvds_acquire_display_meta_from_pool(batch_meta);
         NvOSD_TextParams *txt_params  = &display_meta->text_params;
@@ -383,7 +419,7 @@ main (int argc, char *argv[])
 #ifdef PLATFORM_TEGRA
   transform = gst_element_factory_make ("nvegltransform", "nvegl-transform");
 #endif
-  sink = gst_element_factory_make ("nveglglessink", "nvvideo-renderer");
+  sink = gst_element_factory_make ("fakesink", "nvvideo-renderer");

   if (!pgie || !tiler || !nvvidconv || !nvosd || !sink) {
     g_printerr ("One element could not be created. Exiting.\n");
@@ -467,6 +503,18 @@ gst_bin_add_many (GST_BIN (pipeline), queue1, pgie, queue2, tiler, queue3,
         tiler_src_pad_buffer_probe, NULL, NULL);
   gst_object_unref (tiler_src_pad);

+  GstPad *sink_pad =  gst_element_get_static_pad (nvosd, "src");
+  if (!sink_pad)
+    g_print ("Unable to get src pad\n");
+  else {
+    LatencyCtx *ctx = (LatencyCtx *)g_malloc0(sizeof(LatencyCtx));
+    ctx->lock = (GMutex *)g_malloc0(sizeof(GMutex));
+    ctx->num_sources = num_sources;
+    gst_pad_add_probe (sink_pad, GST_PAD_PROBE_TYPE_BUFFER,
+        latency_measurement_buf_prob, ctx, NULL);
+  }
+  gst_object_unref (sink_pad);
+
   /* Set the pipeline to "playing" state */
   g_print ("Now playing:");
   for (i = 0; i < num_sources; i++) {
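
The probe added by this patch prints one line per frame in the form "Source id = ... Frame_num = ... Frame latency = ... (ms)". To turn a captured log into a per-source summary, a small script such as the sketch below can help. It assumes the output format is exactly the g_print format string in the patch; the function name is hypothetical and the sample values in the usage line are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the g_print format used by latency_measurement_buf_prob in the patch.
LATENCY_LINE = re.compile(
    r"Source id = (\d+) Frame_num = (\d+) Frame latency = ([0-9.]+) \(ms\)"
)


def mean_latency_per_source(log_text):
    """Return {source_id: mean frame latency in ms} for the given log text."""
    samples = defaultdict(list)
    for match in LATENCY_LINE.finditer(log_text):
        source_id = int(match.group(1))
        samples[source_id].append(float(match.group(3)))
    return {sid: sum(vals) / len(vals) for sid, vals in samples.items()}


# Illustrative input only; these numbers are not real measurements.
sample = (
    "Source id = 0 Frame_num = 1 Frame latency = 10.000000 (ms) \n"
    "Source id = 0 Frame_num = 2 Frame latency = 20.000000 (ms) \n"
)
print(mean_latency_per_source(sample))
```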
Delay when I using RTSP camera
Latency measurement (nvds_measure_buffer_latency) gave weird results
Delay, randomness and dropped frames in RTSP output Stream
How to get the latency from deepstream python apps
Deepstream 6 python app performance degradation
Deepstream multiple rtsp output latency
Can deepstream handle higher resolutions than 1080p?
The most efficient method to evaluate time each plugin (in DeepStream)cost?
Inference with deepstream yolov5s-3.0 on 2 camera long delay (20-25s)
Unexpected FPS drop with back-to-back detector concept in deepstream-app
Does deepstream pipeline works sequentially?
Print inference time in deepstream 5.1 on TX2NX
How to decrease the latency of pushing streaming to the local
Question about tensorRT batch size
Deepstream 6.0: Image capture to muxer large latency
How to accelerate single stream pipeline with batch size grater then 1
DeepStream metrics
The deepstream-test3 demo using rtsp webcam delayed
Why my pipeline is stuck and delayed, but deepstream-app is very smooth?
Running deepstream-text1 on tx2 to load yolov5s engine model becomes very delayed

8. [DS 5.0GA_All_App] Enable Perf measurement(FPS) for deepstream sample apps

  1. If you are using deepstream-app, add enable-perf-measurement=1 under the application group ([application]) in the config file
  2. If you are using other deepstream sample apps such as deepstream-test2, apply the following patch to enable it
diff --git a/sources/apps/sample_apps/deepstream-test2/deepstream_test2_app.c b/sources/apps/sample_apps/deepstream-test2/deepstream_test2_app.c
index a2231acf535b4826adb766ed28f3aa80294c7f82..e37d7504ed07c9db77e5d3cdac2c4943fd0d1010 100755
--- a/sources/apps/sample_apps/deepstream-test2/deepstream_test2_app.c
+++ b/sources/apps/sample_apps/deepstream-test2/deepstream_test2_app.c
@@ -28,6 +28,7 @@
 #include <string.h>
 
 #include "gstnvdsmeta.h"
+#include "deepstream_perf.h"
 
 #define PGIE_CONFIG_FILE  "dstest2_pgie_config.txt"
 #define SGIE1_CONFIG_FILE "dstest2_sgie1_config.txt"
@@ -51,6 +52,29 @@
  * based on the fastest source's framerate. */
 #define MUXER_BATCH_TIMEOUT_USEC 40000
 
+#define MAX_STREAMS 64
+
+typedef struct
+{
+    /** identifies the stream ID */
+    guint32 stream_index;
+    gdouble fps[MAX_STREAMS];
+    gdouble fps_avg[MAX_STREAMS];
+    guint32 num_instances;
+    guint header_print_cnt;
+    GMutex fps_lock;
+    gpointer context;
+
+    /** Test specific info */
+    guint32 set_batch_size;
+}DemoPerfCtx;
+
+
+typedef struct {
+  GMutex *lock;
+  int num_sources;
+}LatencyCtx;
+
 gint frame_number = 0;
 /* These are the strings of the labels for the respective models */
 gchar sgie1_classes_str[12][32] = { "black", "blue", "brown", "gold", "green",
@@ -80,6 +104,66 @@ guint sgie1_unique_id = 2;
 guint sgie2_unique_id = 3;
 guint sgie3_unique_id = 4;
 
+/**
+ * callback function to print the performance numbers of each stream.
+ */
+static void
+perf_cb (gpointer context, NvDsAppPerfStruct * str)
+{
+  DemoPerfCtx *thCtx = (DemoPerfCtx *) context;
+
+  g_mutex_lock(&thCtx->fps_lock);
+  /** str->num_instances is == num_sources */
+  guint32 numf = str->num_instances;
+  guint32 i;
+
+  for (i = 0; i < numf; i++) {
+    thCtx->fps[i] = str->fps[i];
+    thCtx->fps_avg[i] = str->fps_avg[i];
+  }
+  thCtx->context = thCtx;
+  g_print ("**PERF: ");
+  for (i = 0; i < numf; i++) {
+    g_print ("%.2f (%.2f)\t", thCtx->fps[i], thCtx->fps_avg[i]);
+  }
+  g_print ("\n");
+  g_mutex_unlock(&thCtx->fps_lock);
+}
+
+/**
+ * callback function to print the latency of each component in the pipeline.
+ */
+
+static GstPadProbeReturn
+latency_measurement_buf_prob(GstPad * pad, GstPadProbeInfo * info, gpointer u_data)
+{
+  LatencyCtx *ctx = (LatencyCtx *) u_data;
+  static int batch_num = 0;
+  guint i = 0, num_sources_in_batch = 0;
+  if(nvds_enable_latency_measurement)
+  {
+    GstBuffer *buf = (GstBuffer *) info->data;
+    NvDsFrameLatencyInfo *latency_info = NULL;
+    g_mutex_lock (ctx->lock);
+    latency_info = (NvDsFrameLatencyInfo *)
+      calloc(1, ctx->num_sources * sizeof(NvDsFrameLatencyInfo));
+    g_print("\n************BATCH-NUM = %d**************\n",batch_num);
+    num_sources_in_batch = nvds_measure_buffer_latency(buf, latency_info);
+
+    for(i = 0; i < num_sources_in_batch; i++)
+    {
+      g_print("Source id = %d Frame_num = %d Frame latency = %lf (ms) \n",
+          latency_info[i].source_id,
+          latency_info[i].frame_num,
+          latency_info[i].latency);
+    }
+    g_mutex_unlock (ctx->lock);
+    batch_num++;
+  }
+
+  return GST_PAD_PROBE_OK;
+}
+
 /* This is the buffer probe function that we have registered on the sink pad
  * of the OSD element. All the infer elements in the pipeline shall attach
  * their metadata to the GstBuffer, here we will iterate & process the metadata
@@ -144,9 +228,9 @@ osd_sink_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info,
         nvds_add_display_meta_to_frame(frame_meta, display_meta);
     }
 
-    g_print ("Frame Number = %d Number of objects = %d "
-            "Vehicle Count = %d Person Count = %d\n",
-            frame_number, num_rects, vehicle_count, person_count);
+    // g_print ("Frame Number = %d Number of objects = %d "
+    //         "Vehicle Count = %d Person Count = %d\n",
+    //         frame_number, num_rects, vehicle_count, person_count);
     frame_number++;
     return GST_PAD_PROBE_OK;
 }
@@ -586,6 +670,30 @@ main (int argc, char *argv[])
     gst_pad_add_probe (osd_sink_pad, GST_PAD_PROBE_TYPE_BUFFER,
         osd_sink_pad_buffer_probe, NULL, NULL);
 
+  GstPad *sink_pad =  gst_element_get_static_pad (nvvidconv1, "src");
+  if (!sink_pad)
+    g_print ("Unable to get sink pad\n");
+  else {
+    LatencyCtx *ctx = (LatencyCtx *)g_malloc0(sizeof(LatencyCtx));
+    ctx->lock = (GMutex *)g_malloc0(sizeof(GMutex));
+    ctx->num_sources = argc - 2;
+    gst_pad_add_probe (sink_pad, GST_PAD_PROBE_TYPE_BUFFER,
+        latency_measurement_buf_prob, ctx, NULL);
+  }
+  gst_object_unref (sink_pad);
+
+  GstPad *tiler_pad =  gst_element_get_static_pad (nvtiler, "sink");
+  if (!tiler_pad)
+    g_print ("Unable to get tiler_pad pad\n");
+  else {
+    NvDsAppPerfStructInt *str =  (NvDsAppPerfStructInt *)g_malloc0(sizeof(NvDsAppPerfStructInt));
+    DemoPerfCtx *perf_ctx = (DemoPerfCtx *)g_malloc0(sizeof(DemoPerfCtx));
+    g_mutex_init(&perf_ctx->fps_lock);
+    str->context = perf_ctx;
+    enable_perf_measurement (str, tiler_pad, argc-2, 1, 0, perf_cb);
+  }
+  gst_object_unref (tiler_pad);
+
   /* Set the pipeline to "playing" state */
   g_print ("Now playing: %s\n", argv[1]);
   gst_element_set_state (pipeline, GST_STATE_PLAYING);
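
The perf_cb callback in this patch prints one "**PERF: " line per interval, with a "current (average)" FPS pair for each stream. The sketch below parses such a line back into numbers, for example to plot FPS over time. It assumes the line format matches the g_print calls in the patch; the function name is hypothetical and the sample values are illustrative only.

```python
import re

# One "fps (fps_avg)" pair, as printed by perf_cb with "%.2f (%.2f)\t".
FPS_PAIR = re.compile(r"([0-9]+\.[0-9]+) \(([0-9]+\.[0-9]+)\)")


def parse_perf_line(line):
    """Return a list of (fps, fps_avg) tuples, one per stream, or [] if not a PERF line."""
    if not line.startswith("**PERF: "):
        return []
    return [(float(cur), float(avg)) for cur, avg in FPS_PAIR.findall(line)]


# Illustrative input only; these numbers are not real measurements.
print(parse_perf_line("**PERF: 30.01 (29.95)\t25.00 (24.80)\t"))
```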

9. [DS 5.0GA_Jetson_App] Capture HW & SW Memory Leak log
nvmemstat.py.txt (4.7 KB)

  1. Download the attachment onto the Jetson device and rename it to nvmemstat.py
  2. Install the "lsof" tool
    $ sudo apt-get install lsof
  3. Run your application on the Jetson in one terminal or in the background
  4. Run this script with the command:
    $ sudo ./nvmemstat.py -p PROGRAM_NAME # replace PROGRAM_NAME with the name of the application started in step 3
    The script monitors hardware memory, software memory, etc.
  5. Share the log on the topic for further triage

10. [ALL_Jetson_plugin] Jetson GStreamer Plugins Using with DeepStream
For users of Jetson DeepStream (JetPack), there are some GStreamer plugins that are hardware accelerated on Jetson but are not listed in the DeepStream plugin list GStreamer Plugin Overview — DeepStream 6.1.1 Release documentation.

Some of these plugins can be used in a DeepStream pipeline to extend DeepStream's functionality, while others are not compatible with the DeepStream SDK.

The basic document for the GStreamer accelerated plugins is Multimedia — Jetson Linux Developer Guide 34.1 documentation (nvidia.com)

DeepStream compatible plugins:

  • nvegltransform: NvEGLTransform

Typical usage:

gst-launch-1.0 uridecodebin uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4 ! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_infer_primary.txt ! nvtracker tracker-width=640 tracker-height=480 ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so ll-config-file=config_tracker_NvDCF_perf.yml enable-batch-process=1 ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvmultistreamtiler ! nvdsosd ! nvvideoconvert ! nvegltransform ! nveglglessink

  • nvarguscamerasrc: NvArgusCameraSrc

Typical usage:

gst-launch-1.0 nvarguscamerasrc bufapi-version=true sensor-id=0 ! 'video/x-raw(memory:NVMM),width=640,height=480,framerate=30/1,format=NV12' ! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_infer_primary.txt ! nvtracker tracker-width=640 tracker-height=480 ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so ll-config-file=config_tracker_NvDCF_perf.yml enable-batch-process=1 ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvmultistreamtiler ! nvdsosd ! nvvideoconvert ! nvegltransform ! nveglglessink

The related topic in forum:

Segfault when nvvideoconvert and nvv4l2h265enc are used together - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums

  • nvv4l2camerasrc: NvV4l2CameraSrc

Typical usage:

gst-launch-1.0 nvv4l2camerasrc device=/dev/video0 bufapi-version=1 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=60/1' ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=NV12' ! mx.sink_0 nvv4l2camerasrc device=/dev/video1 bufapi-version=1 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=60/1' ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=NV12' ! mx.sink_1 nvstreammux width=1920 height=1080 batch-size=2 live-source=1 name=mx ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_infer_primary.txt batch-size=2 ! nvvideoconvert ! nvmultistreamtiler width=1920 height=1080 rows=1 columns=2 ! nvvideoconvert ! nvdsosd ! nvegltransform ! nveglglessink sync=0

The related topic in forum:
Low camera frame rate - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums

  • nvdrmvideosink: Nvidia Drm Video Sink

Typical pipeline:
gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! nvvideoconvert ! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_infer_primary.txt ! nvdrmvideosink conn_id=0 plane_id=1 set_mode=0 -e

The related topic in forum:
Which videosink for Jetson TX2 in EGLFS? - Jetson & Embedded Systems / Jetson TX2 - NVIDIA Developer Forums

  • nv3dsink: Nvidia 3D sink

Typical pipeline:
gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! nvvideoconvert ! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_infer_primary.txt ! nv3dsink sync=false

Note: The nv3dsink plugin is a window-based rendering sink, and based on X11.

  • nvoverlaysink: OpenMax Video Sink

Typical pipeline:

gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder bufapi-version=1 ! nvvideoconvert ! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_infer_primary.txt ! nvoverlaysink sync=0

Note: The nvoverlaysink plugin is deprecated in L4T release 32.1. Please use nvdrmvideosink or nv3dsink for rendering gst-v4l2 decoder output.

DeepStream incompatible plugins:

  • nvcompositor

Typical pipeline:
gst-launch-1.0 nvcompositor name=comp sink_0::xpos=0 sink_0::ypos=0 sink_0::width=960 sink_0::height=540 sink_1::xpos=960 sink_1::ypos=0 sink_1::width=960 sink_1::height=540 sink_2::xpos=0 sink_2::ypos=540 sink_2::width=1920 sink_2::height=540 ! nvegltransform ! nveglglessink \
  filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! comp. \
  filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! comp. \
  filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! comp. -e

The related topic in forum:
How to Customize layout from Nvmultistream-tiler module from DeepStream - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums

11. [DS 5.x_All_App] How to implement a python binding

Refer to the following samples from forum users:
https://github.com/mrtj/pyds_tracker_meta
https://github.com/7633/pyds_analytics_meta

12. [DS 5.0GA_Jetson_App]: Dump NV12 NvBufSurface into a YUV file
Each NV12 NvBufSurface includes two planes (Y and interleaved UV) which are not contiguous in memory.
gstnvinfer_dump_NV12_NvBufSurface.patch (4.9 KB)

This is a sample change to /opt/nvidia/deepstream/deepstream-5.1/sources/gst-plugins/gst-nvinfer/gstnvinfer.cpp to dump the NV12 NvBufSurface before transforming to RGB data.
After getting the YUV file, you can view it at https://rawpixels.net/

13. [DS 5.x_All_App] How to access and modify the NvBufSurface

Refer to Deepstream sample code snippet - #3 by bcao

14. [All_Jetson_App] Check memory leakage with valgrind

  1. Install valgrind with the command below
    $ sudo apt-get install valgrind valgrind-dbg
  2. Run the application with the command below
    $ valgrind --tool=memcheck --leak-check=full --num-callers=100 --show-leak-kinds=definite,indirect --track-origins=yes ./app
15. [DSx_All_App] Debug Tips for DeepStream Accuracy Issue
    15.1 Confirm that your model achieves good accuracy in training and in inference outside DeepStream
    15.2 When deploying an ONNX model to DeepStream with the nvinfer plugin, confirm that the nvinfer parameters below are set correctly
    15.2.1 Input scale & offset
    1). net-scale-factor =
    2). offsets
    These two parameters are used as follows (from the doc): y = net-scale-factor * (x - mean), where x is the input pixel value and mean is the corresponding per-channel value from offsets (or from the mean file)

    15.2.2 Input order
    1). network-input-order= // 0:NCHW 1:NHWC
    2). infer-dims= // if network-input-order=1, i.e. NHWC, infer-dims must be specified; otherwise nvinfer can’t detect the input dims automatically
    3). model-color-format= // 0: RGB 1: BGR 2: GRAY
    15.2.3 Scale and padding
    1). maintain-aspect-ratio= // whether to maintain aspect ratio while scaling input
    2). symmetric-padding= // whether to pad the image symmetrically while scaling input. By default, padding is asymmetric and the scaled image is placed at the top-left corner.
    15.2.4 Inference precision
    1). network-mode= // 0: FP32 1: INT8 2: FP16. If INT8 accuracy is not good, try FP16 or FP32
    15.2.5 Threshold
    1). threshold=
    2). pre-cluster-threshold=
    3). post-cluster-threshold=
    The above are some key parameters for a quick accuracy check. For more detailed information, please refer to the nvinfer doc - Gst-nvinfer — DeepStream 6.1.1 Release documentation
    15.3 Dump the input or output of nvinfer
    See the two items below in DeepStream SDK FAQ - #9 by mchi
    2. [DS5.0GA_Jetson_dGPU_Plugin] Dump the Inference Input ==> compare the input between DS and your own standalone inference/training app
    3. [DS5_Jetson_dGPU_Plugin] Dump the Inference outputs ==> then run your own parser offline to check this output data
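
As a quick sanity check for 15.2.1, it can help to compute by hand what nvinfer will feed the network for a few pixel values and compare that with your training preprocessing. The sketch below applies the documented relation y = net-scale-factor * (x - mean) per channel. The function name is hypothetical, and the two parameter sets in the usage lines are illustrative examples rather than values taken from any specific model.

```python
def nvinfer_preprocess(pixel, net_scale_factor, offsets=(0.0, 0.0, 0.0)):
    """Apply y = net_scale_factor * (x - mean) to one pixel, channel by channel.

    pixel and offsets are sequences with one value per channel.
    """
    return [net_scale_factor * (x - mean) for x, mean in zip(pixel, offsets)]


# Scaling 0..255 to 0..1: net-scale-factor = 1/255, no offsets.
print(nvinfer_preprocess((255, 128, 0), 1.0 / 255.0))

# Scaling 0..255 to -1..1: net-scale-factor = 1/127.5, offsets = 127.5;127.5;127.5
print(nvinfer_preprocess((255, 127.5, 0), 1.0 / 127.5, (127.5, 127.5, 127.5)))
```

If the values computed this way differ from what your training pipeline produces for the same pixel, the net-scale-factor or offsets settings are a likely cause of the accuracy gap.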

16. [DeepStream 6.0 GA] python binding installation

Download the wheel files directly from Releases · NVIDIA-AI-IOT/deepstream_python_apps · GitHub
Or build it referring to steps below:

16.1 dGPU+x86 platform & Triton docker

[DeepStream 6.0] Unable to install python_gst into nvcr.io/nvidia/deepstream:6.0-triton container - #5 by rpaliwal_nvidia

16.2 dGPU+x86 platform & non-Triton docker

  Please refer to deepstream_python_apps/bindings at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub and the steps below if you use the DS 6.0 GA docker:
# 1. Prerequisites
apt install -y git python-dev python3 python3-pip python3.6-dev python3.8-dev cmake g++ build-essential \
    libglib2.0-dev libglib2.0-dev-bin python-gi-dev libtool m4 autoconf automake

# 2. Gst-python
cd /opt/nvidia/deepstream/deepstream/sources/apps/
git clone https://github.com/NVIDIA-AI-IOT/deepstream_python_apps.git
cd deepstream_python_apps/
git submodule update --init
apt-get install --reinstall ca-certificates
cd 3rdparty/gst-python/
./autogen.sh
make && make install

# 3. install pyds
cd /opt/nvidia/deepstream/deepstream/sources/apps/deepstream_python_apps/bindings/
mkdir build
cd build
cmake ..
make
pip3 install ./pyds-1.1.0-py3-none-linux_x86_64.whl

# 4. run sample
cd /opt/nvidia/deepstream/deepstream/sources/apps/deepstream_python_apps
mv  apps/* ./
cd deepstream-test1/
python3 deepstream_test_1.py ../../../../samples/streams/sample_qHD.h264

16.3 Jetson dockers

Refer to deepstream_python_apps/bindings at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub and the steps below if you use the DS 6.0 GA docker:

# 1. Prerequisites
apt-get update
apt install -y git python-dev python3 python3-pip python3.6-dev python3.8-dev cmake g++ build-essential \
    libglib2.0-dev libglib2.0-dev-bin python-gi-dev libtool m4 autoconf automake

# 2. Gst-python
cd /opt/nvidia/deepstream/deepstream/sources/apps/
git clone https://github.com/NVIDIA-AI-IOT/deepstream_python_apps.git
cd deepstream_python_apps/
git submodule update --init
apt-get install --reinstall ca-certificates
cd 3rdparty/gst-python/
./autogen.sh
make && make install

# 3. install pyds
cd /opt/nvidia/deepstream/deepstream/sources/apps/deepstream_python_apps/bindings/
mkdir build
cd build
cmake ..  -DPYTHON_MAJOR_VERSION=3 -DPYTHON_MINOR_VERSION=6 -DPIP_PLATFORM=linux_aarch64 -DDS_PATH=/opt/nvidia/deepstream/deepstream
make
pip3 install ./pyds-1.1.0-py3-none-linux_aarch64.whl

# 4. run sample
cd /opt/nvidia/deepstream/deepstream/sources/apps/deepstream_python_apps
mv  apps/* ./
cd deepstream-test1/
python3 deepstream_test_1.py ../../../../samples/streams/sample_qHD.h264

17. [DeepStream_dGPU_App] Using OpenCV to run deepstream pipeline

Sometimes a GStreamer pipeline opened through OpenCV fails, typically when OpenCV was built without GStreamer support. Please refer to the following topic to resolve this problem.

How to compile OpenCV with Gstreamer [Ubuntu&Windows] | by Galaktyk 01 | Medium

18. Open model deployment on DeepStream (Thanks for sharing!)
Yolo2/3/4/5/OR : Improved DeepStream for YOLO models (Thanks @marcoslucianops )
YoloV4 : GitHub - NVIDIA-AI-IOT/yolo_deepstream + deepstream_yolov4.tgz - Google Drive
YoloV4+dspreprocess : deepstream_yolov4_with_nvdspreprocess.tgz - Google Drive
YoloV5 + nvinfer : GitHub - beyondli/Yolo_on_Jetson
Yolov5-small : Custom Yolov5 on Deepstream 6.0 (Thanks @raghavendra.ramya)
YoloV5+Triton : Triton Inference through docker - #7 by mchi
YoloV5_gpu_optimization: GitHub - NVIDIA-AI-IOT/yolov5_gpu_optimization: This repository provides YOLOV5 GPU optimization sample
YoloV7: GitHub - NVIDIA-AI-IOT/yolo_deepstream
YoloV7+Triton: Deepstream / Triton Server - YOLOV7(Thanks @Levi_Pereira )
YoloV7+nvinfer: Tutorial: How to run YOLOv7 on Deepstream(Thanks @vcmike )

19. [DSx_All_App] How to use classification model as pgie?
The input is a picture of a blue car, and we want the classifier to output the "blue" label. Here are the test files and the test command:
blueCar.zip (37.6 KB)
dstest_appsrc_config.txt (3.7 KB)

gst-launch-1.0 filesrc location=blueCar.jpg ! jpegdec ! videoconvert ! video/x-raw,format=I420 ! nvvideoconvert ! video/x-raw\(memory:NVMM\),format=NV12 ! mux.sink_0 nvstreammux name=mux batch-size=1 width=1280 height=720 ! nvinfer config-file-path=./dstest_appsrc_config.txt ! nvvideoconvert ! video/x-raw\(memory:NVMM\),format=RGBA ! nvdsosd ! nvvideoconvert ! video/x-raw,format=I420 ! jpegenc ! filesink location=out.jpg

[Access output of Primary Classifier]
[Resnet50 with imagenet dataset image classification using deepstream sdk]