DeepStream SDK FAQ

DeepStream 5.0 Manual for YoloV4

  • The original Yolo implementation in DeepStream, a CUDA kernel, targets older Yolo models (v2, v3), so it may not suit newer models such as YoloV4. Location: /opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/kernels.cu

  • The approach described here embeds the Yolo layer into the TensorRT engine when the Darknet or PyTorch model is converted, before deployment to DeepStream. With this approach, the old Yolo CUDA kernel in DeepStream is no longer used.

Try the following steps to get YoloV4 working in DeepStream:

  1. Go to https://github.com/Tianxiaomo/pytorch-YOLOv4 and generate a TensorRT engine following this workflow: Darknet or PyTorch --> ONNX --> TensorRT.
  2. Add the C++ functions below to objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp and rebuild libnvdsinfer_custom_impl_Yolo.so.
  3. Use these configuration files as references (adjust them to suit your environment):
    config_infer_primary_yoloV4.txt (3.4 KB)
    deepstream_app_config_yoloV4.txt (3.8 KB)

static NvDsInferParseObjectInfo convertBBoxYoloV4(const float& bx1, const float& by1, const float& bx2,
                                     const float& by2, const uint& netW, const uint& netH)
{
    NvDsInferParseObjectInfo b;
    // Restore coordinates to network input resolution

    float x1 = bx1 * netW;
    float y1 = by1 * netH;
    float x2 = bx2 * netW;
    float y2 = by2 * netH;

    x1 = clamp(x1, 0, netW);
    y1 = clamp(y1, 0, netH);
    x2 = clamp(x2, 0, netW);
    y2 = clamp(y2, 0, netH);

    b.left = x1;
    b.width = clamp(x2 - x1, 0, netW);
    b.top = y1;
    b.height = clamp(y2 - y1, 0, netH);

    return b;
}

static void addBBoxProposalYoloV4(const float bx1, const float by1, const float bx2, const float by2,
                     const uint& netW, const uint& netH, const int maxIndex,
                     const float maxProb, std::vector<NvDsInferParseObjectInfo>& binfo)
{
    // The four values are corner coordinates (x1, y1, x2, y2), not center/width/height.
    NvDsInferParseObjectInfo bbi = convertBBoxYoloV4(bx1, by1, bx2, by2, netW, netH);
    if (bbi.width < 1 || bbi.height < 1) return;

    bbi.detectionConfidence = maxProb;
    bbi.classId = maxIndex;
    binfo.push_back(bbi);
}

static std::vector<NvDsInferParseObjectInfo>
decodeYoloV4Tensor(
    const float* boxes, const float* scores,
    const uint num_bboxes, NvDsInferParseDetectionParams const& detectionParams,
    const uint& netW, const uint& netH)
{
    std::vector<NvDsInferParseObjectInfo> binfo;

    uint bbox_location = 0;
    uint score_location = 0;
    for (uint b = 0; b < num_bboxes; ++b)
    {
        float bx1 = boxes[bbox_location];
        float by1 = boxes[bbox_location + 1];
        float bx2 = boxes[bbox_location + 2];
        float by2 = boxes[bbox_location + 3];

        float maxProb = 0.0f;
        int maxIndex = -1;

        for (uint c = 0; c < detectionParams.numClassesConfigured; ++c)
        {
            float prob = scores[score_location + c];
            if (prob > maxProb)
            {
                maxProb = prob;
                maxIndex = c;
            }
        }

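        // maxIndex stays -1 when no class score is positive; skip such boxes
        // instead of indexing the threshold array out of bounds.
        if (maxIndex >= 0 && maxProb > detectionParams.perClassPreclusterThreshold[maxIndex])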
        {
            addBBoxProposalYoloV4(bx1, by1, bx2, by2, netW, netH, maxIndex, maxProb, binfo);
        }

        bbox_location += 4;
        score_location += detectionParams.numClassesConfigured;
    }

    return binfo;
}

extern "C" bool NvDsInferParseCustomYoloV4(
    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo,
    NvDsInferParseDetectionParams const& detectionParams,
    std::vector<NvDsInferParseObjectInfo>& objectList)
{
    if (NUM_CLASSES_YOLO != detectionParams.numClassesConfigured)
    {
        std::cerr << "WARNING: Num classes mismatch. Configured:"
                  << detectionParams.numClassesConfigured
                  << ", detected by network: " << NUM_CLASSES_YOLO << std::endl;
    }

    std::vector<NvDsInferParseObjectInfo> objects;

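    // The parser expects exactly two output layers: boxes and scores.
    if (outputLayersInfo.size() < 2)
    {
        std::cerr << "ERROR: expected two output layers (boxes, scores)" << std::endl;
        return false;
    }

    const NvDsInferLayerInfo &boxes = outputLayersInfo[0]; // num_boxes x 4
    const NvDsInferLayerInfo &scores = outputLayersInfo[1]; // num_boxes x num_classes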

    // 3 dimensional: [num_boxes, 1, 4]
    assert(boxes.inferDims.numDims == 3);
    // 2 dimensional: [num_boxes, num_classes]
    assert(scores.inferDims.numDims == 2);

    // The second dimension should be num_classes
    assert(detectionParams.numClassesConfigured == scores.inferDims.d[1]);
    
    uint num_bboxes = boxes.inferDims.d[0];

    // std::cout << "Network Info: " << networkInfo.height << "  " << networkInfo.width << std::endl;

    std::vector<NvDsInferParseObjectInfo> outObjs =
        decodeYoloV4Tensor(
            (const float*)(boxes.buffer), (const float*)(scores.buffer), num_bboxes, detectionParams,
            networkInfo.width, networkInfo.height);

    objects.insert(objects.end(), outObjs.begin(), outObjs.end());

    objectList = objects;

    return true;
}
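
To make the coordinate handling in convertBBoxYoloV4 concrete, here is a minimal standalone sketch. It assumes, as the parser above does, that the engine emits corner coordinates normalized to [0, 1]. `PixelBox` and `toPixelBox` are hypothetical names used only for illustration; they are not DeepStream types or functions.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the left/top/width/height fields of
// NvDsInferParseObjectInfo; not a DeepStream type.
struct PixelBox {
    float left;
    float top;
    float width;
    float height;
};

// Scale normalized corners (x1, y1, x2, y2) to the network input
// resolution and clamp them to the image bounds.
inline PixelBox toPixelBox(float x1, float y1, float x2, float y2,
                           unsigned netW, unsigned netH)
{
    assert(netW > 0 && netH > 0);
    const float w = static_cast<float>(netW);
    const float h = static_cast<float>(netH);
    const float px1 = std::min(std::max(x1 * w, 0.0f), w);
    const float py1 = std::min(std::max(y1 * h, 0.0f), h);
    const float px2 = std::min(std::max(x2 * w, 0.0f), w);
    const float py2 = std::min(std::max(y2 * h, 0.0f), h);
    // A box whose corners are swapped collapses to zero size.
    return PixelBox{px1, py1,
                    std::max(px2 - px1, 0.0f),
                    std::max(py2 - py1, 0.0f)};
}
```

For a 608x608 network input, toPixelBox(0.25f, 0.5f, 0.75f, 1.5f, 608, 608) gives left 152, top 304, width 304 and height 304, because the bottom edge is clamped to the image.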

1. [DS5.0GA_Jetson_dGPU_Plugin] Measure of the FPS of pipeline

2. [DS5.0GA_Jetson_dGPU_Plugin] Dump the Inference Input

3. [DS5_Jetson_dGPU_Plugin] Dump the Inference outputs

  • Apply the attached dump_dsinfer_raw_TRT_infer_outputs.txt (1.8 KB) to the sources in /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/
  • Build libnvds_infer.so and replace /opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_infer.so

4. [DS5.0GA_Jetson_App] Rotate camera input image with NvBufSurfTransform() API

5. [DS5.0GA_App] Generate GStreamer Pipeline Graph
Use one of the methods below, depending on your application type, to generate the GStreamer pipeline graph.

5.1 deepstream-app
 Run "export GST_DEBUG_DUMP_DOT_DIR=/tmp/" before the deepstream-app command, e.g.
 $ sudo apt-get install graphviz
 $ export GST_DEBUG_DUMP_DOT_DIR=/tmp/
 $ deepstream-app -c deepstream_app_config_yoloV2.txt
 $ cd /tmp/
 $ dot -Tpng 0.03.47.898178403-ds-app-playing.dot > ~/0.03.47.898178403-ds-app-playing.png   # the png file contains the graph

5.2 gstreamer command line
 Run "export GST_DEBUG_DUMP_DOT_DIR=/tmp/" before the gst-launch-1.0 command, for example:
 $ sudo apt-get install graphviz
 $ export GST_DEBUG_DUMP_DOT_DIR=/tmp/
 $ gst-launch-1.0 ....
 $ cd /tmp/
 $ dot -Tpng 0.03.47.898178403-ds-app-playing.dot > ~/0.03.47.898178403-ds-app-playing.png   # use the name of the .dot file actually written to /tmp/

5.3 DeepStream application
 For example:
 5.3.1 Add "g_setenv("GST_DEBUG_DUMP_DOT_DIR", "/tmp", TRUE);" before gst_init()
 5.3.2 Add "GST_DEBUG_BIN_TO_DOT_FILE_WITH_TS(GST_BIN(gst_objs.pipeline), GST_DEBUG_GRAPH_SHOW_ALL, "demo-app-pipeline");" at the point where you want to export the dot file, e.g. when switching to PLAYING
 Note: this requires the header file: #include <gio/gio.h>

5.4 Python DeepStream
 Refer to https://forums.developer.nvidia.com/t/python-deepstream-program-not-generating-dot-file/163837/8?u=mchi

6. [DS 5.0.1_All_Plugin] Tracker FAQ topic: Deepstream Tracker FAQ

7. [DS 5.0GA_All_App] Enable Latency measurement for deepstream sample apps

  1. If you are using deepstream-app, set the following environment variables to check the component latency directly:

      export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
      export NVDS_ENABLE_LATENCY_MEASUREMENT=1
  2. If you are using other DeepStream sample apps such as deepstream-test3, apply the following patch and set the same environment variables:

      export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
      export NVDS_ENABLE_LATENCY_MEASUREMENT=1

diff --git a/apps/deepstream/sample_apps/deepstream-test3/deepstream_test3_app.c b/apps/deepstream/sample_apps/deepstream-test3/deepstream_test3_app.c
index 426bd69..c7c2472 100644
--- a/apps/deepstream/sample_apps/deepstream-test3/deepstream_test3_app.c
+++ b/apps/deepstream/sample_apps/deepstream-test3/deepstream_test3_app.c
@@ -26,6 +26,7 @@
 #include <math.h>
 #include <string.h>
 #include <sys/time.h>
+#include <stdlib.h>

 #include "gstnvdsmeta.h"
 //#include "gstnvstreammeta.h"
@@ -73,6 +74,41 @@ gchar pgie_classes_str[4][32] = { "Vehicle", "TwoWheeler", "Person",

 //static guint probe_counter = 0;

+typedef struct {
+  GMutex *lock;
+  int num_sources;
+}LatencyCtx;
+
+static GstPadProbeReturn
+latency_measurement_buf_prob(GstPad * pad, GstPadProbeInfo * info, gpointer u_data)
+{
+  LatencyCtx *ctx = (LatencyCtx *) u_data;
+  static int batch_num = 0;
+  guint i = 0, num_sources_in_batch = 0;
+  if(nvds_enable_latency_measurement)
+  {
+    GstBuffer *buf = (GstBuffer *) info->data;
+    NvDsFrameLatencyInfo *latency_info = NULL;
+    g_mutex_lock (ctx->lock);
+    latency_info = (NvDsFrameLatencyInfo *)
+      calloc(ctx->num_sources, sizeof(NvDsFrameLatencyInfo));
+    g_print("\n************BATCH-NUM = %d**************\n",batch_num);
+    num_sources_in_batch = nvds_measure_buffer_latency(buf, latency_info);
+    for(i = 0; i < num_sources_in_batch; i++)
+    {
+      g_print("Source id = %d Frame_num = %d Frame latency = %lf (ms) \n",
+          latency_info[i].source_id,
+          latency_info[i].frame_num,
+          latency_info[i].latency);
+    }
+    free(latency_info); /* release the per-buffer array to avoid a leak */
+    g_mutex_unlock (ctx->lock);
+    batch_num++;
+  }
+
+  return GST_PAD_PROBE_OK;
+}
+
 /* tiler_sink_pad_buffer_probe  will extract metadata received on OSD sink pad
  * and update params for drawing rectangle, object information etc. */

@@ -107,9 +143,9 @@ tiler_src_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info,
                 num_rects++;
             }
         }
-          g_print ("Frame Number = %d Number of objects = %d "
-            "Vehicle Count = %d Person Count = %d\n",
-            frame_meta->frame_num, num_rects, vehicle_count, person_count);
+          // g_print ("Frame Number = %d Number of objects = %d "
+          //   "Vehicle Count = %d Person Count = %d\n",
+          //   frame_meta->frame_num, num_rects, vehicle_count, person_count);
 #if 0
         display_meta = nvds_acquire_display_meta_from_pool(batch_meta);
         NvOSD_TextParams *txt_params  = &display_meta->text_params;
@@ -383,7 +419,7 @@ main (int argc, char *argv[])
 #ifdef PLATFORM_TEGRA
   transform = gst_element_factory_make ("nvegltransform", "nvegl-transform");
 #endif
-  sink = gst_element_factory_make ("nveglglessink", "nvvideo-renderer");
+  sink = gst_element_factory_make ("fakesink", "nvvideo-renderer");

   if (!pgie || !tiler || !nvvidconv || !nvosd || !sink) {
     g_printerr ("One element could not be created. Exiting.\n");
@@ -467,6 +503,18 @@ gst_bin_add_many (GST_BIN (pipeline), queue1, pgie, queue2, tiler, queue3,
         tiler_src_pad_buffer_probe, NULL, NULL);
   gst_object_unref (tiler_src_pad);

+  GstPad *sink_pad =  gst_element_get_static_pad (nvosd, "src");
+  if (!sink_pad)
+    g_print ("Unable to get src pad\n");
+  else {
+    LatencyCtx *ctx = (LatencyCtx *)g_malloc0(sizeof(LatencyCtx));
+    ctx->lock = (GMutex *)g_malloc0(sizeof(GMutex));
+    ctx->num_sources = num_sources;
+    gst_pad_add_probe (sink_pad, GST_PAD_PROBE_TYPE_BUFFER,
+        latency_measurement_buf_prob, ctx, NULL);
+  }
+  gst_object_unref (sink_pad);
+
   /* Set the pipeline to "playing" state */
   g_print ("Now playing:");
   for (i = 0; i < num_sources; i++) {

8. [DS 5.0GA_All_App] Enable Perf measurement(FPS) for deepstream sample apps

  1. If you are using deepstream-app, add enable-perf-measurement=1 under the application group in the config file
  2. If you are using other DeepStream sample apps such as deepstream-test2, apply the following patch to enable it
diff --git a/sources/apps/sample_apps/deepstream-test2/deepstream_test2_app.c b/sources/apps/sample_apps/deepstream-test2/deepstream_test2_app.c
index a2231acf535b4826adb766ed28f3aa80294c7f82..e37d7504ed07c9db77e5d3cdac2c4943fd0d1010 100755
--- a/sources/apps/sample_apps/deepstream-test2/deepstream_test2_app.c
+++ b/sources/apps/sample_apps/deepstream-test2/deepstream_test2_app.c
@@ -28,6 +28,7 @@
 #include <string.h>
 
 #include "gstnvdsmeta.h"
+#include "deepstream_perf.h"
 
 #define PGIE_CONFIG_FILE  "dstest2_pgie_config.txt"
 #define SGIE1_CONFIG_FILE "dstest2_sgie1_config.txt"
@@ -51,6 +52,29 @@
  * based on the fastest source's framerate. */
 #define MUXER_BATCH_TIMEOUT_USEC 40000
 
+#define MAX_STREAMS 64
+
+typedef struct
+{
+    /** identifies the stream ID */
+    guint32 stream_index;
+    gdouble fps[MAX_STREAMS];
+    gdouble fps_avg[MAX_STREAMS];
+    guint32 num_instances;
+    guint header_print_cnt;
+    GMutex fps_lock;
+    gpointer context;
+
+    /** Test specific info */
+    guint32 set_batch_size;
+}DemoPerfCtx;
+
+
+typedef struct {
+  GMutex *lock;
+  int num_sources;
+}LatencyCtx;
+
 gint frame_number = 0;
 /* These are the strings of the labels for the respective models */
 gchar sgie1_classes_str[12][32] = { "black", "blue", "brown", "gold", "green",
@@ -80,6 +104,66 @@ guint sgie1_unique_id = 2;
 guint sgie2_unique_id = 3;
 guint sgie3_unique_id = 4;
 
+/**
+ * callback function to print the performance numbers of each stream.
+ */
+static void
+perf_cb (gpointer context, NvDsAppPerfStruct * str)
+{
+  DemoPerfCtx *thCtx = (DemoPerfCtx *) context;
+
+  g_mutex_lock(&thCtx->fps_lock);
+  /** str->num_instances is == num_sources */
+  guint32 numf = str->num_instances;
+  guint32 i;
+
+  for (i = 0; i < numf; i++) {
+    thCtx->fps[i] = str->fps[i];
+    thCtx->fps_avg[i] = str->fps_avg[i];
+  }
+  thCtx->context = thCtx;
+  g_print ("**PERF: ");
+  for (i = 0; i < numf; i++) {
+    g_print ("%.2f (%.2f)\t", thCtx->fps[i], thCtx->fps_avg[i]);
+  }
+  g_print ("\n");
+  g_mutex_unlock(&thCtx->fps_lock);
+}
+
+/**
+ * callback function to print the latency of each component in the pipeline.
+ */
+
+static GstPadProbeReturn
+latency_measurement_buf_prob(GstPad * pad, GstPadProbeInfo * info, gpointer u_data)
+{
+  LatencyCtx *ctx = (LatencyCtx *) u_data;
+  static int batch_num = 0;
+  guint i = 0, num_sources_in_batch = 0;
+  if(nvds_enable_latency_measurement)
+  {
+    GstBuffer *buf = (GstBuffer *) info->data;
+    NvDsFrameLatencyInfo *latency_info = NULL;
+    g_mutex_lock (ctx->lock);
+    latency_info = (NvDsFrameLatencyInfo *)
+      calloc(ctx->num_sources, sizeof(NvDsFrameLatencyInfo));
+    g_print("\n************BATCH-NUM = %d**************\n",batch_num);
+    num_sources_in_batch = nvds_measure_buffer_latency(buf, latency_info);
+    for(i = 0; i < num_sources_in_batch; i++)
+    {
+      g_print("Source id = %d Frame_num = %d Frame latency = %lf (ms) \n",
+          latency_info[i].source_id,
+          latency_info[i].frame_num,
+          latency_info[i].latency);
+    }
+    free(latency_info); /* release the per-buffer array to avoid a leak */
+    g_mutex_unlock (ctx->lock);
+    batch_num++;
+  }
+
+  return GST_PAD_PROBE_OK;
+}
+
 /* This is the buffer probe function that we have registered on the sink pad
  * of the OSD element. All the infer elements in the pipeline shall attach
  * their metadata to the GstBuffer, here we will iterate & process the metadata
@@ -144,9 +228,9 @@ osd_sink_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info,
         nvds_add_display_meta_to_frame(frame_meta, display_meta);
     }
 
-    g_print ("Frame Number = %d Number of objects = %d "
-            "Vehicle Count = %d Person Count = %d\n",
-            frame_number, num_rects, vehicle_count, person_count);
+    // g_print ("Frame Number = %d Number of objects = %d "
+    //         "Vehicle Count = %d Person Count = %d\n",
+    //         frame_number, num_rects, vehicle_count, person_count);
     frame_number++;
     return GST_PAD_PROBE_OK;
 }
@@ -586,6 +670,30 @@ main (int argc, char *argv[])
     gst_pad_add_probe (osd_sink_pad, GST_PAD_PROBE_TYPE_BUFFER,
         osd_sink_pad_buffer_probe, NULL, NULL);
 
+  GstPad *sink_pad =  gst_element_get_static_pad (nvvidconv1, "src");
+  if (!sink_pad)
+    g_print ("Unable to get sink pad\n");
+  else {
+    LatencyCtx *ctx = (LatencyCtx *)g_malloc0(sizeof(LatencyCtx));
+    ctx->lock = (GMutex *)g_malloc0(sizeof(GMutex));
+    ctx->num_sources = argc - 2;
+    gst_pad_add_probe (sink_pad, GST_PAD_PROBE_TYPE_BUFFER,
+        latency_measurement_buf_prob, ctx, NULL);
+  }
+  gst_object_unref (sink_pad);
+
+  GstPad *tiler_pad =  gst_element_get_static_pad (nvtiler, "sink");
+  if (!tiler_pad)
+    g_print ("Unable to get tiler_pad pad\n");
+  else {
+    NvDsAppPerfStructInt *str =  (NvDsAppPerfStructInt *)g_malloc0(sizeof(NvDsAppPerfStructInt));
+    DemoPerfCtx *perf_ctx = (DemoPerfCtx *)g_malloc0(sizeof(DemoPerfCtx));
+    g_mutex_init(&perf_ctx->fps_lock);
+    str->context = perf_ctx;
+    enable_perf_measurement (str, tiler_pad, argc-2, 1, 0, perf_cb);
+  }
+  gst_object_unref (tiler_pad);
+
   /* Set the pipeline to "playing" state */
   g_print ("Now playing: %s\n", argv[1]);
   gst_element_set_state (pipeline, GST_STATE_PLAYING);
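
The `**PERF:` line printed by perf_cb in the patch above shows two numbers per stream, taken from the `fps` and `fps_avg` fields. As a rough illustration of how such a pair can be computed, the sketch below reports the frame rate over the last interval and the running average since start. This reading of the two fields is an assumption, and `FpsCounter` is a hypothetical helper, not the deepstream_perf implementation.

```cpp
#include <cassert>

// Hypothetical per-stream counter; not part of the DeepStream SDK.
struct FpsCounter {
    unsigned long totalFrames = 0;    // frames seen since start
    unsigned long intervalFrames = 0; // frames seen in the open interval
    double totalSeconds = 0.0;        // sum of all closed intervals

    // Call once per buffer that passes the probe.
    void onFrame()
    {
        ++totalFrames;
        ++intervalFrames;
    }

    // Close the current interval and return its frame rate.
    double closeInterval(double seconds)
    {
        assert(seconds > 0.0);
        const double fps = intervalFrames / seconds;
        intervalFrames = 0;
        totalSeconds += seconds;
        return fps;
    }

    // Average frame rate over all closed intervals.
    double average() const
    {
        return totalSeconds > 0.0 ? totalFrames / totalSeconds : 0.0;
    }
};
```

With 150 frames in a first 5-second interval and 50 frames in a second one, the interval rates are 30 and 10 while the running average is 20, which is why the two printed numbers can differ noticeably after a stall.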

9. [DS 5.0GA_Jetson_App] Capture HW & SW Memory Leak log
nvmemstat.py.txt (4.7 KB)

  1. Download the attachment onto the Jetson device and rename it to nvmemstat.py
  2. Install the "lsof" tool
    $ sudo apt-get install lsof
  3. Run your application on the Jetson in one terminal or in the background
  4. Run this script with the command:
    $ sudo ./nvmemstat.py -p PROGRAM_NAME   # replace PROGRAM_NAME with the name of the application from step 3
    The script monitors hardware memory, software memory, etc.
  5. Share the log on the topic for further triage

10. [ALL_Jetson_plugin] Jetson GStreamer Plugins Using with DeepStream
For users of DeepStream on Jetson (JetPack), some GStreamer plugins are hardware accelerated by Jetson but are not listed in the DeepStream plugin list: GStreamer Plugin Overview (DeepStream 5.1 Release documentation).

Some of these plugins can be used in a DeepStream pipeline to extend its functionality, while others are not compatible with the DeepStream SDK.

The basic document for the accelerated GStreamer plugins is Tegra Linux Driver

DeepStream compatible plugins:

  • nvegltransform: NvEGLTransform

Typical usage:

gst-launch-1.0 uridecodebin uri=file:///opt/nvidia/deepstream/deepstream-5.0/samples/streams/sample_1080p_h264.mp4 ! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/config_infer_primary.txt ! nvtracker tracker-width=640 tracker-height=480 ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so enable-batch-process=1 ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvmultistreamtiler ! nvdsosd ! nvvideoconvert ! nvegltransform ! nveglglessink

  • nvarguscamerasrc: NvArgusCameraSrc

Typical usage:

gst-launch-1.0 nvarguscamerasrc bufapi-version=true sensor-id=0 ! 'video/x-raw(memory:NVMM),width=640,height=480,framerate=30/1,format=NV12' ! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/config_infer_primary.txt ! nvtracker tracker-width=640 tracker-height=480 ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so enable-batch-process=1 ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvmultistreamtiler ! nvdsosd ! nvvideoconvert ! nvegltransform ! nveglglessink

The related topic in forum:

Segfault when nvvideoconvert and nvv4l2h265enc are used together - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums

  • nvv4l2camerasrc: NvV4l2CameraSrc

Typical usage:

gst-launch-1.0 nvv4l2camerasrc device=/dev/video0 bufapi-version=1 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=60/1' ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=NV12' ! mx.sink_0 nvv4l2camerasrc device=/dev/video1 bufapi-version=1 ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=60/1' ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=NV12' ! mx.sink_1 nvstreammux width=1920 height=1080 batch-size=2 live-source=1 name=mx ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/config_infer_primary.txt batch-size=2 ! nvvideoconvert ! nvmultistreamtiler width=1920 height=1080 rows=1 columns=2 ! nvvideoconvert ! nvdsosd ! nvegltransform ! nveglglessink sync=0

The related topic in forum:
Low camera frame rate - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums

  • nvdrmvideosink: Nvidia Drm Video Sink

Typical pipeline:
gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-5.0/samples/streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder bufapi-version=1 ! nvvideoconvert ! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/config_infer_primary.txt ! nvdrmvideosink conn_id=0 plane_id=1 set_mode=0 -e

The related topic in forum:
Which videosink for Jetson TX2 in EGLFS? - Jetson & Embedded Systems / Jetson TX2 - NVIDIA Developer Forums

  • nv3dsink: Nvidia 3D sink

Typical pipeline:
gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-5.0/samples/streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder bufapi-version=1 ! nvvideoconvert ! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/config_infer_primary.txt ! nv3dsink sync=false

Note: The nv3dsink plugin is a window-based rendering sink, and based on X11.

  • nvoverlaysink: OpenMax Video Sink

Typical pipeline:

gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream-5.0/samples/streams/sample_1080p_h264.mp4 ! qtdemux ! h264parse ! nvv4l2decoder bufapi-version=1 ! nvvideoconvert ! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream-5.0/samples/configs/deepstream-app/config_infer_primary.txt ! nvoverlaysink sync=0

Note: The nvoverlaysink plugin is deprecated in L4T release 32.1. Please use nvdrmvideosink or nv3dsink for rendering gst-v4l2 decoder output.

DeepStream Incompatible Plugins

11. [DS 5.x_All_App] How to implement a python binding

Refer to the following samples from forum users:

12. [DS 5.0GA_Jetson_App]: Dump NV12 NvBufSurface into a YUV file
Each NV12 NvBufSurface includes two planes (Y and interleaved UV) which are not contiguous in memory.
gstnvinfer_dump_NV12_NvBufSurface.patch (4.9 KB)

This is a sample change to /opt/nvidia/deepstream/deepstream-5.1/sources/gst-plugins/gst-nvinfer/gstnvinfer.cpp to dump the NV12 NvBufSurface before transforming to RGB data.
After getting the YUV file, you can view it at https://rawpixels.net/
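
Because the two planes are separate, and each row may be padded to a pitch larger than the image width, dumping an NV12 frame generally means copying it row by row. The sketch below shows that idea under the assumption that both planes are already mapped to CPU-readable memory. `Nv12View` and `packNv12` are hypothetical names, not NvBufSurface API, and the attached patch may do this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side view of one NV12 frame; not an NvBufSurface type.
struct Nv12View {
    const std::uint8_t* y;  // luma plane, `height` rows
    const std::uint8_t* uv; // interleaved chroma plane, `height / 2` rows
    std::size_t width;      // visible width in pixels
    std::size_t height;     // visible height in pixels
    std::size_t yPitch;     // bytes per luma row, including padding
    std::size_t uvPitch;    // bytes per chroma row, including padding
};

// Pack both planes into one contiguous buffer (all Y rows, then all UV
// rows) and drop the per-row padding. The result holds
// width * height * 3 / 2 bytes.
inline std::vector<std::uint8_t> packNv12(const Nv12View& f)
{
    assert(f.y != nullptr && f.uv != nullptr);
    assert(f.width % 2 == 0 && f.height % 2 == 0);
    assert(f.yPitch >= f.width && f.uvPitch >= f.width);

    std::vector<std::uint8_t> out;
    out.reserve(f.width * f.height * 3 / 2);
    for (std::size_t r = 0; r < f.height; ++r) {
        const std::uint8_t* row = f.y + r * f.yPitch;
        out.insert(out.end(), row, row + f.width);
    }
    // Each chroma row carries width / 2 (U, V) pairs, i.e. `width` bytes.
    for (std::size_t r = 0; r < f.height / 2; ++r) {
        const std::uint8_t* row = f.uv + r * f.uvPitch;
        out.insert(out.end(), row, row + f.width);
    }
    return out;
}
```

The packed buffer can then be written to disk with a binary std::ofstream and opened in a raw viewer as NV12 with the same width and height.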

13. [DS 5.x_All_App] How to access and modify the NvBufSurface

Refer to Deepstream sample code snippet - #3 by bcao

14. [All_Jetson_App] Check memory leakage with valgrind

  1. Install valgrind with the command below
    $ sudo apt-get install valgrind valgrind-dbg
  2. Run the application with the command below
    $ valgrind --tool=memcheck --leak-check=full --num-callers=100 --show-leak-kinds=definite,indirect --track-origins=yes ./app
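
15. [DSx_All_App] Debug Tips for DeepStream Accuracy Issue
    15.1 Confirm that your model achieves good accuracy in training and in inference outside DeepStream
    15.2 When deploying an ONNX model to DeepStream with the nvinfer plugin, confirm that the nvinfer parameters below are set correctly
    15.2.1 Input scale & offset
    1). net-scale-factor =
    2). offsets
    These two parameters are used as follows (from the doc): y = net-scale-factor * (x - mean), where x is the input pixel value, mean is the corresponding per-channel value from offsets (or from the mean file), and y is the value fed to the network.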
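
As a worked illustration of these two parameters, the sketch below applies the per-pixel formula from the Gst-nvinfer documentation, y = net-scale-factor * (x - offset), with one offset per channel. The function name `normalizePlanar` and the planar (CHW) buffer layout are assumptions chosen for the example; nvinfer performs this step internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Apply y = netScaleFactor * (x - offsets[c]) to a planar (CHW) image.
// Hypothetical helper for checking a preprocessing configuration offline.
inline std::vector<float> normalizePlanar(const std::vector<float>& chw,
                                          std::size_t channels,
                                          float netScaleFactor,
                                          const std::vector<float>& offsets)
{
    assert(channels > 0 && offsets.size() == channels);
    assert(chw.size() % channels == 0);

    const std::size_t planeSize = chw.size() / channels;
    std::vector<float> out(chw.size());
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t i = 0; i < planeSize; ++i) {
            const std::size_t idx = c * planeSize + i;
            out[idx] = netScaleFactor * (chw[idx] - offsets[c]);
        }
    }
    return out;
}
```

Comparing the output of such a helper with the tensor your training or standalone inference code feeds to the model is a quick way to spot a wrong scale or mean. For example, a model trained on inputs in [0, 1] without mean subtraction would typically need net-scale-factor close to 1/255 and no offsets.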


    15.2.2 Input Order
    1). network-input-order= // 0:NCHW 1:NHWC
    2). infer-dims= // if network-input-order=1 (NHWC), infer-dims must be specified; otherwise nvinfer cannot detect the input dims automatically
    3). model-color-format= // 0: RGB 1: BGR 2: GRAY
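
The difference between the two input orders is only where a given value sits in the flat buffer. The following sketch, with hypothetical helper names and a single image (N = 1), shows the index arithmetic for both layouts; feeding NHWC data to a model configured as NCHW, or the reverse, scrambles the image rather than raising an error.

```cpp
#include <cassert>
#include <cstddef>

// Flat index of element (c, h, w) in a planar NCHW buffer with N = 1.
inline std::size_t indexNCHW(std::size_t c, std::size_t h, std::size_t w,
                             std::size_t C, std::size_t H, std::size_t W)
{
    assert(c < C && h < H && w < W);
    return (c * H + h) * W + w;
}

// Flat index of the same element in an interleaved NHWC buffer with N = 1.
inline std::size_t indexNHWC(std::size_t c, std::size_t h, std::size_t w,
                             std::size_t C, std::size_t H, std::size_t W)
{
    assert(c < C && h < H && w < W);
    return (h * W + w) * C + c;
}
```

For a 3-channel 2x4 image, element (c=1, h=1, w=2) is at offset 14 in NCHW and at offset 19 in NHWC.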
    15.2.3 scale and padding
    1). maintain-aspect-ratio= // whether to maintain the aspect ratio while scaling the input
    2). symmetric-padding= // whether to pad the image symmetrically while scaling the input. By default, padding is asymmetric and the scaled image is placed at the top left corner.
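
To see what these two settings imply for box coordinates, the sketch below computes the scaled image size and its position inside the network input when the aspect ratio is kept. `Letterbox` and `computeLetterbox` are hypothetical names, and the integer rounding used here is an assumption; nvinfer's exact rounding may differ by a pixel.

```cpp
#include <cassert>

// Placement of the scaled source image inside the network input.
// Hypothetical helper type, not part of nvinfer.
struct Letterbox {
    int scaledW; // width of the scaled image
    int scaledH; // height of the scaled image
    int padLeft; // horizontal offset of the scaled image
    int padTop;  // vertical offset of the scaled image
};

inline Letterbox computeLetterbox(int srcW, int srcH, int netW, int netH,
                                  bool symmetric)
{
    assert(srcW > 0 && srcH > 0 && netW > 0 && netH > 0);
    Letterbox lb{};
    // Compare netW/srcW with netH/srcH by cross-multiplying, which avoids
    // floating-point rounding when picking the limiting axis.
    if (static_cast<long long>(netW) * srcH <=
        static_cast<long long>(netH) * srcW) {
        lb.scaledW = netW;
        lb.scaledH = static_cast<int>(static_cast<long long>(srcH) * netW / srcW);
    } else {
        lb.scaledH = netH;
        lb.scaledW = static_cast<int>(static_cast<long long>(srcW) * netH / srcH);
    }
    // Asymmetric padding keeps the image at the top left corner;
    // symmetric padding centers it.
    lb.padLeft = symmetric ? (netW - lb.scaledW) / 2 : 0;
    lb.padTop = symmetric ? (netH - lb.scaledH) / 2 : 0;
    return lb;
}
```

A 1920x1080 frame scaled into a 640x640 input becomes 640x360. With symmetric padding it starts 140 rows down, so a custom parser that assumes top-left placement would report boxes shifted by that amount, which is a common cause of apparent accuracy loss.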
    15.2.4 inference precision
    1). network-mode= // 0: FP32 1: INT8 2: FP16. If INT8 accuracy is not good, try FP16 or FP32
    15.2.5 threshold
    1). threshold=
    2). pre-cluster-threshold=
    3). post-cluster-threshold=
    The above are some key parameters for a quick accuracy check. For more detailed information, please refer to the nvinfer doc: Gst-nvinfer (DeepStream 6.0 Release documentation)
    15.3 Dump the input or output of nvinfer
    See the two items below in DeepStream SDK FAQ - #9 by mchi
    2. [DS5.0GA_Jetson_dGPU_Plugin] Dump the Inference Input ==> compare the input between DS and your own standalone inference/training app
    3. [DS5_Jetson_dGPU_Plugin] Dump the Inference outputs ==> then apply your own parser offline to check this output data

16. [DeepStream 6.0 GA] python binding installation

Download the wheel files directly from Releases · NVIDIA-AI-IOT/deepstream_python_apps · GitHub
Or build it referring to steps below:

16.1 dGPU+x86 platform & Triton docker

[DeepStream 6.0] Unable to install python_gst into nvcr.io/nvidia/deepstream:6.0-triton container - #5 by rpaliwal_nvidia

16.2 dGPU+x86 platform & non-Triton docker

  Please refer to deepstream_python_apps/bindings at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub and the steps below if you use the DS 6.0 GA docker:
# 1. Prerequisites
apt install -y git python-dev python3 python3-pip python3.6-dev python3.8-dev cmake g++ build-essential \
    libglib2.0-dev libglib2.0-dev-bin python-gi-dev libtool m4 autoconf automake

# 2. Gst-python
cd /opt/nvidia/deepstream/deepstream/sources/apps/
git clone https://github.com/NVIDIA-AI-IOT/deepstream_python_apps.git
cd deepstream_python_apps/
git submodule update --init
apt-get install --reinstall ca-certificates
cd 3rdparty/gst-python/
./autogen.sh
make && make install

# 3. install pyds
cd /opt/nvidia/deepstream/deepstream/sources/apps/deepstream_python_apps/bindings/
mkdir build
cd build
cmake ..
make
pip3 install ./pyds-1.1.0-py3-none-linux_x86_64.whl

# 4. run sample
cd /opt/nvidia/deepstream/deepstream/sources/apps/deepstream_python_apps
mv  apps/* ./
cd deepstream-test1/
python3 deepstream_test_1.py ../../../../samples/streams/sample_qHD.h264

16.3 Jetson dockers

Refer to deepstream_python_apps/bindings at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub and the steps below if you use the DS 6.0 GA docker:

# 1. Prerequisites
apt-get update
apt install -y git python-dev python3 python3-pip python3.6-dev python3.8-dev cmake g++ build-essential \
    libglib2.0-dev libglib2.0-dev-bin python-gi-dev libtool m4 autoconf automake

# 2. Gst-python
cd /opt/nvidia/deepstream/deepstream/sources/apps/
git clone https://github.com/NVIDIA-AI-IOT/deepstream_python_apps.git
cd deepstream_python_apps/
git submodule update --init
apt-get install --reinstall ca-certificates
cd 3rdparty/gst-python/
./autogen.sh
make && make install

# 3. install pyds
cd /opt/nvidia/deepstream/deepstream/sources/apps/deepstream_python_apps/bindings/
mkdir build
cd build
cmake ..  -DPYTHON_MAJOR_VERSION=3 -DPYTHON_MINOR_VERSION=6 -DPIP_PLATFORM=linux_aarch64 -DDS_PATH=/opt/nvidia/deepstream/deepstream
make
pip3 install ./pyds-1.1.0-py3-none-linux_aarch64.whl

# 4. run sample
cd /opt/nvidia/deepstream/deepstream/sources/apps/deepstream_python_apps
mv  apps/* ./
cd deepstream-test1/
python3 deepstream_test_1.py ../../../../samples/streams/sample_qHD.h264

17. [DeepStream_dGPU_App] Using OpenCV to run a DeepStream pipeline

Sometimes a GStreamer pipeline opened through OpenCV fails. Please refer to the following topic to resolve this problem.

How to compile OpenCV with Gstreamer [Ubuntu&Windows] | by Galaktyk 01 | Medium

18. Open model deployment on DeepStream (Thanks for the sharing!)
Yolov5-small : Custom Yolov5 on Deepstream 6.0 (Thanks @raghavendra.ramya)
Yolo2/3/4/5/OR : Improved DeepStream for YOLO models (Thanks @marcoslucianops )
YoloV5+Triton : Triton Inference through docker - #7 by mchi
YoloV4 : GitHub - NVIDIA-AI-IOT/yolov4_deepstream + deepstream_yolov4.tgz - Google Drive
YoloV4+dspreprocess : deepstream_yolov4_with_nvdspreprocess.tgz - Google Drive
YoloV5 + nvinfer : GitHub - beyondli/Yolo_on_Jetson

19. [DSx_All_App] How to use classification model as pgie?
The input is a picture of a blue car and the expected output is the "blue" label. Here is the test command:
blueCar.zip (37.6 KB)
dstest_appsrc_config.txt (3.7 KB)

gst-launch-1.0 filesrc location=blueCar.jpg ! jpegdec ! videoconvert ! video/x-raw,format=I420 ! nvvideoconvert ! video/x-raw\(memory:NVMM\),format=NV12 ! mux.sink_0 nvstreammux name=mux batch-size=1 width=1280 height=720 ! nvinfer config-file-path=./dstest_appsrc_config.txt ! nvvideoconvert ! video/x-raw\(memory:NVMM\),format=RGBA ! nvdsosd ! nvvideoconvert ! video/x-raw,format=I420 ! jpegenc ! filesink location=out.jpg

[Access output of Primary Classifier]
[Resnet50 with imagenet dataset image classification using deepstream sdk]

20. How to troubleshoot the error cuGraphicsGLRegisterBuffer failed with error(219) gst_eglglessink_cuda_init texture = 1

CUDA_ERROR_INVALID_GRAPHICS_CONTEXT = 219

This indicates an error with OpenGL or DirectX context.

Make sure you use the NVIDIA X driver.
To set up the NVIDIA X server, follow Chapter 6. Configuring X for the NVIDIA Driver
For common driver-related problems, see Chapter 8. Common Problems (nvidia.com)

https://forums.developer.nvidia.com/t/issue-runnung-deepstream-app-docker-container-5-0-6-0-in-rtx-3080-and-a5000-laptop/213783
cuGraphicsGLRegisterBuffer failed with error(219) gst_eglglessink_cuda_init texture = 1 - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums

21. [Jetson] A TRT version mismatch between the DeepStream 6.1 docker and the device can be fixed by an APT update for JetPack 5.0.1 DP

1 docker run --rm -it --runtime=nvidia REPOSITORY:TAG
2 Remove the previous TRT packages
  apt-get purge --remove libnvinfer8 libnvinfer-plugin8  libnvinfer-bin python3-libnvinfer
3 apt-get update
4 Install the TRT 8.4.0.11 packages
  apt-get install libnvinfer8 libnvinfer-plugin8  libnvinfer-bin python3-libnvinfer
5 Verify the TRT version
  nm -D /usr/lib/aarch64-linux-gnu/libnvinfer.so.8.4.0 |grep version

Related topic: 218888

22. [Jetson] VIC Configuration failed image scale factor exceeds 16
This issue is a limitation of Jetson VIC processing and can be fixed by modifying the configuration, for example:

# Model dimensions: height is 1168, width is 720.
uff-input-dims=3;1168;720;0
# If scaling-compute-hw = VIC, input-object-min-height must be even and greater than or equal to (model height)/16
input-object-min-height=74
# If scaling-compute-hw = VIC, input-object-min-width must be even and greater than or equal to (model width)/16
input-object-min-width=46
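
The two minimum values in the example follow directly from the rule in the comments: divide the model dimension by 16, round up, then round up again to the next even number. A small sketch of that arithmetic, using a hypothetical helper name, is:

```cpp
#include <cassert>

// Smallest even value that is >= modelDim / 16, per the VIC scale limit
// described above. Hypothetical helper, not a DeepStream function.
inline unsigned minVicObjectSize(unsigned modelDim)
{
    assert(modelDim > 0);
    const unsigned ceilDiv = (modelDim + 15) / 16; // ceil(modelDim / 16)
    return (ceilDiv % 2 == 0) ? ceilDiv : ceilDiv + 1;
}
```

For the model above, minVicObjectSize(1168) is 74 (1168 / 16 = 73, rounded up to even) and minVicObjectSize(720) is 46, matching the configured input-object-min-height and input-object-min-width.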

Related topic: [VIC Configuration failed image scale factor exceeds 16, use GPU for Transformation - #3 by Amycao]