Debug a customized classification model with TLT 2.0 + DeepStream 5.0

• Hardware Platform: Jetson TX2
• DeepStream Version: 5.0
• TLT (TAO) Version: 2.0
• JetPack Version: 4.4

Hello,
Thanks a lot for reading this post. It’s going to be a bit long.

In order to get familiar with the NVIDIA end-to-end solution, we

  1. Collected a small dataset of traffic light colors (green, yellow, and red).
  2. Trained an image classification model with TLT 2.0.
  3. Deployed it on DeepStream 5.0 using a modified version of the sample app apps/sample_apps/deepstream-test2.

Unfortunately, the classifier outputs a wrong result, or no result at all, on a test video containing a traffic light.
Ground-truth → Result

  • Green → Green
  • Yellow → Green
  • Red → No result

We are stuck here and would appreciate any input on where to look for the issue.

Here are more details about the training and deployment.

Training

Using ResNet-10 and NO pre-trained model, we trained a 3-class image classifier. We also ran evaluation and inference on individual frames, and the results are 100% correct. The training config file is below (we will share the dataset upon request):

model_config {
  arch: "resnet"

  # for resnet --> n_layers can be [10, 18, 34, 50, 101]
  n_layers: 10
  use_bias: False
  use_batch_norm: True
  all_projections: False
  use_pooling: True
  freeze_bn: False
  freeze_blocks: 0
  freeze_blocks: 1

  # image size should be "3, X, Y", where X,Y >= 16
  input_image_size: "3,28,62"
}

train_config {
  train_dataset_path: "/workspace/dataset/dataset_vlpd/train"
  val_dataset_path: "/workspace/dataset/dataset_vlpd/val"
  # optimizer can be chosen from ['adam', 'sgd']
  optimizer: "sgd"
  batch_size_per_gpu: 32
  n_epochs: 10
  n_workers: 2
  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }

  # learning_rate
  lr_config {
    # "step" and "soft_anneal" are supported.
    scheduler: "soft_anneal"

    # "soft_anneal" stands for soft annealing learning rate scheduler.
    # the following 4 parameters should be specified if "soft_anneal" is used.
    learning_rate: 0.005
    soft_start: 0.056
    annealing_points: "0.3, 0.6, 0.8"
    annealing_divider: 10
    # "step" stands for step learning rate scheduler.
    # the following 3 parameters should be specified if "step" is used.
    # learning_rate: 0.006
    # step_size: 10
    # gamma: 0.1

    # "cosine" stands for soft start cosine learning rate scheduler.
    # the following 2 parameters should be specified if "cosine" is used.
    # learning_rate: 0.05
    # soft_start: 0.01
  }
}
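
For context, here is a minimal sketch of how we read the soft_anneal schedule from the comments above. This is our own reading, not TLT's actual implementation, and the soft_start warm-up ramp is omitted:

#include <stdio.h>

/* Our reading of soft_anneal (an assumption, not TLT's code): the base
 * learning rate is divided by annealing_divider each time training
 * progress (epoch / n_epochs) passes an annealing point. */
static double
annealed_lr (double base_lr, double progress,
             const double *points, int n_points, double divider)
{
  double lr = base_lr;
  for (int i = 0; i < n_points; i++)
    if (progress >= points[i])
      lr /= divider;
  return lr;
}

int
main (void)
{
  const double points[] = { 0.3, 0.6, 0.8 };
  const double probes[] = { 0.0, 0.3, 0.6, 0.9 };

  /* With learning_rate = 0.005 and annealing_divider = 10, this prints
   * 0.005, 0.0005, 5e-05 and 5e-06. */
  for (int i = 0; i < 4; i++)
    printf ("progress %.1f -> lr %g\n", probes[i],
            annealed_lr (0.005, probes[i], points, 3, 10.0));
  return 0;
}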

Deployment

We modified deepstream-test2 to remove the tracker and keep only one classifier:
filesrc → decode → nvstreammux → nvinfer (dummy detector) → nvinfer (custom classifier) → nvdsosd

  1. The primary detector is the default one and is used as a dummy detector.
  2. Right after the primary detector, we add a new detection result (a fixed ROI) for the traffic light in attach_metadata_full_frame().
  3. The secondary customized classifier classifies the ROIs and outputs the color.

As mentioned above, the classifier outputs the wrong result for yellow and nothing for red.
How can we force the classifier to produce an output every time? With classifier-threshold=0, the classifier is expected to output something (even if wrong).

#include <gst/gst.h>
#include <glib.h>

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "gstnvdsmeta.h"

#define PGIE_CONFIG_FILE  "configs/dstest2_pgie_config.txt"
#define SGIE1_CONFIG_FILE "configs/sgie1.txt"
#define MAX_DISPLAY_LEN 64

#define PGIE_CLASS_ID_VEHICLE  0
#define PGIE_CLASS_ID_BYCICLE  1
#define PGIE_CLASS_ID_PERSON   2
#define PGIE_CLASS_ID_ROADSIGN 3

/* The muxer output resolution must be set if the input streams will be of
 * different resolution. The muxer will scale all the input frames to this
 * resolution. */
#define MUXER_OUTPUT_WIDTH 1920
#define MUXER_OUTPUT_HEIGHT 1080

/* Muxer batch formation timeout, for e.g. 40 millisec. Should ideally be set
 * based on the fastest source's framerate. */
#define MUXER_BATCH_TIMEOUT_USEC 40000

gint frame_number = 0;

static void
attach_metadata_full_frame (NvDsFrameMeta *frame_meta)
{
    gdouble scale_ratio = 1.0;
    NvDsBatchMeta *batch_meta = frame_meta->base_meta.batch_meta;
    NvDsObjectMeta *object_meta = NULL;
    static gchar font_name[] = "Serif";
    // GST_DEBUG_OBJECT (dsexample, "Attaching metadata %d\n", output->numObjects);

    //for (gint i = 0; i < output->numObjects; i++) 
    for (gint i = 0; i < 1; i++)
    {
        //DsExampleObject *obj = &output->object[i];
        object_meta = nvds_acquire_obj_meta_from_pool(batch_meta);
        NvOSD_RectParams *rect_params = &(object_meta->rect_params);
        NvOSD_TextParams *text_params = &(object_meta->text_params);

        /* Assign bounding box coordinates */
        rect_params->left    = 1224; //obj->left;
        rect_params->top     = 325;  //obj->top;
        rect_params->width   = 34;   //28; //obj->width;
        rect_params->height  = 66;   //58; //obj->height;

        if (i == 1) /* note: unreachable while the loop above runs a single iteration */
        {
          rect_params->left    = 1410; //obj->left;
          rect_params->top     = 340;  //obj->top;
          rect_params->width   = 34;   //28; //obj->width;
          rect_params->height  = 66;   //58; //obj->height;
        }

        object_meta->confidence  = 1;
        object_meta->detector_bbox_info.org_bbox_coords.left   = rect_params->left;
        object_meta->detector_bbox_info.org_bbox_coords.top    = rect_params->top;
        object_meta->detector_bbox_info.org_bbox_coords.width  = rect_params->width;
        object_meta->detector_bbox_info.org_bbox_coords.height = rect_params->height;

        /* Semi-transparent yellow background */
        //rect_params->has_bg_color = 0;
        //rect_params->bg_color = (NvOSD_ColorParams) {1, 1, 0, 0.4};
        /* Red border of width 6 */
        rect_params->border_width = 3;
        rect_params->border_color = (NvOSD_ColorParams) {1, 0, 0, 1};

        /* Scale the bounding boxes proportionally based on how the object/frame was
         * scaled during input */
        rect_params->left    /= scale_ratio;
        rect_params->top     /= scale_ratio;
        rect_params->width   /= scale_ratio;
        rect_params->height  /= scale_ratio;
        // GST_DEBUG_OBJECT (dsexample, "Attaching rect%d of batch%u"
        //     "  left->%f top->%f width->%f"
        //     " height->%f label->%s\n", i, batch_id, rect_params.left,
        //     rect_params.top, rect_params.width, rect_params.height, obj->label);

        // Traffic light
        object_meta->class_id  = PGIE_CLASS_ID_VEHICLE; // BYCICLE;
        object_meta->object_id = 1; //UNTRACKED_OBJECT_ID;
        //g_strlcpy (object_meta->obj_label, obj->label, MAX_LABEL_SIZE);

        /* display_text requires heap-allocated memory */
        text_params->display_text = g_strdup ("Light");
        /* Display text above the left top corner of the object */
        text_params->x_offset = rect_params->left;
        text_params->y_offset = rect_params->top - 10;
        /* Set black background for the text */
        text_params->set_bg_clr = 1;
        text_params->text_bg_clr = (NvOSD_ColorParams) {0, 0, 0, 1};
        /* Font face, size and color */
        text_params->font_params.font_name = font_name;
        text_params->font_params.font_size = 11;
        text_params->font_params.font_color = (NvOSD_ColorParams) {1, 1, 1, 1};

        nvds_add_obj_meta_to_frame(frame_meta, object_meta, NULL);
        frame_meta->bInferDone = TRUE;
    }
}

static GstPadProbeReturn
pgie_src_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info,
    gpointer u_data)
{
    GstBuffer *buf = (GstBuffer *) info->data;
    guint num_rects = 0; 
    NvDsObjectMeta *obj_meta = NULL;
    guint vehicle_count = 0;
    guint person_count = 0;
    guint roadsign_count = 0;
    guint bicycle_count = 0;
    NvDsMetaList * l_frame = NULL;
    NvDsMetaList * l_obj = NULL;

    NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);

    for (l_frame = batch_meta->frame_meta_list; l_frame != NULL; l_frame = l_frame->next) 
    {
        NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) (l_frame->data);

        attach_metadata_full_frame(frame_meta);
        for (l_obj = frame_meta->obj_meta_list; l_obj != NULL; l_obj = l_obj->next) 
        {
            obj_meta = (NvDsObjectMeta *) (l_obj->data);            
            if (obj_meta->class_id == PGIE_CLASS_ID_BYCICLE)
            {
                bicycle_count++;
                num_rects++;
            }            
            if (obj_meta->class_id == PGIE_CLASS_ID_ROADSIGN)
            {
                roadsign_count++;
                num_rects++;
            }
            if (obj_meta->class_id == PGIE_CLASS_ID_VEHICLE) 
            {
                vehicle_count++;
                num_rects++;
            }

            if (obj_meta->class_id == PGIE_CLASS_ID_PERSON) 
            {
                person_count++;
                num_rects++;
            }
        }
        g_print ("Frame-%d Number of objects = %d "
                "Vehicle = %d Person = %d sign = %d bicycle = %d \n",
                frame_meta->frame_num, num_rects, vehicle_count, person_count, roadsign_count, bicycle_count);
    }
    return GST_PAD_PROBE_OK;
}

/* This is the buffer probe function that we have registered on the sink pad
 * of the OSD element. All the infer elements in the pipeline shall attach
 * their metadata to the GstBuffer; here we will iterate & process the metadata,
 * e.g. class IDs to strings, counting of class_id objects, etc. */
static GstPadProbeReturn
osd_sink_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info,
    gpointer u_data)
{
    GstBuffer *buf = (GstBuffer *) info->data;
    guint num_rects = 0;
    NvDsObjectMeta *obj_meta = NULL;
    guint vehicle_count = 0;
    guint person_count = 0;
    guint bicycle_count = 0;
    NvDsMetaList * l_frame = NULL;
    NvDsMetaList * l_obj = NULL;
    NvDsDisplayMeta *display_meta = NULL;

    NvDsClassifierMetaList *l_classifier = NULL;
    NvDsClassifierMeta *class_meta = NULL;
    NvDsLabelInfoList *l_label = NULL;
    NvDsLabelInfo *label_info = NULL;

    NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);

    for (l_frame = batch_meta->frame_meta_list; l_frame != NULL; l_frame = l_frame->next) 
    {
        NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) (l_frame->data);
        int offset = 0;
        for (l_obj = frame_meta->obj_meta_list; l_obj != NULL; l_obj = l_obj->next)
        {
            obj_meta = (NvDsObjectMeta *) (l_obj->data);
            if (obj_meta->class_id == PGIE_CLASS_ID_VEHICLE) 
            {
                vehicle_count++;
                num_rects++;

                int id = obj_meta->object_id;
                for(l_classifier = obj_meta->classifier_meta_list; l_classifier != NULL;
                    l_classifier = l_classifier->next) 
                {
                      class_meta = (NvDsClassifierMeta *) (l_classifier->data);
                      for(l_label = class_meta->label_info_list; l_label != NULL;
                          l_label = l_label->next) 
                      {
                        label_info = (NvDsLabelInfo *) (l_label->data);
                        g_print("id-%d", id);
                        g_print(":");
                        g_print ("%d, %s\n", label_info->num_classes, label_info->result_label);
                      }
                }
            }
        }
    }

    // g_print ("Frame Number = %d Number of objects = %d "
    //         "Vehicle Count = %d Person Count = %d\n",
    //         frame_number, num_rects, vehicle_count, person_count);
    frame_number++;
    return GST_PAD_PROBE_OK;
}

static gboolean
bus_call (GstBus * bus, GstMessage * msg, gpointer data)
{
  GMainLoop *loop = (GMainLoop *) data;
  switch (GST_MESSAGE_TYPE (msg)) {
    case GST_MESSAGE_EOS:
      g_print ("End of stream\n");
      g_main_loop_quit (loop);
      break;
    case GST_MESSAGE_ERROR:{
      gchar *debug;
      GError *error;
      gst_message_parse_error (msg, &error, &debug);
      g_printerr ("ERROR from element %s: %s\n",
          GST_OBJECT_NAME (msg->src), error->message);
      if (debug)
        g_printerr ("Error details: %s\n", debug);
      g_free (debug);
      g_error_free (error);
      g_main_loop_quit (loop);
      break;
    }
    default:
      break;
  }
  return TRUE;
}


int
main (int argc, char *argv[])
{
  GMainLoop *loop = NULL;
  GstElement *pipeline = NULL, *source = NULL, *h264parser = NULL,
      *decoder = NULL, *streammux = NULL, *sink = NULL, *pgie = NULL, *nvvidconv = NULL,*nvvidconv2 = NULL,
      *nvosd = NULL, *sgie1 = NULL, *sgie2 = NULL, *sgie3 = NULL, *nvtracker = NULL;
  g_print ("With tracker\n");
#ifdef PLATFORM_TEGRA
  GstElement *transform = NULL;
    g_print ("On Tegra platform\n");
#endif
  GstBus *bus = NULL;
  guint bus_watch_id = 0;
  GstPad *osd_sink_pad = NULL;

  /* Check input arguments */
  if (argc != 2) {
    g_printerr ("Usage: %s <elementary H264 filename>\n", argv[0]);
    return -1;
  }

  /* Standard GStreamer initialization */
  gst_init (&argc, &argv);
  loop = g_main_loop_new (NULL, FALSE);

  /* Create gstreamer elements */
  /* Create Pipeline element that will be a container of other elements */
  pipeline = gst_pipeline_new ("dstest2-pipeline");

  /* Source element for reading from the file */
  source = gst_element_factory_make ("filesrc", "file-source");

  /* Since the data format in the input file is elementary h264 stream,
   * we need a h264parser */
  h264parser = gst_element_factory_make ("h264parse", "h264-parser");

  /* Use nvdec_h264 for hardware accelerated decode on GPU */
  decoder = gst_element_factory_make ("nvv4l2decoder", "nvv4l2-decoder");

  /* Create nvstreammux instance to form batches from one or more sources. */
  streammux = gst_element_factory_make ("nvstreammux", "stream-muxer");

  /* Use nvinfer to run inferencing on decoder's output,
   * behaviour of inferencing is set through config file */
  pgie = gst_element_factory_make ("nvinfer", "primary-nvinference-engine");

  /* We keep only one secondary gie, so create one more instance of
     nvinfer */
  sgie1 = gst_element_factory_make ("nvinfer", "secondary1-nvinference-engine");

  /* Use convertor to convert from NV12 to RGBA as required by nvosd */
  nvvidconv  = gst_element_factory_make ("nvvideoconvert", "nvvideo-converter");
  nvvidconv2 = gst_element_factory_make ("nvvideoconvert", "nvvideo-converter2");

  /* Create OSD to draw on the converted RGBA buffer */
  nvosd = gst_element_factory_make ("nvdsosd", "nv-onscreendisplay");

  /* Finally render the osd output */
#ifdef PLATFORM_TEGRA
  transform = gst_element_factory_make ("nvegltransform", "nvegl-transform");
#endif
  sink = gst_element_factory_make ("nveglglessink", "nvvideo-renderer");

  // if (!source || !h264parser || !decoder || !pgie ||
  //     !nvtracker || !sgie1 || !sgie2 || !sgie3 || !nvvidconv || !nvosd || !sink) {
  if (!source || !h264parser || !decoder || !pgie ||
      !sgie1 || !nvvidconv || !nvosd || !sink) {    
    g_printerr ("One element could not be created. Exiting.\n");
    return -1;
  }

#ifdef PLATFORM_TEGRA
  if(!transform) {
    g_printerr ("One tegra element could not be created. Exiting.\n");
    return -1;
  }
#endif

  /* Set the input filename to the source element */
  g_object_set (G_OBJECT (source), "location", argv[1], NULL);
  g_object_set (G_OBJECT (streammux), "batch-size", 1, NULL);
  g_object_set (G_OBJECT (streammux), "width", MUXER_OUTPUT_WIDTH, "height",
      MUXER_OUTPUT_HEIGHT,
      "batched-push-timeout", MUXER_BATCH_TIMEOUT_USEC, NULL);

  /* Set all the necessary properties of the nvinfer element,
   * the necessary ones are : */
  g_object_set (G_OBJECT (pgie),  "config-file-path", PGIE_CONFIG_FILE,  NULL);
  g_object_set (G_OBJECT (sgie1), "config-file-path", SGIE1_CONFIG_FILE, NULL);

  /* we add a message handler */
  bus = gst_pipeline_get_bus (GST_PIPELINE (pipeline));
  bus_watch_id = gst_bus_add_watch (bus, bus_call, loop);
  gst_object_unref (bus);

  /* Set up the pipeline */
  /* we add all elements into the pipeline */
  /* decoder | pgie1 | sgie1 | etc.. */
#ifdef PLATFORM_TEGRA
  gst_bin_add_many (GST_BIN (pipeline),
      source, h264parser, decoder, streammux, pgie, sgie1, //sgie2, sgie3,
      nvvidconv, nvosd, transform, sink, NULL);
#else
  gst_bin_add_many (GST_BIN (pipeline),
      source, h264parser, decoder, streammux, pgie, sgie1, //nvvidconv2, //sgie2, sgie3,
      nvvidconv, nvosd, sink, NULL);
#endif

  GstPad *sinkpad, *srcpad;
  gchar pad_name_sink[16] = "sink_0";
  gchar pad_name_src[16] = "src";

  sinkpad = gst_element_get_request_pad (streammux, pad_name_sink);
  if (!sinkpad) {
    g_printerr ("Streammux request sink pad failed. Exiting.\n");
    return -1;
  }

  srcpad = gst_element_get_static_pad (decoder, pad_name_src);
  if (!srcpad) {
    g_printerr ("Decoder request src pad failed. Exiting.\n");
    return -1;
  }

  if (gst_pad_link (srcpad, sinkpad) != GST_PAD_LINK_OK) {
      g_printerr ("Failed to link decoder to stream muxer. Exiting.\n");
      return -1;
  }

  gst_object_unref (sinkpad);
  gst_object_unref (srcpad);

  /* Link the elements together */
  if (!gst_element_link_many (source, h264parser, decoder, NULL)) {
    g_printerr ("Elements could not be linked: 1. Exiting.\n");
    return -1;
  }

// We may need a converter between pgie and sgie1
#ifdef PLATFORM_TEGRA
  if (!gst_element_link_many (streammux, pgie, sgie1,
      nvvidconv, nvosd, transform, sink, NULL)) {
    g_printerr ("Elements could not be linked. Exiting.\n");
    return -1;
  }
#else
  if (!gst_element_link_many (streammux, pgie, sgie1,
      nvvidconv, nvosd, sink, NULL)) {
    g_printerr ("Elements could not be linked. Exiting.\n");
    return -1;
  }
#endif

  /* Add a probe on the pgie src pad to inject the fixed traffic-light ROI. */
  GstPad *pgie_src_pad = NULL;
  pgie_src_pad = gst_element_get_static_pad (pgie, "src");
  if (!pgie_src_pad)
    g_print ("Unable to get sink pad\n");
  else
    gst_pad_add_probe (pgie_src_pad, GST_PAD_PROBE_TYPE_BUFFER,
        pgie_src_pad_buffer_probe, NULL, NULL);
  gst_object_unref (pgie_src_pad);

  /* Lets add probe to get informed of the meta data generated, we add probe to
   * the sink pad of the osd element, since by that time, the buffer would have
   * had got all the metadata. */
  osd_sink_pad = gst_element_get_static_pad (nvosd, "sink");
  if (!osd_sink_pad)
    g_print ("Unable to get sink pad\n");
  else
    gst_pad_add_probe (osd_sink_pad, GST_PAD_PROBE_TYPE_BUFFER,
        osd_sink_pad_buffer_probe, NULL, NULL);
  gst_object_unref (osd_sink_pad);

  /* Set the pipeline to "playing" state */
  g_print ("Now playing: %s\n", argv[1]);
  gst_element_set_state (pipeline, GST_STATE_PLAYING);

  /* Iterate */
  g_print ("Running...\n");
  g_main_loop_run (loop);

  /* Out of the main loop, clean up nicely */
  g_print ("Returned, stopping playback\n");
  gst_element_set_state (pipeline, GST_STATE_NULL);
  g_print ("Deleting pipeline\n");
  gst_object_unref (GST_OBJECT (pipeline));
  g_source_remove (bus_watch_id);
  g_main_loop_unref (loop);
  return 0;
}

Color classifier config: sgie1.txt

[property]
gpu-id=0
net-scale-factor=1.0
offsets=103.939;116.779;123.68
# 0:RGB; 1:BGR; 2:GRAY
model-color-format=0
labelfile-path=../models/test_labels.txt
#############################
# model file format
# option 1: etlt
# option 2: TensorRT engine (tlt-converter)
#############################
#tlt-encoded-model=../models/test.etlt
model-engine-file=../models/test.etlt_b1_gpu0_fp16.engine
#############################
tlt-model-key= [To be filled]
#############################
infer-dims=3;28;62
uff-input-blob-name=input_1
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
interval=0
gie-unique-id=2
#0=Detection 1=Classification 2=Segmentation
network-type=1
scaling-filter=1
scaling-compute-hw=1
output-blob-names=predictions/Softmax
classifier-async-mode=0
classifier-threshold=0
#1: primary 2: secondary
process-mode=2
secondary-reinfer-interval=0
maintain-aspect-ratio=0
#############################
#is-classifier=1
#input-object-max-width=200
#input-object-max-height=200
#operate-on-gie-id=1
#operate-on-class-ids=0;1;2;3
#filter-out-class-ids=3
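
One thing we are not sure about in this config: per the nvinfer documentation, each pixel is preprocessed as y = net-scale-factor * (x - offset[channel]), with the offsets applied in the channel order implied by model-color-format. 103.939;116.779;123.68 are the conventional Caffe means in BGR order, while model-color-format=0 selects RGB here, so the offset order may be worth double-checking. A minimal illustration of the transform (the pixel values are made up):

#include <stdio.h>

/* nvinfer preprocessing per the DeepStream docs: y = net-scale-factor * (x - mean).
 * With net-scale-factor=1.0 this is plain mean subtraction. The offsets are
 * the ones from sgie1.txt; the input pixel is a hypothetical example. */
int
main (void)
{
    const float net_scale_factor = 1.0f;
    const float offsets[3] = { 103.939f, 116.779f, 123.68f };
    const float pixel[3]   = { 30.0f, 180.0f, 40.0f };  /* made-up R,G,B values */

    for (int c = 0; c < 3; c++)
        printf ("channel %d: %8.3f -> %8.3f\n", c, pixel[c],
                net_scale_factor * (pixel[c] - offsets[c]));
    return 0;
}

If the offsets are in the wrong order relative to the model's input, the red and blue channel means are effectively swapped, which could plausibly confuse a color classifier.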

Hi @MichaelLLL,
Could you take a look at DeepStream SDK FAQ - #21 by mchi and see if it helps you debug this?

@mchi Thanks a lot. We will check it out.

Thanks again. @mchi
We followed the FAQ and played with all the configuration options, but still no luck.

How about this specific question?
How can we force the classifier to produce an output every time?

With classifier-threshold=0, the classifier is expected to always output something, but it does not.

Have you confirmed that the model classifies well outside of DeepStream?

Can you dump the nvinfer output and check whether it can classify RED?
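
For example, here is a minimal sketch of one way to dump the raw classifier output, modeled on the deepstream-infer-tensor-meta-test sample: set output-tensor-meta=1 in sgie1.txt, then read NvDsInferTensorMeta from each object's user meta in a probe placed downstream of the SGIE (the helper name and the single-softmax-output assumption are ours):

#include <glib.h>

#include "gstnvdsmeta.h"
#include "gstnvdsinfer.h"

/* Sketch only: assumes output-tensor-meta=1 is set in sgie1.txt so that
 * nvinfer attaches the raw output tensor to each object it classifies.
 * Call this for every obj_meta inside osd_sink_pad_buffer_probe. */
static void
dump_classifier_tensor (NvDsObjectMeta *obj_meta)
{
    for (NvDsMetaList *l = obj_meta->obj_user_meta_list; l != NULL; l = l->next)
    {
        NvDsUserMeta *user_meta = (NvDsUserMeta *) l->data;
        if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
            continue;

        NvDsInferTensorMeta *meta =
            (NvDsInferTensorMeta *) user_meta->user_meta_data;
        for (guint i = 0; i < meta->num_output_layers; i++)
        {
            NvDsInferLayerInfo *info = &meta->output_layers_info[i];
            /* Host copy of the layer output; FP32 softmax scores here. */
            float *probs = (float *) meta->out_buf_ptrs_host[i];
            for (guint c = 0; c < info->inferDims.numElements; c++)
                g_print ("layer %s class %u: %f\n",
                    info->layerName, c, probs[c]);
        }
    }
}

If the softmax score for red is high here even when no label is attached, the problem is on the threshold/post-processing side; if it is low, the issue is more likely preprocessing or the exported engine.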
