Why does nvv4l2decoder use so much CPU?

Hi,

I am using two Tesla P4 GPUs to decode 50 RTSP streams. For the first 2~3 hours everything seems fine, but after that nvv4l2decoder starts using too much CPU. See the attached picture.

I have set the decoder to output one frame out of every 2, so that each GPU can decode 25 sources.

if (g_strstr_len (name, -1, "nvv4l2decoder") == name) {
    //g_object_set (object, "skip-frames", 2, NULL);
    g_object_set (object, "gpu-id", obj->gpu_id, NULL);
    g_object_set (object, "drop-frame-interval", 2, NULL);
    g_object_set (object, "cudadec-memtype", 0, NULL);
}
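
To make the frame-drop arithmetic concrete, here is a minimal sketch. It assumes that drop-frame-interval = N means the decoder hands every Nth frame downstream and discards the others, and it uses a hypothetical input rate of 25 fps per camera, which the thread does not state. The helper names are illustrative and are not part of any NVIDIA or GStreamer API.

```cpp
#include <cassert>
#include <vector>

// Assumption: drop-frame-interval = N (N > 1) keeps every Nth frame and drops
// the rest; 0 or 1 keeps every frame. Which frame of each group survives is
// also an assumption of this sketch.
std::vector<int> kept_frames(int total_frames, int drop_frame_interval) {
    assert(total_frames >= 0 && drop_frame_interval >= 0);
    std::vector<int> kept;
    for (int frame = 1; frame <= total_frames; ++frame) {
        if (drop_frame_interval <= 1 || frame % drop_frame_interval == 0) {
            kept.push_back(frame);
        }
    }
    return kept;
}

// Frames per second leaving the decoders on one GPU.
double output_fps_per_gpu(int streams_per_gpu, double input_fps,
                          int drop_frame_interval) {
    assert(streams_per_gpu >= 0 && drop_frame_interval >= 0);
    const int divisor = drop_frame_interval > 1 ? drop_frame_interval : 1;
    return streams_per_gpu * input_fps / divisor;
}
```

With 50 streams split over 2 GPUs (25 each), the hypothetical 25 fps input and an interval of 2, `output_fps_per_gpu(25, 25.0, 2)` gives 312.5 frames per second per GPU, and `kept_frames(6, 2)` returns frames 2, 4 and 6.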

Hi,

Here are my answers to the questions from Viranjan, who writes the plugins.

  1. Is this also seen while decoding file based streams?

A: I have not tried file-based streams yet.

  2. What is the bitrate of the input streams? CPU is only used for parsers.

A: The bitrate is generally about 4096 Kbps.

  3. Is there any software plugin you are using in your application?

A: I wrote a sink plugin myself, but I also tried fakesink and it made no difference. My pipeline is:

uridecodebin
uridecodebin
.
.(25 sources) -> nvstreammux -> nvinfer -> nvtracker -> fakesink.
.
uridecodebin
uridecodebin

  4. Could you please share your config file?

Pipeline code:

void NVGstPipeline::
decodebin_child_added (GstChildProxy * child_proxy, GObject * object,
                       gchar * name, gpointer user_data)
{
    NVGstPipeline *obj = (NVGstPipeline *)user_data;
    if (g_strrstr (name, "decodebin") == name) {
        g_signal_connect (G_OBJECT (object), "child-added",
                          G_CALLBACK (decodebin_child_added), user_data);
    }

    if(g_strcmp0(name, "source") == 0) {
        g_object_set(G_OBJECT (object), "message-forward", true, nullptr);
        g_object_set(G_OBJECT (object), "protocols", 0x4, nullptr);
        // rtspsrc "timeout" is a 64-bit property; cast so the vararg width matches.
        g_object_set(G_OBJECT (object), "timeout", (guint64) 50000000, nullptr);
        //g_object_set(G_OBJECT (object), "drop-on-latency", TRUE, nullptr);
    }

    if (g_strstr_len (name, -1, "nvv4l2decoder") == name) {
        //g_object_set (object, "skip-frames", 2, NULL);
        g_object_set (object, "gpu-id", obj->gpu_id, NULL);
        g_object_set(object, "drop-frame-interval", 2, NULL);
        g_object_set(object, "cudadec-memtype", 0, NULL);
        g_object_set(object, "num-extra-surfaces", 5, NULL);
    }
}

    GstElement *pipeline = NULL;
    GstElement *streammux = NULL;
    GstElement *pgie = NULL;
    GstElement *nvtracker = NULL;
    GstElement *queue0 = NULL;
    GstElement *queue1 = NULL;
    GstElement *queue2 = NULL;
    GstElement *sink = NULL;

    do
    {
        std::string name = "pipeline-" + std::to_string(gpu_id);
        pipeline = gst_pipeline_new(name.c_str());
        if (pipeline == NULL) {
            LOG(ERROR) << "Failed to create pipeline.";
            break;
        }

        streammux = gst_element_factory_make("nvstreammux", STREAM_MUXER_NAME);
 
        pgie = gst_element_factory_make ("nvinfer", "primary-nvinference-engine");

        nvtracker = gst_element_factory_make ("nvtracker", "tracker");

        queue0 = gst_element_factory_make ("queue", "queue0");
        queue1 = gst_element_factory_make ("queue", "queue1");
        queue2 = gst_element_factory_make ("queue", "queue2");

        sink = gst_element_factory_make("fakesink", "savebroker");


        if (!streammux || !pgie || !nvtracker ||
            !queue0 || !queue1 || !queue2 || !sink) {
            LOG(ERROR) << "Failed to create elements during building pipeline.";
            break;
        }


        g_object_set(G_OBJECT(streammux),
                     "width", image_width,
                     "height", image_height,
                     "batch-size", MAX_NUM_SRCS + 1,
                     "live-source", true,
                     "enable-padding", true,
                     "batched-push-timeout", 40000,
                     "nvbuf-memory-type", 0,
                     "gpu-id", gpu_id, NULL);

        g_object_set (G_OBJECT (pgie),
                      "config-file-path", config_file_path.c_str(),
                      "process-mode", 1,
                      "batch-size", MAX_NUM_SRCS + 1,
                      "gpu-id", gpu_id,
                      "raw-output-generated-callback", G_CALLBACK(gst_nvinfer_output_generated_callback),
                      NULL);

        g_object_set (G_OBJECT (nvtracker),
                      "ll-lib-file", ll_lib_file.c_str(),
                      "ll-config-file", ll_config_file.c_str(),
                      "gpu-id", gpu_id,
                      "tracker-width", tracker_width,
                      "tracker-height", tracker_height,
                      "enable-batch-process", TRUE,
                      NULL);


        //g_object_set (G_OBJECT(queue0), "leaky", 2, NULL);
        //g_object_set (G_OBJECT(queue1), "leaky", 2, NULL);
        //g_object_set (G_OBJECT(queue2), "leaky", 2, NULL);

        g_object_set (G_OBJECT (sink),
                      "sync", false,
                      NULL);

nvinfer config file:

################################################################################
# Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
################################################################################

# Following properties are mandatory when engine files are not specified:
#   int8-calib-file(Only in INT8), model-file-format
#   Caffemodel mandatory properties: model-file, proto-file, output-blob-names
#   UFF: uff-file, input-dims, uff-input-blob-name, output-blob-names
#   ONNX: onnx-file
#
# Mandatory properties for detectors:
#   num-detected-classes,
#   custom-lib-path,
#   parse-bbox-func-name
#
# Optional properties for detectors:
#   enable-dbscan(Default=false), interval(Primary mode only, Default=0)
#
# Mandatory properties for classifiers:
#   classifier-threshold, is-classifier
#
# Optional properties for classifiers:
#   classifier-async-mode(Secondary mode only, Default=false)
#
# Optional properties in secondary mode:
#   operate-on-gie-id(Default=0), operate-on-class-ids(Defaults to all classes),
#   input-object-min-width, input-object-min-height, input-object-max-width,
#   input-object-max-height
#
# Following properties are always recommended:
#   batch-size(Default=1)
#
# Other optional properties:
#   net-scale-factor(Default=1), network-mode(Default=0 i.e FP32),
#   model-color-format(Default=0 i.e. RGB) model-engine-file, labelfile-path,
#   mean-file, gie-unique-id(Default=0), offsets, gie-mode (Default=1 i.e. primary),
#   custom-lib-path, network-mode(Default=0 i.e FP32)
#
# The values in the config file are overridden by values set through GObject
# properties.

[property]
#gpu-id=0
net-scale-factor=1.0
offsets=0;0;0
#0=RGB, 1=BGR
model-color-format=0
model-engine-file=model/mnet-deconv-0517.caffemodel_b26_fp32.engine
model-file=model/mnet-deconv-0517.caffemodel
proto-file=model/mnet-deconv-0517.prototxt
#int8-calib-file=model/mnet-deconv-0517.table.int8
#batch-size=2
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
## 0=NvDsInferNetworkType_Detector, 1=NvDsInferNetworkType_Classifier, 2=NvDsInferNetworkType_Segmentation, 100=NvDsInferNetworkType_Other
network-type=100
num-detected-classes=1
interval=0
gie-unique-id=1
output-blob-names=face_rpn_cls_prob_reshape_stride32;face_rpn_bbox_pred_stride32;face_rpn_landmark_pred_stride32;face_rpn_cls_prob_reshape_stride16;face_rpn_bbox_pred_stride16;face_rpn_landmark_pred_stride16;face_rpn_cls_prob_reshape_stride8;face_rpn_bbox_pred_stride8;face_rpn_landmark_pred_stride8

[class-attrs-all]
#threshold=0.2
#eps=0.1
#group-threshold=2
#roi-top-offset=0
#roi-bottom-offset=0
#detected-min-w=0
#detected-min-h=0
#detected-max-w=0
#detected-max-h=0

## Per class configuration
# Prevent background detection
#[class-attrs-0]
#threshold=1.1

Tracker config file:

%YAML:1.0
  
NvDCF:
  useBufferedOutput: 0

  maxTargetsPerStream: 30 # Max number of targets to track per stream. Recommended to set >10. Note: this value should account for the targets being tracked in shadow mode as well. Max value depends on the GPU memory capacity

  filterLr: 0.11 # learning rate for DCF filter in exponential moving average. Valid Range: [0.0, 1.0]
  gaussianSigma: 0.75 # Standard deviation for Gaussian for desired response when creating DCF filter

  minDetectorConfidence: 0.0 # If the confidence of a detector bbox is lower than this, then it won't be considered for tracking
  minTrackerConfidence: 0.2 # If the confidence of an object tracker is lower than this on the fly, then it will be tracked in shadow mode. Valid Range: [0.0, 1.0]

  featureImgSizeLevel: 1 # Size of a feature image. Valid range: {1, 2, 3, 4, 5}, from the smallest to the largest
  SearchRegionPaddingScale: 3 # Search region size. Determines how large the search region should be scaled from the target bbox.  Valid range: {1, 2, 3}, from the smallest to the largest

  maxShadowTrackingAge: 9        # Max length of shadow tracking (the shadow tracking age is incremented when (1) there's detector input yet no match or (2) tracker confidence is lower than minTrackerConfidence). Once reached, the tracker will be terminated.
  probationAge: 12                # Once the tracker age (incremented at every frame) reaches this, the tracker is considered to be valid
  earlyTerminationAge: 2         # Early termination age (in terms of shadow tracking age) during the probation period

  minVisibiilty4Tracking: 0.1    # If the visibility of the bbox of a tracker gets lower, then it will be terminated

Hi,

It also happens when I connect only 10 sources per GPU.

I suspect my pipeline has a bug. It is:

uridecodebin -> nvstreammux -> nvinfer -> nvtracker -> fakesink

I have looked at the DeepStream demo, and its pipeline is:

rtspsrc -> depay -> queue -> decodebin -> queue -> nvstreammux -> …

Does adding the “queue” element make a difference?

Also, would running in Docker affect performance?

Thanks.

Hi,

I have written a demo; it is in the attachment.

When I set the attribute:

g_object_set(object, "drop-frame-interval", 2, NULL);

the CPU usage goes up to 60% after about 4 hours and 80% after 5 hours, and it keeps increasing.

Output of the command top -H -p:

 6134 yzy       20   0 19.177g 510132 301836 S 66.7  1.6  89:43.52 nvv4l2decoder0:
 6123 yzy       20   0 19.177g 510132 301836 S 28.0  1.6  40:32.52 rtpjitterbuffer
 6133 yzy       20   0 19.177g 510132 301836 S  1.3  1.6   5:41.45 stream-muxer:sr

When I don’t set the drop-frame-interval attribute, it seems normal.
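
For anyone who wants to track this trend over several hours, the `top -H` rows above can be parsed and logged. The following is a minimal sketch, assuming the default `top` column layout (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND) and a TIME+ value in the minutes:seconds.hundredths form seen above. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

struct ThreadSample {
    int tid = 0;               // thread id (the PID column in `top -H`)
    double cpu_percent = 0.0;  // %CPU column
    std::string cpu_time;      // TIME+ column, kept as text
    std::string name;          // COMMAND column (thread name)
};

// Parses one data row of `top -H`, assuming the default column layout.
bool parse_top_line(const std::string& line, ThreadSample& out) {
    std::istringstream in(line);
    std::string user, pr, ni, virt, res, shr, state, mem;
    if (!(in >> out.tid >> user >> pr >> ni >> virt >> res >> shr >> state
             >> out.cpu_percent >> mem >> out.cpu_time)) {
        return false;
    }
    out.name.clear();
    std::getline(in >> std::ws, out.name);
    const std::size_t end = out.name.find_last_not_of(" \t\r\n");
    out.name = (end == std::string::npos) ? "" : out.name.substr(0, end + 1);
    return !out.name.empty();
}

// Converts "MM:SS.hh" to seconds; returns -1.0 if the text does not fit.
double cpu_time_seconds(const std::string& text) {
    const std::size_t colon = text.find(':');
    if (colon == std::string::npos) return -1.0;
    try {
        return std::stod(text.substr(0, colon)) * 60.0 +
               std::stod(text.substr(colon + 1));
    } catch (const std::exception&) {
        return -1.0;
    }
}

// Returns the index of the sample with the highest %CPU.
std::size_t hottest(const std::vector<ThreadSample>& samples) {
    assert(!samples.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        if (samples[i].cpu_percent > samples[best].cpu_percent) best = i;
    }
    return best;
}
```

Feeding it the three rows above would pick out the nvv4l2decoder0: thread, and logging `cpu_time_seconds` for that thread at regular intervals would show whether its CPU consumption is growing with or without drop-frame-interval set.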

Thanks.
test_base.cpp (10.5 KB)

Hi,

Why has no one replied to my question?

Thanks.

Hi,
This looks similar to https://devtalk.nvidia.com/default/topic/1065372
and is under investigation.

Hi,

I have had the same problem for several days. I also set “drop-frame-interval”, and the CPU usage increases over time.

Has this issue been solved?

Hi alpaserss,
Debugging is ongoing. We will update once there are further findings. Thanks.

Hi,
Please check
https://devtalk.nvidia.com/default/topic/1065372/deepstream-sdk/jetson-nano-shows-100-cpu-usage-after-30-minutes-with-deepstream-app-demo/post/5409211/#5409211