Why does nvv4l2decoder use so much CPU?

ClancyLian · October 21, 2019, 8:52am

Hi,

When I use two Tesla P4 GPUs to decode 50 RTSP streams, everything seems fine for the first 2 to 3 hours, but after that nvv4l2decoder occupies too much CPU (see the screenshot in the attachment).

I have set the decoder to output one frame out of every 2, so each GPU can handle 25 sources.

if (g_strstr_len (name, -1, "nvv4l2decoder") == name) {      // name starts with "nvv4l2decoder"
    //g_object_set (object, "skip-frames", 2, NULL);
    g_object_set (object, "gpu-id", obj->gpu_id, NULL);
    g_object_set (object, "drop-frame-interval", 2, NULL);   // output one frame in every 2
    g_object_set (object, "cudadec-memtype", 0, NULL);       // 0 = device memory
}
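The arithmetic behind "one frame out of every 2" can be made explicit. This is a minimal sketch assuming that a drop interval of N keeps one frame in every N (which frame of each group is kept is an assumption here), that 0 or 1 means nothing is dropped, and that the cameras run at 25 fps, which the post does not state. `kept_frames` and `output_fps_per_gpu` are hypothetical helpers, not DeepStream APIs.

```cpp
#include <cstdint>

// Hypothetical helper: frames passed downstream out of `total` frames when one
// frame in every `interval` is kept. Assumes frames 0, N, 2N, ... are the kept ones.
std::uint64_t kept_frames(std::uint64_t total, std::uint64_t interval) {
    if (interval <= 1) return total;            // 0 or 1: nothing dropped
    return (total + interval - 1) / interval;   // ceil(total / interval)
}

// Hypothetical helper: frames per second leaving all decoders on one GPU.
// Uses double because e.g. 25 fps / 2 = 12.5 fps per stream.
double output_fps_per_gpu(int streams, double fps_per_stream, int interval) {
    if (interval <= 1) return streams * fps_per_stream;
    return streams * fps_per_stream / interval;
}
```

With 25 cameras per GPU at an assumed 25 fps, `output_fps_per_gpu(25, 25.0, 2)` gives 312.5 frames/s reaching nvstreammux. The model only counts output frames; it says nothing about how much work the decoder does on the frames it drops.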

ClancyLian · October 24, 2019, 2:49am

Hi,

Here are my answers to the questions from Viranjan, who wrote the plugins.

Is this also seen while decoding file-based streams?

A: I have not tried file-based streams yet.

What is the bitrate of the input streams? CPU is only used for parsers.

A: The bitrate is generally about 4096 kbps.
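To put "about 4096 kbps" in context, the compressed input rate that the CPU-side depayloading and parsing has to handle per GPU, and in total, can be worked out directly. A minimal sketch; the helper names are hypothetical and 1 kbit is taken as 1000 bits.

```cpp
#include <cstdint>

// Hypothetical helper: aggregate compressed input rate in kbit/s for
// `streams` RTSP sources, each at `kbps_per_stream` kbit/s.
std::uint64_t aggregate_kbps(std::uint64_t streams, std::uint64_t kbps_per_stream) {
    return streams * kbps_per_stream;
}

// Converts kbit/s to bytes per second (1 kbit = 1000 bits here).
std::uint64_t kbps_to_bytes_per_sec(std::uint64_t kbps) {
    return kbps * 1000 / 8;
}
```

For the setup in this thread, `aggregate_kbps(25, 4096)` is 102,400 kbit/s (about 102 Mbit/s) per GPU, `aggregate_kbps(50, 4096)` is 204,800 kbit/s for both GPUs, and each stream delivers about 512,000 bytes/s.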

Are you using any software plugins in your application?

A: I have written my own sink plugin, but I also tried fakesink and it made no difference. My pipeline is:

uridecodebin (×25 sources) → nvstreammux → nvinfer → nvtracker → fakesink
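The same topology can be written as a gst-launch-1.0 style description, which is handy for reproducing the problem outside the application (for example with gst-launch-1.0, or by passing the string to gst_parse_launch()). A sketch under stated assumptions: `build_pipeline_description` is hypothetical, the URIs, file paths and the 1920x1080 mux size are placeholders, and batch-size is set to the source count rather than the MAX_NUM_SRCS + 1 used in the post.

```cpp
#include <string>

// Hypothetical helper: N uridecodebin sources -> nvstreammux -> nvinfer ->
// nvtracker -> fakesink, as a gst-launch-1.0 style description string.
// URIs, paths and the 1920x1080 mux size are placeholders.
std::string build_pipeline_description(int num_sources,
                                       const std::string &uri_prefix,
                                       const std::string &infer_config,
                                       const std::string &tracker_lib) {
    std::string desc =
        "nvstreammux name=m batch-size=" + std::to_string(num_sources) +
        " width=1920 height=1080 live-source=true batched-push-timeout=40000"
        " ! nvinfer config-file-path=" + infer_config +
        " ! nvtracker ll-lib-file=" + tracker_lib +
        " ! fakesink sync=false";
    // Each source feeds its own nvstreammux request pad m.sink_<i>;
    // gst-launch links uridecodebin's "sometimes" pad once it appears.
    for (int i = 0; i < num_sources; ++i) {
        desc += " uridecodebin uri=" + uri_prefix + std::to_string(i) +
                " ! m.sink_" + std::to_string(i);
    }
    return desc;
}
```

Calling `build_pipeline_description(10, "rtsp://camera/", "config_infer.txt", "tracker_lib.so")` gives a 10-source variant for checking whether the CPU growth also appears without the custom application code.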

Could you please share your config file?

Pipeline code:

void NVGstPipeline::
decodebin_child_added (GstChildProxy * child_proxy, GObject * object,
                       gchar * name, gpointer user_data)
{
    NVGstPipeline *obj = (NVGstPipeline *)user_data;

    // uridecodebin adds a nested decodebin; watch its children as well.
    if (g_strrstr (name, "decodebin") == name) {
        g_signal_connect (G_OBJECT (object), "child-added",
                          G_CALLBACK (decodebin_child_added), user_data);
    }

    // "source" is the source element created by uridecodebin (rtspsrc here).
    if (g_strcmp0 (name, "source") == 0) {
        g_object_set (G_OBJECT (object), "message-forward", true, nullptr);
        // 0x4 = GST_RTSP_LOWER_TRANS_TCP: receive RTP over TCP only.
        g_object_set (G_OBJECT (object), "protocols", 0x4, nullptr);
        // "timeout" is a guint64 in microseconds, so the variadic argument
        // must be passed as a 64-bit value (50 s).
        g_object_set (G_OBJECT (object), "timeout", (guint64) 50000000, nullptr);
        //g_object_set (G_OBJECT (object), "drop-on-latency", TRUE, nullptr);
    }

    // Configure every decoder instance (name starts with "nvv4l2decoder").
    if (g_strstr_len (name, -1, "nvv4l2decoder") == name) {
        //g_object_set (object, "skip-frames", 2, NULL);
        g_object_set (object, "gpu-id", obj->gpu_id, NULL);
        g_object_set (object, "drop-frame-interval", 2, NULL);  // output one frame in every 2
        g_object_set (object, "cudadec-memtype", 0, NULL);      // 0 = device memory
        g_object_set (object, "num-extra-surfaces", 5, NULL);
    }
}
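The callback above dispatches on each child's name: nested decodebin instances get the same handler, uridecodebin's "source" element gets the RTSP settings, and nvv4l2decoder instances get the decoder settings. A GStreamer-free sketch of just that dispatch, useful for checking which names match; `ChildKind` and `classify_child` are hypothetical. The original matches "decodebin" with g_strrstr (last occurrence at the start of the name); a plain prefix test is used here instead.

```cpp
#include <string>

// What the child-added callback does with each element name.
enum class ChildKind { NestedDecodebin, RtspSource, Decoder, Other };

// Prefix test; same idea as GLib's g_str_has_prefix() or the
// g_strstr_len(name, -1, "nvv4l2decoder") == name idiom above.
static bool has_prefix(const std::string &s, const std::string &prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical, GStreamer-free model of decodebin_child_added's dispatch.
ChildKind classify_child(const std::string &name) {
    if (has_prefix(name, "decodebin")) return ChildKind::NestedDecodebin;
    if (name == "source") return ChildKind::RtspSource;
    if (has_prefix(name, "nvv4l2decoder")) return ChildKind::Decoder;
    return ChildKind::Other;
}
```

For example, "decodebin0" and "nvv4l2decoder0" are routed to the decodebin and decoder branches, while a parser such as "h264parse0" falls through untouched.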

    GstElement *pipeline = NULL;
    GstElement *streammux = NULL;
    GstElement *pgie = NULL;
    GstElement *nvtracker = NULL;
    GstElement *queue0 = NULL;
    GstElement *queue1 = NULL;
    GstElement *queue2 = NULL;
    GstElement *sink = NULL;

    do
    {
        std::string name = "pipeline-" + std::to_string(gpu_id);
        pipeline = gst_pipeline_new(name.c_str());
        if (pipeline == NULL) {
            LOG(ERROR) << "Failed to create pipeline.";
            break;
        }

        streammux = gst_element_factory_make("nvstreammux", STREAM_MUXER_NAME);
 
        pgie = gst_element_factory_make ("nvinfer", "primary-nvinference-engine");

        nvtracker = gst_element_factory_make ("nvtracker", "tracker");

        queue0 = gst_element_factory_make ("queue", "queue0");
        queue1 = gst_element_factory_make ("queue", "queue1");
        queue2 = gst_element_factory_make ("queue", "queue2");

        sink = gst_element_factory_make("fakesink", "savebroker");


        if (!streammux || !pgie || !nvtracker ||
            !queue0 || !queue1 || !queue2 || !sink) {
            LOG(ERROR) << "Failed to create elements during building pipeline.";
            break;
        }


        g_object_set(G_OBJECT(streammux),
                     "width", image_width,
                     "height", image_height,
                     "batch-size", MAX_NUM_SRCS + 1,
                     "live-source", true,
                     "enable-padding", true,
                     "batched-push-timeout", 40000,   // microseconds (40 ms)
                     "nvbuf-memory-type", 0,
                     "gpu-id", gpu_id, NULL);

        g_object_set (G_OBJECT (pgie),
                      "config-file-path", config_file_path.c_str(),
                      "process-mode", 1,
                      "batch-size", MAX_NUM_SRCS + 1,
                      "gpu-id", gpu_id,
                      "raw-output-generated-callback", G_CALLBACK(gst_nvinfer_output_generated_callback),
                      NULL);

        g_object_set (G_OBJECT (nvtracker),
                      "ll-lib-file", ll_lib_file.c_str(),
                      "ll-config-file", ll_config_file.c_str(),
                      "gpu-id", gpu_id,
                      "tracker-width", tracker_width,
                      "tracker-height", tracker_height,
                      "enable-batch-process", TRUE,
                      NULL);


        //g_object_set (G_OBJECT(queue0), "leaky", 2, NULL);
        //g_object_set (G_OBJECT(queue1), "leaky", 2, NULL);
        //g_object_set (G_OBJECT(queue2), "leaky", 2, NULL);

        g_object_set (G_OBJECT (sink),
                      "sync", false,
                      NULL);

        // ... (element linking and the rest of the do/while loop are not shown)
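The streammux settings above use batched-push-timeout=40000, i.e. 40 ms, which matches one frame period at 25 fps. With drop-frame-interval=2 each source delivers a frame only every 80 ms (again assuming 25 fps cameras, which the thread does not state), so the timeout may fire before a full batch has formed. A small sketch of that arithmetic with hypothetical helper names; it is a rough heuristic, not a claim about the cause of the CPU growth.

```cpp
#include <cstdint>

// Hypothetical helper: time between frames leaving one decoder, in
// microseconds (the unit of nvstreammux's batched-push-timeout).
// Assumes a constant input frame rate fps > 0; interval 0 or 1 = no drops.
std::int64_t output_frame_period_us(std::int64_t fps, std::int64_t interval) {
    const std::int64_t n = interval > 1 ? interval : 1;
    return 1000000 * n / fps;
}

// Heuristic: true if the mux timeout is shorter than the per-source frame
// period, so batches are likely to be pushed before every source has a frame.
bool timeout_shorter_than_period(std::int64_t batched_push_timeout_us,
                                 std::int64_t fps, std::int64_t interval) {
    return batched_push_timeout_us < output_frame_period_us(fps, interval);
}
```

Under these assumptions, `timeout_shorter_than_period(40000, 25, 2)` is true while `timeout_shorter_than_period(40000, 25, 1)` is false.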

nvinfer config file:

################################################################################
# Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
################################################################################

# Following properties are mandatory when engine files are not specified:
#   int8-calib-file(Only in INT8), model-file-format
#   Caffemodel mandatory properties: model-file, proto-file, output-blob-names
#   UFF: uff-file, input-dims, uff-input-blob-name, output-blob-names
#   ONNX: onnx-file
#
# Mandatory properties for detectors:
#   num-detected-classes,
#   custom-lib-path,
#   parse-bbox-func-name
#
# Optional properties for detectors:
#   enable-dbscan(Default=false), interval(Primary mode only, Default=0)
#
# Mandatory properties for classifiers:
#   classifier-threshold, is-classifier
#
# Optional properties for classifiers:
#   classifier-async-mode(Secondary mode only, Default=false)
#
# Optional properties in secondary mode:
#   operate-on-gie-id(Default=0), operate-on-class-ids(Defaults to all classes),
#   input-object-min-width, input-object-min-height, input-object-max-width,
#   input-object-max-height
#
# Following properties are always recommended:
#   batch-size(Default=1)
#
# Other optional properties:
#   net-scale-factor(Default=1), network-mode(Default=0 i.e FP32),
#   model-color-format(Default=0 i.e. RGB) model-engine-file, labelfile-path,
#   mean-file, gie-unique-id(Default=0), offsets, gie-mode (Default=1 i.e. primary),
#   custom-lib-path, network-mode(Default=0 i.e FP32)
#
# The values in the config file are overridden by values set through GObject
# properties.

[property]
#gpu-id=0
net-scale-factor=1.0
offsets=0;0;0
#0=RGB, 1=BGR
model-color-format=0
model-engine-file=model/mnet-deconv-0517.caffemodel_b26_fp32.engine
model-file=model/mnet-deconv-0517.caffemodel
proto-file=model/mnet-deconv-0517.prototxt
#int8-calib-file=model/mnet-deconv-0517.table.int8
#batch-size=2
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
## 0=NvDsInferNetworkType_Detector, 1=NvDsInferNetworkType_Classifier, 2=NvDsInferNetworkType_Segmentation, 100=NvDsInferNetworkType_Other
network-type=100
num-detected-classes=1
interval=0
gie-unique-id=1
output-blob-names=face_rpn_cls_prob_reshape_stride32;face_rpn_bbox_pred_stride32;face_rpn_landmark_pred_stride32;face_rpn_cls_prob_reshape_stride16;face_rpn_bbox_pred_stride16;face_rpn_landmark_pred_stride16;face_rpn_cls_prob_reshape_stride8;face_rpn_bbox_pred_stride8;face_rpn_landmark_pred_stride8

[class-attrs-all]
#threshold=0.2
#eps=0.1
#group-threshold=2
#roi-top-offset=0
#roi-bottom-offset=0
#detected-min-w=0
#detected-min-h=0
#detected-max-w=0
#detected-max-h=0

## Per class configuration
# Prevent background detection
#[class-attrs-0]
#threshold=1.1

Tracker config file:

%YAML:1.0
  
NvDCF:
  useBufferedOutput: 0

  maxTargetsPerStream: 30 # Max number of targets to track per stream. Recommended to set >10. Note: this value should account for the targets being tracked in shadow mode as well. Max value depends on the GPU memory capacity

  filterLr: 0.11 # learning rate for DCF filter in exponential moving average. Valid Range: [0.0, 1.0]
  gaussianSigma: 0.75 # Standard deviation for Gaussian for desired response when creating DCF filter

  minDetectorConfidence: 0.0 # If the confidence of a detector bbox is lower than this, then it won't be considered for tracking
  minTrackerConfidence: 0.2 # If the confidence of an object tracker is lower than this on the fly, then it will be tracked in shadow mode. Valid Range: [0.0, 1.0]

  featureImgSizeLevel: 1 # Size of a feature image. Valid range: {1, 2, 3, 4, 5}, from the smallest to the largest
  SearchRegionPaddingScale: 3 # Search region size. Determines how large the search region should be scaled from the target bbox.  Valid range: {1, 2, 3}, from the smallest to the largest

  maxShadowTrackingAge: 9        # Max length of shadow tracking (the shadow tracking age is incremented when (1) there's detector input yet no match or (2) tracker confidence is lower than minTrackerConfidence). Once reached, the tracker will be terminated.
  probationAge: 12                # Once the tracker age (incremented at every frame) reaches this, the tracker is considered to be valid
  earlyTerminationAge: 2         # Early termination age (in terms of shadow tracking age) during the probation period

  minVisibiilty4Tracking: 0.1    # If the visibility of the bbox of a tracker gets lower, then it will be terminated

ClancyLian · October 24, 2019, 7:22am

Hi,

It also happens when I connect only 10 sources per GPU.

Could there be a bug in my pipeline?

uridecodebin → nvstreammux → nvinfer → nvtracker → fakesink

In the DeepStream demo, the pipeline is:

rtspsrc → depay → queue → decodebin → queue → nvstreammux → …

Does the queue element make a difference?
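One way to compare the two structures is to write the demo's per-source branch explicitly. A queue starts a new streaming thread, so with queues on both sides of the decoder, depayloading/parsing and decoding run in separate threads per source; whether that changes the CPU growth here is not established in this thread. A sketch producing a gst-launch-style branch string; the helper name, the H.264 elements (rtph264depay, h264parse) and protocols=tcp (chosen to match the protocols=0x4 setting above) are assumptions, and the branch expects an nvstreammux named `m` elsewhere in the description.

```cpp
#include <string>

// Hypothetical helper: one source branch in the style of the DeepStream demo
// (rtspsrc -> depay -> queue -> decoder -> queue -> nvstreammux).
// Assumes H.264 cameras; H.265 would need rtph265depay/h265parse instead.
// Expects an nvstreammux named "m" elsewhere in the same description.
std::string rtsp_branch(int index, const std::string &uri, int drop_interval) {
    return "rtspsrc location=" + uri + " protocols=tcp"
           " ! rtph264depay ! h264parse ! queue"
           " ! nvv4l2decoder drop-frame-interval=" + std::to_string(drop_interval) +
           " ! queue ! m.sink_" + std::to_string(index);
}
```

Concatenating one such branch per camera after an `nvstreammux name=m ...` segment gives a queue-separated variant of the pipeline that can be run side by side with the uridecodebin version.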

Could Docker affect performance?

Thanks.

ClancyLian · October 28, 2019, 8:01am

Hi,

I have written a demo (attached).

When I set the property:

g_object_set(object, "drop-frame-interval", 2, NULL);

CPU usage rises to 60% after about 4 hours and 80% after 5 hours, and it keeps increasing.

Per-thread view from top -H -p:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 6134 yzy       20   0 19.177g 510132 301836 S  66.7  1.6  89:43.52 nvv4l2decoder0:
 6123 yzy       20   0 19.177g 510132 301836 S  28.0  1.6  40:32.52 rtpjitterbuffer
 6133 yzy       20   0 19.177g 510132 301836 S   1.3  1.6   5:41.45 stream-muxer:sr

When I don't set drop-frame-interval, CPU usage looks normal.
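To log this growth over several hours instead of sampling top by hand, per-thread CPU time can be read from procfs on Linux: each /proc/<pid>/task/<tid>/stat line holds utime and stime (in clock ticks) as fields 14 and 15, per proc(5). A minimal parsing sketch; `ThreadCpu` and `parse_thread_stat` are hypothetical names, and periodic sampling is left to the caller.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

struct ThreadCpu {
    std::string name;      // comm field, e.g. "nvv4l2decoder0:"
    std::uint64_t utime;   // user-mode time in clock ticks (field 14)
    std::uint64_t stime;   // kernel-mode time in clock ticks (field 15)
};

// Parses one line of /proc/<pid>/task/<tid>/stat.
// comm is wrapped in parentheses and may contain spaces, so everything
// after the last ')' is tokenized separately.
bool parse_thread_stat(const std::string &line, ThreadCpu &out) {
    const auto open = line.find('(');
    const auto close = line.rfind(')');
    if (open == std::string::npos || close == std::string::npos || close < open)
        return false;
    out.name = line.substr(open + 1, close - open - 1);
    std::istringstream rest(line.substr(close + 1));
    std::string field;
    // Skip fields 3 (state) through 13 (cmajflt).
    for (int i = 3; i < 14; ++i) {
        if (!(rest >> field)) return false;
    }
    return static_cast<bool>(rest >> out.utime >> out.stime);
}
```

Dividing the change in utime + stime between two samples by sysconf(_SC_CLK_TCK) and by the elapsed seconds gives the thread's CPU share, comparable to top's %CPU column for the nvv4l2decoder0 thread.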

Thanks.
test_base.cpp (10.5 KB)

ClancyLian · November 6, 2019, 10:02am

Hi,

Why has no one replied to my question?

Thanks.

DaneLLL · November 7, 2019, 2:12am

Hi,
It is similar to https://devtalk.nvidia.com/default/topic/1065372
and is under investigation.

alpaserss · November 22, 2019, 2:48am

Hi,

I have had the same problem for several days. I also set "drop-frame-interval", and CPU usage increases over time.

Has this issue been solved?

DaneLLL · November 22, 2019, 5:21am

Hi alpaserss,
Debugging is ongoing. We will update once there are further findings. Thanks.

DaneLLL · December 6, 2019, 6:48am

Hi,
Please check
https://devtalk.nvidia.com/default/topic/1065372/deepstream-sdk/jetson-nano-shows-100-cpu-usage-after-30-minutes-with-deepstream-app-demo/post/5409211/#5409211
