How to reduce CPU load for GStreamer encoding

Hello,
I'm using an IMX390 camera and found that H.264 encoding puts a particularly high load on the CPU. When encoding 8 cameras, the CPU load is as high as 150%.

How can I reduce the CPU load? Should I optimize the GStreamer code or use a different encoding API?

appsrc name=appsrc ! video/x-raw,format=YUY2,width=1920,height=1080,framerate=30/1 ! nvvidconv ! video/x-raw(memory:NVMM), format=NV12, width=1920, height=1080,framerate=(fraction)30/1 ! nvv4l2h264enc control-rate=constant_bitrate bitrate=24000000 iframeinterval=0 profile=0 maxperf-enable=true all-iframe=true num-Ref-Frames=0 insert-sps-pps=false ! video/x-h264, stream-format=(string)byte-stream ! h264parse ! qtmux ! filesink location=/tmp/today/sensors_record/camera//center_camera_fov30.h264

Thanks

Hi,
There is a memory copy in

appsrc ! video/x-raw ! nvvidconv ! video/x-raw(memory:NVMM) ! …

You can use nvv4l2camerasrc to capture the frame data into an NVMM buffer directly. By default the plugin supports UYVY only. Please try this patch and rebuild the plugin to support YUY2:
Macrosilicon USB - #5 by DaneLLL

Hello,
Thanks for your reply. Because I need to record the exposure time from the API, nvv4l2camerasrc does not work for me. Is there any other way? And what CPU load should I expect?

I have also tested that recording 8 IMX490 cameras needs 250% CPU load, which is too high to use.

Hi,
The optimization is to eliminate the memory copy. Could you check whether you can move your custom code into the nvv4l2camerasrc plugin? It is open source, so you can customize it to include your code.
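
For reference, per-frame exposure can usually be read back with a standard V4L2 control ioctl right after the plugin dequeues a frame. Below is a minimal sketch; the hook point (after VIDIOC_DQBUF in the plugin source) and the use of V4L2_CID_EXPOSURE are assumptions, since vendor drivers often expose a custom control ID instead.

#include <linux/videodev2.h>
#include <sys/ioctl.h>
#include <cstdint>
#include <cstdio>

// Hypothetical helper to drop into the plugin's dequeue path,
// e.g. right after the frame is dequeued with VIDIOC_DQBUF.
// Returns the current exposure control value, or -1 on failure.
static int32_t QueryExposure(int v4l2_fd) {
    struct v4l2_control ctrl = {};
    ctrl.id = V4L2_CID_EXPOSURE;  // assumption: your driver may use a vendor control ID
    if (ioctl(v4l2_fd, VIDIOC_G_CTRL, &ctrl) < 0) {
        perror("VIDIOC_G_CTRL");
        return -1;
    }
    return ctrl.value;
}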

Hi,
I run 8 pipelines to encode with nvv4l2camerasrc, and the total CPU load is 150%.

gst-launch-1.0 nvv4l2camerasrc device=/dev/video2 ! "video/x-raw(memory:NVMM), width=2880, height=1860, format=YUY2, framerate=30/1" ! nvvidconv ! "video/x-raw(memory:NVMM), width=2880, height=1860, format=NV12, framerate=30/1" ! nvv4l2h264enc control-rate=constant_bitrate bitrate=62000000 iframeinterval=0 profile=0 maxperf-enable=true all-iframe=true num-Ref-Frames=0 insert-sps-pps=true ! "video/x-h264, stream-format=(string)byte-stream" ! h264parse ! qtmux ! filesink location=/tmp/today/1.h264

It is still too high for me. Maybe I can use a lower bitrate, but is there a better way, such as the L4T Multimedia API? What improvement should I expect from it?

Hi,
The pipeline looks optimal; there is no redundant memory copy. Further improvement may not be possible.

Do you check CPU usage through sudo tegrastats? Please execute sudo jetson_clocks to fix the CPU cores at maximum clock, and then check tegrastats.

Hi,
I have made sure that the device is running at maximum performance.

RAM 4980/31918MB (lfb 86x4MB) SWAP 0/15959MB (cached 0MB) CPU [5%@2265,4%@2265,0%@2265,5%@2265,6%@2265,10%@2265,1%@2265,1%@2265] EMC_FREQ 2%@2133 GR3D_FREQ 0%@1377 NVENC 1075 NVENC1 1075 APE 150 MTS fg 0% bg 6% AO@59C GPU@59.5C PMIC@100C Tboard@60C AUX@61C CPU@62C Tdiode@61C GPU 1265/1265 CPU 0/0 SOC 0/0 CV 0/0 VDDRQ 474/474 SYS5V 5447/5447

I would like to use the v4l2cuda sample from jetson_multimedia_api with userptr mode and zero copy. This sample seems more suitable for my project, but I need to add H.264 encoding code to it. Which sample should I refer to?

Hi,
I used the perf tool to check the encoding pipeline and found that v4l2_encthread takes 50% of the CPU load, and memcpy occupies 39%.

Hi,
For video encoding, the data has to be in NvBuffer, so it is better to use nvv4l2camerasrc or refer to 12_camera_v4l2_cuda. Your pipeline is optimal. The CPU usage should be much lower compared to a software encoder like x264enc.
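
For reference, the encoder side of that approach looks roughly as follows: the NvBuffer dmabuf fds from capture are queued straight into the encoder's input plane, so no frame data crosses the CPU. This is a minimal sketch modeled on the 01_video_encode sample; the resolution, bitrate, and buffer counts are placeholders, and the exact plane setup may differ between L4T versions.

#include "NvVideoEncoder.h"
#include <cstring>

// Sketch: create an H.264 encoder whose input (output plane) takes
// DMABUF fds; the capture plane receives the encoded bitstream.
NvVideoEncoder *CreateEncoder(uint32_t w, uint32_t h) {
    NvVideoEncoder *enc = NvVideoEncoder::createVideoEncoder("enc0");
    // The capture plane format must be set before the output plane format.
    enc->setCapturePlaneFormat(V4L2_PIX_FMT_H264, w, h, 2 * 1024 * 1024);
    enc->setOutputPlaneFormat(V4L2_PIX_FMT_NV12M, w, h);
    enc->setBitrate(24000000);
    // V4L2_MEMORY_DMABUF: the plane holds only fds, no extra allocation.
    enc->output_plane.setupPlane(V4L2_MEMORY_DMABUF, 10, false, false);
    enc->capture_plane.setupPlane(V4L2_MEMORY_MMAP, 10, true, false);
    enc->output_plane.setStreamStatus(true);
    enc->capture_plane.setStreamStatus(true);
    return enc;
}

// Queue one NV12 NvBuffer (by dmabuf fd) for encoding.
void EncodeFrame(NvVideoEncoder *enc, uint32_t index, int dmabuf_fd) {
    struct v4l2_buffer v4l2_buf;
    struct v4l2_plane planes[MAX_PLANES];
    memset(&v4l2_buf, 0, sizeof(v4l2_buf));
    memset(planes, 0, sizeof(planes));
    v4l2_buf.index = index;
    v4l2_buf.m.planes = planes;
    planes[0].m.fd = dmabuf_fd;
    planes[0].bytesused = 1;  // must be non-zero to mark the plane as filled
    enc->output_plane.qBuffer(v4l2_buf, NULL);
}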

Hi,
According to Nvv4l2h264enc latency and preset-level - #17 by DaneLLL, I ran 12_camera_v4l2_cuda + NvVideoEncoder to encode the IMX390 and IMX490 camera images.
The IMX390 encoded video plays well, but the IMX490 one does not.
This is the IMX490 encoded video: [video attachment]

Hi,
I have read the nvv4l2camerasrc plugin code; it is hard to move my code into nvv4l2camerasrc. Is there any way to push a V4L2 buffer or memory into the encoder with a pipeline like:

appsrc ! "video/x-raw(memory:NVMM), width=2880, height=1860, format=YUY2, framerate=30/1" ! nvvidconv ! "video/x-raw(memory:NVMM), width=2880, height=1860, format=NV12, framerate=30/1" ! nvv4l2h264enc control-rate=constant_bitrate bitrate=62000000 iframeinterval=0 profile=0 maxperf-enable=true all-iframe=true num-Ref-Frames=0 insert-sps-pps=true ! "video/x-h264, stream-format=(string)byte-stream" ! h264parse ! qtmux ! filesink location=/tmp/today/1.h264

Hi,
The encoder input has to be NV12 or I420, so you have to convert YUY2 to NV12 or I420.
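
The conversion itself can stay in hardware: nvbuf_utils can run it on the VIC engine (the same engine nvvidconv uses), so no CPU-side pixel work is needed. A minimal sketch, assuming nvbuf_utils.h from the L4T Multimedia API and a YUY2 source NvBuffer; the helper name is hypothetical.

#include "nvbuf_utils.h"

// Sketch: convert a YUY2 NvBuffer (src_fd) into a new NV12 NvBuffer
// (*dst_fd) with the VIC; both buffers stay in NVMM memory.
int ConvertYUY2ToNV12(int src_fd, int width, int height, int *dst_fd) {
    NvBufferCreateParams create_params = {0};
    create_params.width = width;
    create_params.height = height;
    create_params.payloadType = NvBufferPayload_SurfArray;
    create_params.colorFormat = NvBufferColorFormat_NV12;
    create_params.layout = NvBufferLayout_Pitch;
    create_params.nvbuf_tag = NvBufferTag_VIDEO_CONVERT;
    if (NvBufferCreateEx(dst_fd, &create_params) != 0)
        return -1;

    NvBufferTransformParams transform_params = {0};
    transform_params.transform_flag = NVBUFFER_TRANSFORM_FILTER;
    transform_params.transform_filter = NvBufferTransform_Filter_Smart;
    return NvBufferTransform(src_fd, *dst_fd, &transform_params);
}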

Hi,
This is my pipeline now:

appsrc name=appsrc ! video/x-raw,format=YUY2,width=1920,height=1080,framerate=30/1 ! nvvidconv ! video/x-raw(memory:NVMM), format=NV12, width=1920, height=1080,framerate=(fraction)30/1 ! nvv4l2h264enc control-rate=constant_bitrate bitrate=24000000 iframeinterval=0 profile=0 maxperf-enable=true all-iframe=true num-Ref-Frames=0 insert-sps-pps=false ! video/x-h264, stream-format=(string)byte-stream ! h264parse ! qtmux ! filesink location=/tmp/today/sensors_record/camera//center_camera_fov30.h264

How can I reduce the memory copy?

WriteFrame(const uint8_t *image_buffer, const size_t &image_buffer_size,
           const uint64_t &timestamp_ns) {
    if (nullptr == image_buffer) {
        return INTERNAL_ERROR;
    }
    GstClockTime duration, timestamp;
    GstFlowReturn status;
    duration = gst_util_uint64_scale_int(1, GST_SECOND, fps_);
    timestamp = num_frames_ * duration;
    GstBuffer *buffer = gst_buffer_new();
    if (nullptr == buffer) {
        return INTERNAL_ERROR;
    }
    // Wrap the caller's buffer instead of copying it; image_buffer must
    // stay valid until the pipeline has consumed the frame.
    gst_buffer_append_memory(
        buffer, gst_memory_new_wrapped(
                    GST_MEMORY_FLAG_READONLY, (gpointer)image_buffer,
                    image_buffer_size, 0, image_buffer_size, nullptr, nullptr));

    GST_BUFFER_DURATION(buffer) = duration;
    GST_BUFFER_PTS(buffer) = timestamp;
    GST_BUFFER_DTS(buffer) = timestamp;
    // Set the current frame number on the buffer.
    GST_BUFFER_OFFSET(buffer) = num_frames_;

    /**
     * gst_app_src_push_buffer() takes ownership of the buffer, so we use
     * the "push-buffer" action signal instead, which only refs it. We must
     * drop our own reference after the emit, on both the success and the
     * failure path, or the buffer leaks on every frame.
     */
    g_signal_emit_by_name(source_, "push-buffer", buffer, &status);
    gst_buffer_unref(buffer);
    if (GST_FLOW_OK != status) {
        std::cerr << "Failed to push-buffer to GStreamer pipeline!\n";
        return INTERNAL_ERROR;
    }
    std::call_once(init_recorder_, [&] {
        std::string timestamp_write_path = write_path_ + ".txt";
        timestamp_writer_ = fopen(timestamp_write_path.c_str(), "w");
        // A return value here would be discarded by std::call_once;
        // the nullptr check below handles open failure instead.
    });
    if (timestamp_writer_ != nullptr) {
        // PRIu64 (from <cinttypes>) keeps the 64-bit format portable.
        fprintf(timestamp_writer_, "%d, %" PRIu64 "\n", num_frames_,
                timestamp_ns);
    }
    num_frames_++;
    return SUCCESS;
}

Hi,
Please refer to this sample:
Opencv gpu mat into GStreamer without downloading to cpu - #15 by DaneLLL

If the data can be put into NvBuffer directly, the memory copy is eliminated.
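
The pattern in that sample is roughly the following: the GstBuffer pushed into appsrc carries only the small NvBuffer descriptor rather than the pixels, and the appsrc caps declare video/x-raw(memory:NVMM). A minimal sketch under those assumptions; the helper name is hypothetical.

#include <gst/app/gstappsrc.h>
#include "nvbuf_utils.h"
#include <cstring>

// Sketch: push a hardware NvBuffer (by dmabuf fd) into an appsrc whose
// caps are "video/x-raw(memory:NVMM)". Only the descriptor is copied
// into the GstBuffer, not the frame pixels.
GstFlowReturn PushNvmmBuffer(GstAppSrc *appsrc, int dmabuf_fd) {
    NvBufferParams par;
    if (NvBufferGetParams(dmabuf_fd, &par) != 0)
        return GST_FLOW_ERROR;

    GstBuffer *buffer = gst_buffer_new_allocate(NULL, par.nv_buffer_size, NULL);
    GstMapInfo map;
    gst_buffer_map(buffer, &map, GST_MAP_WRITE);
    // This is the memcpy in question: it copies only par.nv_buffer_size
    // bytes of descriptor, not the full frame.
    memcpy(map.data, par.nv_buffer, par.nv_buffer_size);
    gst_buffer_unmap(buffer, &map);

    // gst_app_src_push_buffer() takes ownership of our reference.
    return gst_app_src_push_buffer(appsrc, buffer);
}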

Hi,
How can I remove this memcpy?
memcpy(map.data, par.nv_buffer, par.nv_buffer_size);

Hi,
This copies only the descriptor of the NvBuffer and is a required operation. The size is small and has no impact on performance.
