Why does running DeepStream on Orin Nano result in CPU usage exceeding 500%?

hardware: DevelopKit/Custom board

software version: JetPack 5.1.5

core board:Orin Nano 4G

After running the deepstream-app test program on Orin Nano, CPU usage reached 500% and memory usage exceeded 1.7 GB. The test procedure and configuration file follow.

config file:

root@tegra-ubuntu:/home/nvidia/code/rtsp_test# cat /vendor_app/bin/output/algo/fvs_ai_core/model/det/0/my_config_lt.txt 
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl



[tiled-display]
enable=0
rows=1
columns=1
width=1280
height=720
gpu-id=0
nvbuf-memory-type=0



[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
# type=4
# uri=rtsp://foo.com/stream1.mp4
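# Note: relative paths in uri are resolved against the directory containing the configuration file
# The current configuration file is in model/det/0/ and the videos are in video/, so ../../../video/ is needed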
# type=3
# uri=file:///home/nvidia/submit/fvs_ai_core/video/BlockedCarOnBridge.mp4
# uri=file:///home/nvidia/submit/fvs_ai_core/video/video-251217/DJI_20251217165019_0002_S.MP4

type=4
uri=rtsp://192.168.0.13:554/aibox_transfer_fpv

num-sources=1
gpu-id=0
latency=50
nvbuf-memory-type=0



[sink0]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0
muxer-config=faststart=1



[sink1]
enable=0
type=6
msg-conv-config=/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test5/configs/dstest5_msgconv_sample_config.txt
msg-conv-payload-type=1
msg-conv-msg2p-new-api=1
msg-conv-frame-interval=1
msg-broker-proto-lib=/opt/nvidia/deepstream/deepstream/lib/libnvds_kafka_proto.so
msg-broker-conn-str=192.168.0.13;9092;dstest5
topic=dstest5
debug-payload-dir=/home/nvidia/code/fvs_ai_core/app/deepstream_yolo/result/payloads1



[sink2]
enable=0
type=3
#1=mp4 2=mkv
container=1
#1=h264 2=h265 3=mpeg4
## only SW mpeg4 is supported right now.
codec=1     # 3
#encoder type 0=Hardware 1=Software
enc-type=0      # 0 = hardware encoding, 1 = software encoding
sync=0
bitrate=20000000
output-file=/home/nvidia/submit/fvs_ai_core/result/20251217165019_0002_S_result.MP4
source-id=0



[sink3]
enable=1
type=4          # RTSP streaming output
codec=1         # H264 encoding, x264enc
enc-type=1      # software encoding
sync=0
bitrate=4000000
profile=2
rtsp-port=8554
udp-port=5400



[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Arial
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0



[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=1
batch-size=1
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=10000
## Set muxer output width and height
width=1920
height=1080
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0
## If set to TRUE, system timestamp will be attached as ntp timestamp
## If set to FALSE, ntp timestamp from rtspsrc, if available, will be attached
# attach-sys-ts-as-ntp=1



# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
#Required to display the PGIE labels, should be added even when using config-file
#property
batch-size=1    # 2
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=0
#Required by the app for SGIE, when used along with config-file property
gie-unique-id=1
nvbuf-memory-type=0

# Model configuration (always uses the model with algo_id=0)
model-engine-file=model_b1_gpu0_fp16.engine
labelfile-path=labels.txt
config-file=config_infer_primary_yolo11.txt

#infer-raw-output-dir=/home/nvidia/code/fvs_ai_core/primary_detector_raw_output/



[tracker]
enable=0
tracker-width=640
tracker-height=384
ll-lib-file=//opt/nvidia/deepstream/deepstream/lib/libByteTracker.so
gpu-id=0
display-tracking-id=1



[tests]
file-loop=0

jtop:

top:

After turning off sink3 (that is, turning off x264enc), the CPU usage decreased significantly.

Why does using x264enc in Deepstream consume so much CPU and memory? Is there an optimization method?

I need some help.

The Orin Nano lacks a hardware encoder, so it has to use x264enc. x264enc encodes entirely on the CPU, unlike nvv4l2h264enc, which uses the NVENC hardware encoder.

If you want to reduce CPU usage, use fakesink or nv3dsink to avoid encoding the output as video. Your approach is correct.

Hi,junshengy:

Thanks for your reply.

With the same DeepStream program but the encoder changed to nvv4l2h264enc, CPU usage on an Orin NX 16G is only about 100%, consumed by the algorithm model. We now need the same functionality on Orin Nano, and I want to find a way to reduce CPU usage.

This is a hardware limitation. The Orin NX has a hardware encoder, but the Orin Nano does not, so it has to fall back to CPU encoding. You can check this table to compare different Jetson models.

You could consider lowering the encoding resolution to reduce CPU usage.
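
As a rough back-of-the-envelope check on how much a lower resolution might help, software encoder load scales roughly with the number of pixels processed per second. The sketch below compares 1920x1080 (the streammux size in the configuration above) with 1280x720. The 30 fps frame rate and the helper names are assumptions for illustration; actual x264enc cost also depends on the preset and on the video content.

```c
#include <assert.h>
#include <stdint.h>

/* Pixels the encoder has to process per second.
 * Illustrative helper, not a DeepStream or GStreamer API. */
uint64_t pixel_rate(uint32_t width, uint32_t height, uint32_t fps)
{
    assert(width > 0 && height > 0 && fps > 0);
    return (uint64_t)width * height * fps;
}

/* Workload of a new configuration as a rounded percentage of a baseline. */
unsigned relative_load_percent(uint64_t base_rate, uint64_t new_rate)
{
    assert(base_rate > 0);
    return (unsigned)((new_rate * 100 + base_rate / 2) / base_rate);
}
```

Calling relative_load_percent(pixel_rate(1920, 1080, 30), pixel_rate(1280, 720, 30)) gives 44, so under these assumptions 720p presents the encoder with a little under half the pixels of 1080p.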

Hi,junshengy:

Running DeepStream on Orin Nano 4G to perform visual algorithm analysis and forward the bitstream is our product requirement, so I need it to run on Orin Nano. I wrote a simple test program using x264enc, and its CPU usage during testing was only about 170%, not 400%. CPU usage below the equivalent of two cores would be acceptable to me.

test program:

#include <glib.h>
#include <gst/gst.h>
#include <gst/rtsp-server/rtsp-server.h>
#include <iostream>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <string>

// Global variables
static GMainLoop *main_loop = NULL;
static gboolean pipeline_running = FALSE;
static guint64 frame_count = 0;
static gboolean resolution_printed = FALSE; // print the resolution only once

// Configuration parameters
#define BIND_IP "192.168.0.13"
#define RTSP_PORT 8555
#define UDP_PORT 5554

std::string m_rtsp_url = "rtsp://192.168.0.52:554";

// Signal handling
static void sigint_handler(int sig) {
  g_print("\n[INFO] Program exiting...\n");
  if (main_loop) {
    g_main_loop_quit(main_loop);
  }
}

static gint64 get_current_time_ms(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (gint64)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

static GstPadProbeReturn frame_info_probe(GstPad *pad, GstPadProbeInfo *info,
                                          gpointer user_data) {
  if (GST_IS_BUFFER(info->data)) {
    GstBuffer *buf = GST_BUFFER(info->data);
    frame_count++;
    GstClockTime pts = buf->pts;
    gdouble pts_ms = (gdouble)pts / 1000000.0;


    // Print the resolution only once
    if (!resolution_printed) {
      GstCaps *caps = gst_pad_get_current_caps(pad);
      if (caps) {
        GstStructure *s = gst_caps_get_structure(caps, 0);
        gint width, height;
        if (gst_structure_get_int(s, "width", &width) &&
            gst_structure_get_int(s, "height", &height)) {
          g_print("[INFO] Video Resolution: %dx%d\n", width, height);
          resolution_printed = TRUE;
        }
        gst_caps_unref(caps);
      }
    }

    // Print frame number + PTS in real time
    //g_print("[FRAME] Count: %-4lu | PTS: %-10.2f ms\n", frame_count, pts_ms);
  }
  return GST_PAD_PROBE_OK;
}

static void on_pad_added_video(GstElement *src, GstPad *pad,
                               gpointer user_data) {
  GstElement *sink = (GstElement *)user_data;
  GstPad *sinkpad = gst_element_get_static_pad(sink, "sink");

  GstCaps *caps = gst_pad_get_current_caps(pad);
  if (caps) {
    GstStructure *s = gst_caps_get_structure(caps, 0);
    const gchar *name = gst_structure_get_name(s);
    // Link video streams only
    if (g_str_has_prefix(name, "video/")) {
      gst_pad_link(pad, sinkpad);
      g_print("[INFO] Video stream linked successfully\n");
    }
    gst_caps_unref(caps);
  }
  gst_object_unref(sinkpad);
}

// Bus error handling
static gboolean bus_handler(GstBus *bus, GstMessage *msg, gpointer data) {
  if (GST_MESSAGE_TYPE(msg) == GST_MESSAGE_ERROR) {
    GError *err = NULL;
    gst_message_parse_error(msg, &err, NULL);
    g_print("[ERROR] Pipeline error: %s\n", err->message);
    g_main_loop_quit(main_loop);
    g_error_free(err);
  }
  return TRUE;
}


// ===================== Main pipeline creation (auto-detects FPV vs. standard RTSP)
// =====================
bool create_pipeline() {
  GstElement *pipeline = gst_pipeline_new("stream-pipeline");
  GstBus *bus = gst_pipeline_get_bus(GST_PIPELINE(pipeline));
  gst_bus_add_watch(bus, bus_handler, NULL);

  // Common elements
  GstElement *rtspsrc = gst_element_factory_make("rtspsrc", "rsrc");
  GstElement *nvvidconv = gst_element_factory_make("nvvidconv", "nvvidconv");
  GstElement *capsfilter_cpu =
      gst_element_factory_make("capsfilter", "capsfilter_cpu");
  GstElement *videorate = gst_element_factory_make("videorate", "vrate");
  //GstElement *caps_rate = gst_element_factory_make("capsfilter", "caps_rate");
  GstElement *queue_elem = gst_element_factory_make("queue", "q_buffer");
  GstElement *x264enc = gst_element_factory_make("x264enc", "enc");
  GstElement *h264parse_after =
      gst_element_factory_make("h264parse", "h264parse_after_enc");
  GstElement *rtppay = gst_element_factory_make("rtph264pay", "pay");
  GstElement *udpsink = gst_element_factory_make("udpsink", "udp_sender");

  GstElement *rtph264depay = NULL;
  GstElement *h264parse = NULL;
  GstElement *nvv4l2decoder = NULL;


    // Standard RTSP mode: create the standard depayloader + hardware decoder elements
    rtph264depay = gst_element_factory_make("rtph264depay", "depay");
    h264parse = gst_element_factory_make("h264parse", "h264parse0");
    nvv4l2decoder = gst_element_factory_make("nvv4l2decoder", "nvdec");

    // Standard stream configuration: send SPS/PPS with every IDR frame
    g_object_set(G_OBJECT(h264parse), "config-interval", -1, NULL);

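  // RTSP server configuration
  GstRTSPServer *rtsp_server = gst_rtsp_server_new();
  // Listen on RTSP_PORT so the URL printed below is correct; without this
  // call the server uses its default port, 8554
  gst_rtsp_server_set_service(rtsp_server, G_STRINGIFY(RTSP_PORT));
  GstRTSPMountPoints *mounts = gst_rtsp_server_get_mount_points(rtsp_server);
  GstRTSPMediaFactory *factory = gst_rtsp_media_factory_new();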

  // ===================== Common parameter configuration =====================
  g_object_set(G_OBJECT(rtspsrc), "location", m_rtsp_url.c_str(), "latency", 50,
               "drop-on-latency", TRUE, "buffer-mode", 1, "udp-reconnect", TRUE,
               "protocols", 4, NULL);

  GstCaps *caps = gst_caps_from_string("video/x-raw,format=I420");
  g_object_set(G_OBJECT(capsfilter_cpu), "caps", caps, NULL);
  gst_caps_unref(caps);

  /*GstCaps *f_caps = gst_caps_from_string("video/x-raw,framerate=15/1");
  g_object_set(G_OBJECT(caps_rate), "caps", f_caps, NULL);
  gst_caps_unref(f_caps);*/

  g_object_set(G_OBJECT(queue_elem), "max-size-buffers", 1, "max-size-bytes", 0,
               "max-size-time", 500000000, "leaky", 2, "flush-on-eos", TRUE,
               NULL);

  g_object_set(G_OBJECT(x264enc), "speed-preset", 4, "tune", 4, "bitrate", 4000,
               "key-int-max", 15, "bframes", 0, NULL);

  g_object_set(G_OBJECT(udpsink), "host", BIND_IP, "port", UDP_PORT, "sync",
               FALSE, "async", FALSE, NULL);

  // RTSP forwarding factory
  gchar *launch_str =
      g_strdup_printf("udpsrc port=%d ! application/x-rtp,encoding-name=H264 ! "
                      "rtph264depay ! h264parse ! rtph264pay name=pay0 pt=96",
                      UDP_PORT);
  gst_rtsp_media_factory_set_launch(factory, launch_str);
  g_free(launch_str);
  gst_rtsp_mount_points_add_factory(mounts, "/reply-test", factory);
  g_object_unref(mounts);
  gst_rtsp_server_attach(rtsp_server, NULL);


    // Standard RTSP chain: rtspsrc → depay → parse → nvv4l2decoder → nvvidconv
    gst_bin_add_many(GST_BIN(pipeline), rtspsrc, rtph264depay, h264parse,
                     nvv4l2decoder, nvvidconv, capsfilter_cpu, videorate,
                     queue_elem, x264enc, h264parse_after, rtppay,
                     udpsink, NULL);

    // Statically link the standard decode chain
    gst_element_link_many(rtph264depay, h264parse, nvv4l2decoder, nvvidconv,
                          NULL);
    g_signal_connect(
        rtspsrc, "pad-added",
        G_CALLBACK(+[](GstElement *src, GstPad *new_pad, gpointer user_data) {
          GstElement *depay = GST_ELEMENT(user_data);
          GstPad *sinkpad = gst_element_get_static_pad(depay, "sink");
          if (!gst_pad_is_linked(sinkpad))
            gst_pad_link(new_pad, sinkpad);
          gst_object_unref(sinkpad);
        }),
        rtph264depay);

  // Common second half of the chain (identical for both stream types)
  gst_element_link_many(nvvidconv, capsfilter_cpu, videorate,
                        queue_elem, x264enc, h264parse_after, rtppay, udpsink,
                        NULL);

  // ===================== Register probe: print width/height + PTS =====================
  GstPad *nvvidconv_src_pad = gst_element_get_static_pad(nvvidconv, "src");
  gst_pad_add_probe(nvvidconv_src_pad, GST_PAD_PROBE_TYPE_BUFFER,
                    frame_info_probe, NULL, NULL);
  gst_object_unref(nvvidconv_src_pad);

  // Start the pipeline
  gst_element_set_state(pipeline, GST_STATE_PLAYING);

  g_print("\n=============================================\n");
  g_print("Orin Nano RTSP Forwarder (Auto Mode)\n");
  g_print("Input:  %s\n", m_rtsp_url.c_str());
  g_print("Output: rtsp://%s:%d/reply-test\n", BIND_IP, RTSP_PORT);
  g_print("=============================================\n");

  g_main_loop_run(main_loop);

  // Resource cleanup
  gst_element_set_state(pipeline, GST_STATE_NULL);
  gst_object_unref(pipeline);
  gst_object_unref(bus);
  return true;
}

int main(int argc, char *argv[]) {
    if (argc > 1) {
      m_rtsp_url = argv[1];
    }
  gst_init(&argc, &argv);
  signal(SIGINT, sigint_handler);
  main_loop = g_main_loop_new(NULL, FALSE);
    g_print("rtsp url:%s\n", m_rtsp_url.c_str());
  create_pipeline();

  g_main_loop_unref(main_loop);
  return 0;
}

If you wish for the CPU utilization of deepstream-app encoding process to be comparable to that of your sample code, please add the speed-preset and threads parameter settings here; naturally, this inevitably entails a trade-off in terms of image quality.

/opt/nvidia/deepstream/deepstream/sources/apps/apps-common/src/deepstream_sink_bin.c

if (config->enc_type == NV_DS_ENCODER_TYPE_SW) {
    //bitrate is in kbits/sec for software encoder x264enc and x265enc
    g_object_set (G_OBJECT (bin->encoder), "bitrate", config->bitrate / 1000,
        NULL);
  } else {
    g_object_set (G_OBJECT (bin->encoder), "bitrate", config->bitrate, NULL);
    g_object_set (G_OBJECT (bin->encoder), "profile", config->profile, NULL);
    g_object_set (G_OBJECT (bin->encoder), "iframeinterval",
        config->iframeinterval, NULL);
  }

Hi, junshengy:

This is the default implementation in Deepstream and does not require me to add it.

For software encoding, deepstream-app currently does not support setting the speed-preset and threads parameters; you will need to add these manually.
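
The numeric values passed to x264enc can be hard to read. The sketch below decodes them using the enum and flag values that `gst-inspect-1.0 x264enc` reports on typical GStreamer 1.x builds; treat the table as an assumption and confirm it on your own system. The helper names are hypothetical and not part of GStreamer or DeepStream.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* speed-preset values as reported by `gst-inspect-1.0 x264enc`
 * (assumed here; check your installed plugin). Index == property value. */
static const char *const k_speed_presets[] = {
    "None", "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow", "placebo"
};
static const size_t k_speed_preset_count =
    sizeof(k_speed_presets) / sizeof(k_speed_presets[0]);

/* Returns the preset name for a numeric value, or NULL if out of range. */
const char *x264_speed_preset_name(int value)
{
    if (value < 0 || (size_t)value >= k_speed_preset_count)
        return NULL;
    return k_speed_presets[value];
}

/* Returns the numeric value for a preset name, or -1 if unknown. */
int x264_speed_preset_value(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < k_speed_preset_count; i++) {
        if (strcmp(k_speed_presets[i], name) == 0)
            return (int)i;
    }
    return -1;
}

/* tune is a flags property: 0x1 stillimage, 0x2 fastdecode, 0x4 zerolatency. */
int x264_tune_has_zerolatency(unsigned tune_flags)
{
    return (tune_flags & 0x4u) != 0;
}
```

Under this mapping, the test program above (speed-preset 4, tune 4) runs x264enc with the faster preset and zerolatency tuning, while the plugin default is medium. Lower preset numbers, toward ultrafast, trade image quality for less CPU.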

Hi,junshengy:

After adding these parameters, the CPU usage dropped to around 180%. Now I have discovered another issue. On Orin Nano, I implemented an RTSP stream forwarding service, stream_delay_magent, using GStreamer; it also uses x264enc for encoding. On its own, its CPU usage was about 248%. After starting the DeepStream program, the CPU usage of stream_delay_magent rose to 321%. When I then killed the DeepStream program, it dropped back to around 248%. Why is this?

root@tegra-ubuntu:/vendor_app/bin/output/bin# top
top - 11:43:54 up 19 min,  2 users,  load average: 11.95, 6.69, 3.03
Tasks: 292 total,   1 running, 291 sleeping,   0 stopped,   0 zombie
%Cpu0  : 44.2 us, 30.9 sy,  0.0 ni,  2.0 id,  0.0 wa, 11.6 hi, 11.3 si,  0.0 st
%Cpu1  : 74.2 us, 18.8 sy,  0.0 ni,  4.7 id,  0.3 wa,  1.0 hi,  1.0 si,  0.0 st
%Cpu2  : 79.4 us, 15.1 sy,  0.0 ni,  4.1 id,  0.0 wa,  1.0 hi,  0.3 si,  0.0 st
%Cpu3  : 72.2 us, 20.5 sy,  0.0 ni,  5.6 id,  0.0 wa,  1.0 hi,  0.7 si,  0.0 st
%Cpu4  : 84.5 us,  9.9 sy,  0.0 ni,  3.9 id,  0.0 wa,  1.4 hi,  0.4 si,  0.0 st
%Cpu5  : 77.2 us, 16.3 sy,  0.0 ni,  4.4 id,  0.0 wa,  1.4 hi,  0.7 si,  0.0 st
MiB Mem :   3426.4 total,     85.4 free,   3224.4 used,    116.6 buff/cache
MiB Swap:   1713.2 total,      0.3 free,   1712.9 used.     59.3 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                   
   1929 root      20   0   12.4g 495620  48220 S 321.8  14.1  14:56.60 stream_relay_ag                                                                                                           
  17226 root      20   0 8811584 877412 553756 S 174.6  25.0   1:43.16 visual_algo_age
  1. I don't know any details about stream_delay_magent, and it is not an NVIDIA program, so I cannot provide a reliable analysis.
  2. Some suggestions:
    Set your Orin Nano to MAXN mode.
    Does your deepstream-app pull streams from stream_delay_magent? Is a separate encoding instance being started? Do not add more encoding instances.
    Pin each process to its own set of cores to reduce CPU and cache contention:
taskset -c 0-2 ./stream_delay_magent
taskset -c 3-5 deepstream-app -c config.txt

Hi,junshengy:

The x264enc parameters I added are as follows:

if (config->enc_type == NV_DS_ENCODER_TYPE_SW) {
    //bitrate is in kbits/sec for software encoder x264enc and x265enc
    g_print("yhy 20260520 modify x264enc\n");
    g_object_set(G_OBJECT(bin->encoder),
                 "bitrate", config->bitrate / 1000,
                 "speed-preset", 4,
                 "tune", 4,
                 "key-int-max", 30,
                 "bframes", 0,
                 NULL);
  }

The CPU usage has decreased, but the forwarded RTSP stream has a delay of more than 5 seconds, with severe stuttering and flickering. How can these issues be optimized in DeepStream?

Are you referring to the latency of rtsp in --> deepstream-app --> rtsp out?

This latency isn’t entirely due to forwarding.

The process involves decoding, inference, and encoding. To reduce latency, you could:
1. Optimize inference time (optimize your model).
2. Increase the inference interval (adjust the interval property of nvinfer).
3. Adjust the kernel UDP send buffer:

sudo sysctl -w net.core.wmem_max=xxx
sudo sysctl -w net.core.rmem_max=xxx

4. Adjust player parameters:

ffplay -fflags nobuffer -flags low_delay -analyzeduration 0 -probesize 32768 \
       rtsp://your_server/your_stream
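
For step 3, it can help to check what send buffer the kernel actually grants, since a request larger than net.core.wmem_max is silently capped. The following is a minimal sketch using plain POSIX sockets; the function name is hypothetical, and it only probes a throwaway socket rather than the one udpsink creates.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Requests a UDP send buffer of `requested` bytes on a temporary socket and
 * returns the size the kernel reports back, or -1 on error.
 * On Linux the request is capped by net.core.wmem_max, and the reported
 * value is doubled to account for kernel bookkeeping overhead. */
int effective_udp_sndbuf(int requested)
{
    assert(requested > 0);

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int granted = -1;
    socklen_t len = sizeof(granted);
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested)) != 0 ||
        getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &granted, &len) != 0) {
        granted = -1;
    }

    close(fd);
    return granted;
}
```

If effective_udp_sndbuf(4194304) returns far less than about twice the requested size, the sysctl limit is the constraint and raising wmem_max is what makes a larger buffer-size request take effect.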

Hi,junshengy:

Yes, I use DeepStream to implement RTSP in -> visual algorithm inference -> OSD drawing -> RTSP out.

The RTSP stream output from DeepStream has extremely poor image quality, and the video below is full of mosaics.

Using the same model on Orin NX, the hardware encoding effect is good.

Here is my code :

static gboolean
create_udpsink_bin (NvDsSinkEncoderConfig * config, NvDsSinkBinSubBin * bin)
{
  GstCaps *caps = NULL;
  gboolean ret = FALSE;
  gchar elem_name[50];
  gchar encode_name[50];
  gchar rtppay_name[50];
  int probe_id = 0;
  g_start_time_ms = get_current_time_ms();
  //guint rtsp_port_num = g_rtsp_port_num++;
  uid++;

  g_snprintf (elem_name, sizeof (elem_name), "sink_sub_bin%d", uid);
  bin->bin = gst_bin_new (elem_name);
  if (!bin->bin) {
    NVGSTDS_ERR_MSG_V ("Failed to create '%s'", elem_name);
    goto done;
  }

  g_snprintf (elem_name, sizeof (elem_name), "sink_sub_bin_queue%d", uid);
  bin->queue = gst_element_factory_make (NVDS_ELEM_QUEUE, elem_name);
  if (!bin->queue) {
    NVGSTDS_ERR_MSG_V ("Failed to create '%s'", elem_name);
    goto done;
  }

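  g_object_set (G_OBJECT (bin->queue),
              "max-size-buffers", 1,        // hold at most 1 buffer (same as the gst-launch test)
              "leaky", 2,  // when full, drop the oldest queued buffers (leaky=downstream) to avoid blocking
              "max-size-time", 0,          // no time limit
              "max-size-bytes", 0,         // no byte limit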
              NULL);

  g_snprintf (elem_name, sizeof (elem_name), "sink_sub_bin_transform%d", uid);
  bin->transform = gst_element_factory_make (NVDS_ELEM_VIDEO_CONV, elem_name);
  if (!bin->transform) {
    NVGSTDS_ERR_MSG_V ("Failed to create '%s'", elem_name);
    goto done;
  }

  if (config->enc_type == NV_DS_ENCODER_TYPE_SW) {
    g_object_set(G_OBJECT(bin->transform), "nvbuf-memory-type", 1, NULL); // 1 = system memory, to suit x264enc
  }

  g_snprintf (elem_name, sizeof (elem_name), "sink_sub_bin_cap_filter%d", uid);
  bin->cap_filter = gst_element_factory_make (NVDS_ELEM_CAPS_FILTER, elem_name);
  if (!bin->cap_filter) {
    NVGSTDS_ERR_MSG_V ("Failed to create '%s'", elem_name);
    goto done;
  }

  if (config->enc_type == NV_DS_ENCODER_TYPE_SW)
    caps = gst_caps_from_string ("video/x-raw, format=I420");
  else
    caps = gst_caps_from_string ("video/x-raw(memory:NVMM), format=I420");

  g_object_set (G_OBJECT (bin->cap_filter), "caps", caps, NULL);

  g_snprintf (encode_name, sizeof (encode_name), "sink_sub_bin_encoder%d", uid);
  g_snprintf (rtppay_name, sizeof (rtppay_name), "sink_sub_bin_rtppay%d", uid);

  switch (config->codec) {
    case NV_DS_ENCODER_H264:
      bin->codecparse = gst_element_factory_make ("h264parse", "h264-parser");
      g_object_set (G_OBJECT (bin->codecparse),
          "config-interval", -1,
          "disable-passthrough", TRUE,
          NULL);
      bin->rtppay = gst_element_factory_make ("rtph264pay", rtppay_name);
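      g_object_set (G_OBJECT (bin->rtppay),
                "mtu", 1400,                  // keep packets unfragmented to avoid packet loss
                "pt", 96,                     // explicit RTP payload type (avoids negotiation errors)
                "config-interval", -1,        // resend SPS/PPS with every IDR frame (extra safeguard)
                "ssrc", 0x12345678,           // fixed SSRC so client sessions do not get confused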
                NULL);
      if (config->enc_type == NV_DS_ENCODER_TYPE_SW)
        bin->encoder =
            gst_element_factory_make (NVDS_ELEM_ENC_H264_SW, encode_name);
      else
        bin->encoder =
            gst_element_factory_make (NVDS_ELEM_ENC_H264_HW, encode_name);
      break;
    case NV_DS_ENCODER_H265:
      bin->codecparse = gst_element_factory_make ("h265parse", "h265-parser");
      g_object_set (G_OBJECT (bin->codecparse), "config-interval", -1, NULL);
      bin->rtppay = gst_element_factory_make ("rtph265pay", rtppay_name);
      if (config->enc_type == NV_DS_ENCODER_TYPE_SW)
        bin->encoder =
            gst_element_factory_make (NVDS_ELEM_ENC_H265_SW, encode_name);
      else
        bin->encoder =
            gst_element_factory_make (NVDS_ELEM_ENC_H265_HW, encode_name);
      break;
    default:
      goto done;
  }

  if (!bin->encoder) {
    NVGSTDS_ERR_MSG_V ("Failed to create '%s'", encode_name);
    goto done;
  }

  NVGSTDS_ELEM_ADD_PROBE (probe_id,
      bin->encoder, "sink",
      seek_query_drop_prob, GST_PAD_PROBE_TYPE_QUERY_UPSTREAM, bin);

  probe_id = probe_id;

  if (!bin->rtppay) {
    NVGSTDS_ERR_MSG_V ("Failed to create '%s'", rtppay_name);
    goto done;
  }

  if (config->enc_type == NV_DS_ENCODER_TYPE_SW) {
    //bitrate is in kbits/sec for software encoder x264enc and x265enc
    g_print("yhy 20260520 modify x264enc\n");
    g_object_set(G_OBJECT(bin->encoder),
                 "bitrate", config->bitrate / 1000,
                 "speed-preset", 4,
                 "tune", 4,
                 "key-int-max", 15,
                 "bframes", 0,
                 "threads", 1,                // limit the number of encoder threads
                 "vbv-buf-capacity", 200,     // small VBV buffer (milliseconds)
                 "qp-min", 15,
                 "qp-max", 35,
                 "ref", 1,                    // use only one reference frame
                 "rc-lookahead", 15,          // lookahead for pre-analysis
                 "subme", 3,                  // improve motion-estimation precision
                 NULL);
  } else {
    // yhy modify 202603224
     g_print ("yhy test 20260324, set enc param\n");
    g_object_set (G_OBJECT (bin->encoder),
                "bitrate", config->bitrate,
                "profile", config->profile,
                "iframeinterval", 30,      // keyframe interval of 30 frames (low latency)
                "idrinterval", 30,              // keep equal to iframeinterval
                "insert-sps-pps", TRUE,
                "preset-level", 2,
                "insert-vui", 1,
                "EnableTwopassCBR", 0,
                "qp-min", 15,
                "qp-max", 28,
                NULL);
    printf("bitrate: %d\n", config->bitrate);
  }

  struct cudaDeviceProp prop;
  cudaGetDeviceProperties (&prop, config->gpu_id);

  //if (prop.integrated) {
    if (config->enc_type == NV_DS_ENCODER_TYPE_HW) {
      g_object_set (G_OBJECT (bin->encoder), "preset-level", 1, NULL);
      g_object_set (G_OBJECT (bin->encoder), "insert-sps-pps", 1, NULL);
    } else {
  //} else {
    g_object_set (G_OBJECT (bin->transform), "gpu-id", config->gpu_id, NULL);
  }

  g_snprintf (elem_name, sizeof (elem_name), "sink_sub_bin_udpsink%d", uid);
  bin->sink = gst_element_factory_make ("udpsink", elem_name);
  if (!bin->sink) {
    NVGSTDS_ERR_MSG_V ("Failed to create '%s'", elem_name);
    goto done;
  }

  // yhy modify 202603224
  g_object_set (G_OBJECT (bin->sink),
              "host", "224.224.255.255",
              "port", config->udp_port,
              "async", FALSE,
              "sync", 0,
              "buffer-size", 65536,
              "max-lateness", 100000000,    // maximum lateness of 100 ms
              "qos", TRUE,                  // enable QoS
              NULL);

  gst_bin_add_many (GST_BIN (bin->bin),
      bin->queue, bin->cap_filter, bin->transform,
      bin->encoder, bin->codecparse, bin->rtppay, bin->sink, NULL);

  NVGSTDS_LINK_ELEMENT (bin->queue, bin->transform);
  NVGSTDS_LINK_ELEMENT (bin->transform, bin->cap_filter);
  NVGSTDS_LINK_ELEMENT (bin->cap_filter, bin->encoder);
  NVGSTDS_LINK_ELEMENT (bin->encoder, bin->codecparse);
  NVGSTDS_LINK_ELEMENT (bin->codecparse, bin->rtppay);
  NVGSTDS_LINK_ELEMENT (bin->rtppay, bin->sink);

  NVGSTDS_BIN_ADD_GHOST_PAD (bin->bin, bin->queue, "sink");

  ret = TRUE;

  ret =
      start_rtsp_streaming (config->rtsp_port, config->udp_port, config->codec,
      config->udp_buffer_size);
  if (ret != TRUE) {
    g_print ("%s: start_rtsp_straming function failed\n", __func__);
  }

done:
  if (caps) {
    gst_caps_unref (caps);
  }
  if (!ret) {
    NVGSTDS_ERR_MSG_V ("%s failed", __func__);
  }
  return ret;
}

How can I turn off algorithm inference now and run only RTSP forwarding? After disabling [primary-gie] in the configuration file, DeepStream cannot run.

And how can I solve the problem of mosaic at the bottom of the screen when using x264enc?

Try setting the pgie interval to a very large value, such as 999999.
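
To get a feel for what such a large interval means, the sketch below models the nvinfer interval property as "skip this many consecutive batches between inferences". It assumes the first batch is inferred and that batch-size is 1, as in the configuration above; the 30 fps figure in the usage note is also an assumption, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of batches that reach the model out of `total`, assuming the first
 * batch is inferred and `interval` batches are skipped after each inference. */
uint64_t inferred_batches(uint64_t total, uint64_t interval)
{
    uint64_t period = interval + 1;   /* one inferred + `interval` skipped */
    return (total + period - 1) / period;
}

/* Seconds between two consecutive inferences at a given frame rate. */
double seconds_between_inferences(uint64_t interval, double fps)
{
    assert(fps > 0.0);
    return (double)(interval + 1) / fps;
}
```

With interval=999999 at an assumed 30 fps, seconds_between_inferences returns roughly 33333 seconds, a little over nine hours, so inference is effectively off while [primary-gie] stays enabled in the pipeline.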

I believe the x264enc is consuming excessive CPU, causing rtspsrc to be unable to collect UDP packets in a timely manner, resulting in packet loss and these problems.

These issues are not caused by DeepStream components. I can only offer some suggestions, but these are tradeoffs. For example, reducing the encoding resolution, frame rate, and increasing the rtspsrc latency (increasing the latency value), etc.
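
The resolution and frame rate trade-offs can also be viewed from the quality side: at a fixed bitrate, fewer pixels per second means more bits per pixel. The sketch below works through that arithmetic for the 4000000 bit/s setting from [sink3]. The 30 fps frame rate is an assumption, the helper names are hypothetical, and the numbers describe only the average budget, not the cause of the mosaic artifacts.

```c
#include <assert.h>

/* Average number of bits available for each encoded frame. */
double bits_per_frame(double bitrate_bps, double fps)
{
    assert(bitrate_bps > 0.0 && fps > 0.0);
    return bitrate_bps / fps;
}

/* Average number of bits available for each pixel of each frame. */
double bits_per_pixel(double bitrate_bps, int width, int height, double fps)
{
    assert(width > 0 && height > 0);
    return bits_per_frame(bitrate_bps, fps) / ((double)width * (double)height);
}
```

Under these assumptions, bits_per_pixel(4000000.0, 1920, 1080, 30.0) is about 0.064, while the same bitrate at 1280x720 gives about 0.145, which is 2.25 times the budget per pixel.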

Hi, junshengy:

After I turned off OSD drawing, the image was very clear, and I want to know more about it in Orin Nano. How is the OSD of Deepstream implemented?

[osd]
enable=0
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Arial
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

nvdsosd is open source (/opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvdsosd). It draws elements such as bounding boxes, clocks, and text directly onto the current frame.

Is the “no clear” you mentioned similar to this topic? Or something else?

What is the current status of the issue? Are you performing inference using deepstream-app, or are you merely forwarding the data?

Hi,junshengy:

I’m very sorry for providing the wrong information. The improvement in picture quality is due to me enabling hardware encoding(nvv4l2h264enc). After I changed it to software encoding, the stream displayed only mosaic.

Now I want to try a different architecture:

rtspsrc → tee → [decodebin → nvinfer → ...]   (for inference)
         └→ queue → h264parse → rtph264pay → rtsp server   (forwarding original H.264 without re-encoding)

That is:

  • Branch 1 (inference): decode, run primary GIE, but do NOT re-encode.

  • Branch 2 (forwarding): take the original H.264 stream directly from rtspsrc, without any decoding or re-encoding, repackage it into RTP and send to an RTSP server

My questions:

  1. Is this architecture feasible on Orin Nano with DeepStream (or plain GStreamer)?
  2. How to correctly split the stream from rtspsrc using tee so that one path goes to decode (inference) and the other goes directly to h264parse?
  3. Can h264parse handle the raw H.264 stream from rtph264depay? Any specific properties needed?
  4. Does DeepStream’s pipeline design (e.g., nvstreammux) support such a split before decoding? If not, what is the recommended way to achieve “decode only for inference, forward original stream untouched”?

Any code examples, config snippets, or advice would be greatly appreciated.

Yes, you can refer to the gst-launch-1.0 command line provided below.

But what is the point of doing so? If you are playing a remote RTSP stream, you can simply play the raw stream directly; the purpose of the deepstream-app streaming pipeline is specifically to use the OSD to overlay inference results onto the video.
Forwarding RTSP alone does not serve your goal.

gst-launch-1.0 -e   rtspsrc location=rtsp://10.19.227.215/media/video1 latency=2000 protocols=tcp ! rtph264depay ! \
                    h264parse config-interval=-1 ! tee name=t t. ! \
                    queue max-size-buffers=4 leaky=downstream ! \
                    nvv4l2decoder ! mux.sink_0 t. ! queue max-size-buffers=4 leaky=downstream ! \
                    rtph264pay config-interval=1 pt=96 mtu=1400  ! \
                    udpsink host=127.0.0.1 port=5400 sync=false async=false \
                    nvstreammux name=mux batch-size=1 width=1920 height=1080 batched-push-timeout=40000 live-source=1 ! \
                    nvinfer config-file-path=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_infer_primary.txt  batch-size=1 ! \
                    nvvideoconvert ! nvdsosd ! nv3dsink sync=false

Hi, junshengy:

My goal is to avoid re-encoding on Orin Nano (since software encoding is too slow). Instead, I want to:

  • Run inference on Orin Nano,

  • Embed the inference results as custom SEI messages into the original H.264 stream,

  • Forward the raw stream (with SEI) to the ground station,

  • Let the ground station decode and draw the overlays.

Is this architecture feasible? Can I use tee to split the stream after rtph264depay – one branch to inference, another branch to SEI insertion + RTSP forwarding?

Thanks.

Yes, this should work. I have verified this approach using Cursor with the opus-4.7 model.
You can refer to this zip file; if it does not work on your Orin Nano, please debug it yourself.
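
The contents of the zip file are not reproduced here. As an independent illustration of the SEI step only, the sketch below builds an H.264 Annex B SEI NAL unit carrying a user_data_unregistered message (payload type 5), including emulation-prevention bytes. The function names and the idea of passing the inference result as an opaque byte buffer are assumptions; where the NAL unit is inserted into the stream (before the slice NAL units of the access unit) is left to the surrounding pipeline code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Appends one RBSP byte, inserting an emulation-prevention byte (0x03)
 * whenever two zero bytes would otherwise be followed by a byte <= 0x03. */
static size_t put_byte(uint8_t *out, size_t pos, uint8_t b, int *zero_run)
{
    if (*zero_run >= 2 && b <= 0x03) {
        out[pos++] = 0x03;
        *zero_run = 0;
    }
    out[pos++] = b;
    *zero_run = (b == 0x00) ? *zero_run + 1 : 0;
    return pos;
}

/* Builds a SEI NAL unit: start code, NAL header, payloadType 5, payloadSize,
 * 16-byte UUID, application data, trailing bits.
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t build_sei_user_data(const uint8_t uuid[16], const uint8_t *data,
                           size_t data_len, uint8_t *out, size_t out_cap)
{
    assert(uuid != NULL && out != NULL && (data != NULL || data_len == 0));

    size_t payload = 16 + data_len;
    /* type byte + size bytes + payload + trailing byte */
    size_t rbsp = 1 + (payload / 255 + 1) + payload + 1;
    /* 4-byte start code + 1-byte header + worst-case 0x03 insertions */
    if (out_cap < 5 + rbsp + rbsp / 2)
        return 0;

    size_t pos = 0;
    size_t remaining = payload;
    int zero_run = 0;

    out[pos++] = 0x00; out[pos++] = 0x00; out[pos++] = 0x00; out[pos++] = 0x01;
    out[pos++] = 0x06;                           /* nal_unit_type 6 (SEI) */
    pos = put_byte(out, pos, 0x05, &zero_run);   /* user_data_unregistered */
    while (remaining >= 255) {                   /* size coded in 0xFF steps */
        pos = put_byte(out, pos, 0xFF, &zero_run);
        remaining -= 255;
    }
    pos = put_byte(out, pos, (uint8_t)remaining, &zero_run);
    for (size_t i = 0; i < 16; i++)
        pos = put_byte(out, pos, uuid[i], &zero_run);
    for (size_t i = 0; i < data_len; i++)
        pos = put_byte(out, pos, data[i], &zero_run);
    out[pos++] = 0x80;                           /* rbsp_trailing_bits */
    return pos;
}
```

The receiver identifies its own messages by the UUID, removes the 0x03 bytes, and parses the remaining payload, for example serialized bounding boxes, before drawing the overlays.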

sei-inject-release.zip (13.4 KB)

We have also made DeepStream programming skills available on GitHub.