Batch RTSP streaming

Hi,

My pipeline processes batches of 4 frames from an input video source, but I noticed that the RTSP video output has the same FPS as when I use batch size 1. Is it possible to improve this?

My sink config is the following:
[sink0]
enable=1
type=4
codec=1
sync=0
bitrate=4000000
rtsp-port=8554
udp-port=5400

Can you give your platform and software information?
• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)

Can you upload your whole config file?
How do you measure the FPS of the output?

• Hardware Platform (Jetson / GPU): Xavier NX
• DeepStream Version: 5.0
• JetPack Version (valid for Jetson only): 4.4
• TensorRT Version: 7.1
• NVIDIA GPU Driver Version (valid for GPU only)

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5

[tiled-display]
enable=0
rows=1
columns=1
width=1280
height=722
gpu-id=0

[source0]
enable=1
type=2
num-sources=1
uri=file:///video.mp4

[streammux]
gpu-id=0
batch-size=2
batched-push-timeout=-1
width=1280
height=720
enable-padding=1

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
type=4
#1=h264 2=h265
codec=1
sync=0
bitrate=4000000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
#profile=0
# set below properties in case of RTSPStreaming
rtsp-port=8554
udp-port=5400


[osd]
enable=1
gpu-id=0
border-width=2
text-size=12
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Arial
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0

[primary-gie]
enable=1
gpu-id=0
batch-size=2
gie-unique-id=1
interval=0
config-file=detector.txt

How do you measure the FPS of the output?

There is only one source. The batch size has less impact if the decoder, encoder and RTSP streaming are not much faster than the inference speed.

You may try two or more sources with the fakesink output; the performance may be different.

> How do you measure the FPS of the output?

I measure FPS using a callback from the NVIDIA DeepStream samples:

static void
perf_cb(gpointer context, NvDsAppPerfStruct *str) {
    static guint header_print_cnt = 0;
    guint i;
    AppCtx *appCtx = (AppCtx *) context;
    guint numf = (num_instances == 1) ? str->num_instances : num_instances;

    g_mutex_lock(&fps_lock);
    if (num_instances > 1) {
        fps[appCtx->index] = str->fps[0];
        fps_avg[appCtx->index] = str->fps_avg[0];
    } else {
        for (i = 0; i < numf; i++) {
            fps[i] = str->fps[i];
            fps_avg[i] = str->fps_avg[i];
        }
    }

    num_fps_inst++;
    if (num_fps_inst < num_instances) {
        g_mutex_unlock(&fps_lock);
        return;
    }

    num_fps_inst = 0;

    if (header_print_cnt % 20 == 0) {
        g_print("\n**PERF: ");
        for (i = 0; i < numf; i++) {
            g_print("FPS %d (Avg)\t", i);
        }
        g_print("\n");
        header_print_cnt = 0;
    }
    header_print_cnt++;
    g_print("**PERF: ");
    for (i = 0; i < numf; i++) {
        g_print("%.2f (%.2f)\t", fps[i], fps_avg[i]);
    }
    g_print("\n");
    g_mutex_unlock(&fps_lock);
}

> There is only one source. The batch size has less impact if the decoder, encoder and RTSP streaming are not much faster than the inference speed.
Yes, I want to batch several frames from a single source. The decoder, encoder and RTSP streaming are much faster than inference: if I disable inference in the pipeline, the FPS increases significantly.

Is that possible? Let’s say I want to take frames 1, 2, 3 and 4 from the video, batch them, feed them into the network and then stream them in the proper order.

Does your inference model support implicit batch size?

The only batch size you can adjust is the inference batch size. You can try different batch-size values in the [primary-gie] configuration (or modify the value directly in your nvinfer config file).

Yes, my model supports implicit batch size.

As I understand it, I should adjust both the [primary-gie] and [streammux] batch-size values, but doing so does not increase the FPS. I think RTSP only streams the first frame of the batch…

Is there any sample app where consecutive frames are batched?

The RTSP stream is affected by the network transfer speed. Can you try with a local file source and fakesink?

Yes, but I obtained the same FPS results.

What is your model’s speed with that batch size?

I am getting ~4 FPS for batch_size = 1 (and also for larger batches) using DeepStream.

I mean the speed measured with a TensorRT tool such as trtexec, without DeepStream:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#trtexec