Lags during operation of DeepStream 6.1 TAO3.0 gaze and emotion detectors

I combined the gaze and emotion detectors into one pipeline, using the code from the “release/tao3.0_ds6.1ga” Git branch as the basis. The project archive is attached to this post.
The input signal is a video stream from a webcam via the RTSP protocol.
Unfortunately, the detectors lag, both when run separately and when combined into one pipeline.
Sometimes the lag drops to an acceptable 1 to 3 seconds, but at other times it is about 10 seconds and neither disappears nor decreases. I measured it with a stopwatch: I waved my hand in front of the camera and timed how long it took for my image on the screen to repeat the movement.
I connect the detectors in a pipeline in the following order:

    /* Set up the pipeline */
    /* we add all elements into the pipeline */
    gst_bin_add_many(GST_BIN(pipeline), primary_detector, second_detector, gaze_identifier, emotioninfer, queue1, queue2, queue3, queue4, queue5, queue6, queue7, nvvidconv, nvosd, nvtile, sink, NULL);
    if (!gst_element_link_many(streammux, queue1, primary_detector, queue2, second_detector, queue3, gaze_identifier, queue4, emotioninfer, queue5, nvtile, queue6, nvvidconv, queue7, nvosd, NULL)) {
        /* Braces are required so that we bail out only when linking fails */
        g_printerr("Inferring and tracking elements link failure.\n");
        return -1;
    }

Starting the program:

./deepstream-emotional-gaze-app 3 ../../../configs/facial_tao/sample_faciallandmarks_config.txt rtsp://

Pipeline struct:

I did not find any lag on the RTSP server side; see the parameters below.
I start the RTSP server with the following parameters:

./test-launch "v4l2src device=/dev/video0 do-timestamp=true ! video/x-raw, width=640, height=480, framerate=30/1 ! nvvidconv ! nvv4l2h264enc bitrate=1280000 control-rate=true vbv-size=64000 insert-vui=true insert-sps-pps=true ! h264parse ! rtph264pay name=pay0"

The source video stream for the detectors is in 480p HQ format: resolution 640x480, bit rate 1.28 Mbps.
I calculated the virtual buffer size (vbv-size) using the formula 1.5*(bitrate/fps):

1.5*(1280000/30) = 64000 bits

Based on the recommendations in this post:

The nominal bit rate and other parameters of the 480p HQ format are taken from here (table “…and for older, non-widescreen content (with only 75% as many pixels)…”):

• Hardware: Jetson Xavier NX, webcam Logitech C920 Pro HD
• Firmware: JetPack 5.0.1 Developer Preview

Is there a way to eliminate this lag in the detectors?
Thanks in advance.

deepstream-emotional-gaze-app_01.08.2008_tao3.0_ds6.1ga.rar (201.9 KB)

Moving to Deepstream forum.

What is the latency of the input video? What is the latency when you use a local media file as the input?