Stopping gdm increases Latency on NVENC

Hi,

I wrote a program in which I use the Jetson multimedia API to capture frames and then encode them. I’m actually very happy with the latencies.

But I noticed that the following command leads to an increase in NVENC latency (I need to execute the command to use DRM for another part of my project):

sudo systemctl stop gdm

Here are my average latencies before stopping gdm (~12 ms for encoding):

INFO: check(): (line:142) Enc Bitrate avg:        0.930888 Mbit/s
INFO: check(): (line:142) SOF to CR time Δ avg:   16.316132 ms
INFO: check(): (line:142) SOF to CL time Δ avg:   16.326601 ms
INFO: check(): (line:142) SOF to Comb time Δ avg: 19.154358 ms
INFO: check(): (line:142) SOF to Enc time Δ avg:  31.073876 ms
INFO: check(): (line:142) Enc Bitrate avg:        0.966408 Mbit/s
INFO: check(): (line:142) SOF to CR time Δ avg:   16.300697 ms
INFO: check(): (line:142) SOF to CL time Δ avg:   16.306919 ms
INFO: check(): (line:142) SOF to Comb time Δ avg: 19.146263 ms
INFO: check(): (line:142) SOF to Enc time Δ avg:  31.172676 ms
INFO: check(): (line:142) Enc Bitrate avg:        0.896328 Mbit/s
INFO: check(): (line:142) SOF to CR time Δ avg:   16.360031 ms
INFO: check(): (line:142) SOF to CL time Δ avg:   16.366589 ms
INFO: check(): (line:142) SOF to Comb time Δ avg: 19.196853 ms
INFO: check(): (line:142) SOF to Enc time Δ avg:  31.351798 ms

Here are my latencies after stopping gdm (~51 ms):

INFO: check(): (line:142) Enc Bitrate avg:        0.922920 Mbit/s
INFO: check(): (line:142) SOF to CL time Δ avg:   16.286754 ms
INFO: check(): (line:142) SOF to CR time Δ avg:   16.296088 ms
INFO: check(): (line:142) SOF to Comb time Δ avg: 19.145151 ms
INFO: check(): (line:142) SOF to Enc time Δ avg:  70.346858 ms
INFO: check(): (line:142) Enc Bitrate avg:        0.920184 Mbit/s
INFO: check(): (line:142) SOF to CL time Δ avg:   16.289686 ms
INFO: check(): (line:142) SOF to CR time Δ avg:   16.296840 ms
INFO: check(): (line:142) SOF to Comb time Δ avg: 19.147402 ms
INFO: check(): (line:142) SOF to Enc time Δ avg:  70.438493 ms
INFO: check(): (line:142) Enc Bitrate avg:        1.248536 Mbit/s
INFO: check(): (line:142) SOF to CL time Δ avg:   16.290126 ms
INFO: check(): (line:142) SOF to CR time Δ avg:   16.293413 ms
INFO: check(): (line:142) SOF to Comb time Δ avg: 19.130861 ms
INFO: check(): (line:142) SOF to Enc time Δ avg:  70.387857 ms

I tried to double-check it with the 01_video_encode sample. There, the framerate seems to get capped to 60 fps:

Before stopping gdm:

Creating Encoder in blocking mode 
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 4 
===== NvVideo: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4 
875967048
842091865
H264: Profile = 66 Level = 51 
NVMEDIA: Need to set EMC bandwidth : 846000 
NvVideo: bBlitMode is set to TRUE 
Could not read complete frame from input file
File read complete.
Got 0 size buffer in capture 
----------- Element = enc0 -----------
Total Profiling time = 1.50695
Average FPS = 392.182
Total units processed = 592
Average latency(usec) = 23638
Minimum latency(usec) = 7099
Maximum latency(usec) = 25128
-------------------------------------
App run was successful

After stopping gdm:

Creating Encoder in blocking mode 
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 4 
===== NvVideo: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4 
875967048
842091865
H264: Profile = 66 Level = 51 
NVMEDIA: Need to set EMC bandwidth : 846000 
NvVideo: bBlitMode is set to TRUE 
Could not read complete frame from input file
File read complete.
Got 0 size buffer in capture 
----------- Element = enc0 -----------
Total Profiling time = 9.83096
Average FPS = 60.1162
Total units processed = 592
Average latency(usec) = 165437
Minimum latency(usec) = 117693
Maximum latency(usec) = 245793
-------------------------------------
App run was successful

That is a real dealbreaker for me. What could be the issue here? How can I avoid it, so that NVENC behaves the same after stopping gdm?

Device: Orin NX 16G in MAXN_SUPER Mode

L4T Version: R36 (release), REVISION: 4.3, GCID: 38968081, BOARD: generic, EABI: aarch64, DATE: Wed Jan 8 01:49:37 UTC 2025

JetPack Version: 6.2.1

Hi,
Do you execute only $ sudo systemctl stop gdm? Per
Accelerated GStreamer — NVIDIA Jetson Linux Developer Guide

You should execute three commands:

$ sudo systemctl stop gdm
$ sudo loginctl terminate-seat seat0
$ sudo modprobe nvidia-drm modeset=1

Hi,

Yes, I know, and the behavior is the same after all three commands, but also already after $ sudo systemctl stop gdm alone.

However, I found out that if I permanently disable gdm as follows and reboot, the latencies are low again, so this solves the gdm problem:

$ sudo systemctl stop gdm
$ sudo systemctl disable gdm
$ sudo reboot

But the problem reappears if I execute the following command:

$ sudo modprobe nvidia-drm modeset=1

Before:

INFO: check(): (line:142) Enc Bitrate avg:        0.916520 Mbit/s
INFO: check(): (line:142) SOF to CR time Δ avg:   16.332213 ms
INFO: check(): (line:142) SOF to CL time Δ avg:   16.306807 ms
INFO: check(): (line:142) SOF to Comb time Δ avg: 19.182318 ms
INFO: check(): (line:142) SOF to Enc time Δ avg:  31.134973 ms
INFO: check(): (line:142) Enc Bitrate avg:        0.881776 Mbit/s
INFO: check(): (line:142) SOF to CR time Δ avg:   16.325324 ms
INFO: check(): (line:142) SOF to CL time Δ avg:   16.351858 ms
INFO: check(): (line:142) SOF to Comb time Δ avg: 19.185553 ms
INFO: check(): (line:142) SOF to Enc time Δ avg:  31.164096 ms

After:

INFO: check(): (line:142) Enc Bitrate avg:        0.882848 Mbit/s
INFO: check(): (line:142) SOF to CR time Δ avg:   16.310990 ms
INFO: check(): (line:142) SOF to CL time Δ avg:   16.318336 ms
INFO: check(): (line:142) SOF to Comb time Δ avg: 19.215118 ms
INFO: check(): (line:142) SOF to Enc time Δ avg:  36.413937 ms
INFO: check(): (line:142) Enc Bitrate avg:        0.876080 Mbit/s
INFO: check(): (line:142) SOF to CL time Δ avg:   16.358304 ms
INFO: check(): (line:142) SOF to CR time Δ avg:   16.347824 ms
INFO: check(): (line:142) SOF to Comb time Δ avg: 19.315311 ms
INFO: check(): (line:142) SOF to Enc time Δ avg:  36.398404 ms

The amount by which the latency increases is not consistent; sometimes it is more, sometimes less. What is certain, however, is that the latency is always exactly the same before the command is run and does not fluctuate.

Since it already worked while gdm was in place, I tried loading the nvidia_drm driver during bootup instead of after bootup, by adding the following line to the bottom of this file:

$ cat /etc/modules-load.d/nv.conf
# SPDX-FileCopyrightText: Copyright (c) 2020-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: LicenseRef-NvidiaProprietary
#
# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual
# property and proprietary rights in and to this material, related
# documentation and any modifications thereto. Any use, reproduction,
# disclosure or distribution of this material and related documentation
# without an express license agreement from NVIDIA CORPORATION or
# its affiliates is strictly prohibited.

#nvmap module
nvmap

# nvgpu module
nvgpu

# pwm-fan module
pwm-fan

# ina3221 module
ina3221

# RDMA module
nvidia-p2p

nvidia-drm
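Since /etc/modules-load.d only lists module names and cannot pass module parameters, one way to also get modeset=1 applied at bootup would presumably be a modprobe.d options file. This is only a sketch under the assumption that the standard modprobe.d mechanism applies on Jetson; the file name here is hypothetical and I have not verified this path:

```shell
# /etc/modprobe.d/nvidia-drm.conf (hypothetical file name)
# Applied whenever nvidia-drm is loaded, including via modules-load.d:
options nvidia-drm modeset=1

# After a reboot, the active value can be checked via sysfs:
#   cat /sys/module/nvidia_drm/parameters/modeset   # prints Y when enabled
```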

In the kernel log and via lsmod, it looks as if the driver has been loaded successfully (the exact same output appears when I load it manually after bootup):

$ sudo dmesg | grep nvidia
[   10.404597] nvidia_modeset: module verification failed: signature and/or required key missing - tainting kernel
[   10.409843] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for aarch64  540.4.0  Release Build  (buildbrain@mobile-u64-6336-d8000)  Tue Jan  7 17:35:14 PST 2025
[   10.414787] [drm] [nvidia-drm] [GPU ID 0x00020000] Loading driver
[   10.910943] [drm] Initialized nvidia-drm 0.0.0 20160202 for 13800000.display on minor 0
[   11.135898]  os_dump_stack+0x1c/0x28 [nvidia]
[   11.136010]  nvAssertFailedBacktrace.part.0+0x80/0x90 [nvidia]
[   11.136103]  kdispArbAndAllocDisplayBandwidth_v04_02+0x274/0x290 [nvidia]
[   11.136194]  kdispInvokeDisplayModesetCallback_KERNEL+0xa8/0xf0 [nvidia]
[   11.136283]  osTegraDceClientIpcCallback+0x84/0xb0 [nvidia]
[   11.211536] nv_platform 13800000.display: [drm] fb0: nvidia-drmdrmfb frame buffer device
nvidia_vrs_pseq        16384  0
nvidia_drm             94208  0
nvidia_modeset       1261568  1 nvidia_drm
nvidia               1589248  1 nvidia_modeset
tegra_dce             110592  2 nvidia
tsecriscv              36864  1 nvidia
host1x_nvhost          40960  10 nvhost_isp5,nvhost_nvcsi_t194,nvidia,tegra_camera,nvhost_nvdla,nvhost_capture,nvhost_nvcsi,nvhost_pva,nvhost_vi5,nvidia_modeset
drm_kms_helper        303104  4 tegra_drm,nvidia_drm
nvidia_p2p             20480  0
host1x                208896  8 host1x_nvhost,host1x_fence,nvgpu,tegra_drm,nvhost_nvdla,nvidia_drm,nvhost_pva,nvidia_modeset
mc_utils               16384  3 nvidia,nvgpu,tegra_camera_platform
drm                   634880  5 drm_kms_helper,nvidia,tegra_drm,nvidia_drm

Unfortunately, the following error occurs in my program afterwards:

[ERROR] (NvDrmRenderer.cpp:248) <renderer0> Couldn't obtain DRM-KMS resources 
[ERROR] (NvDrmRenderer.cpp:1083) <renderer0> No plane resource available 

When I do the following, rendering works again, but the latencies are high once more:

sudo modprobe -r nvidia_drm
sudo modprobe nvidia_drm modeset=1

Just so there is no misunderstanding: DRM rendering takes place in a different program than the encoding. The encoding latencies are also affected when no program that accesses the DRM renderer is running. What can I do here?

Hi,
Please try the minimal rootfs:
Root File System — NVIDIA Jetson Linux Developer Guide

It does not include the Ubuntu desktop, and you can load nvidia_drm at bootup. There are some examples of running commands at bootup:
Jetson AGX Orin FAQ

Hi,

Thanks for the proposal. But does that really solve the problem? I can also load the nvidia_drm driver manually; that's not a problem. I just tried loading it at bootup to see if that would solve the issue. What could be causing my encoding latency to increase when the nvidia_drm driver is loaded?

Hi,

Today, I built a minimal rootfs and loaded it onto a second Jetson. After a few difficulties, this worked. I then installed Jetpack and copied my jetson_multimedia_api folder (which also contains my own program) over. Unfortunately, the behavior is exactly the same:

Encoding pipeline without nvidia_drm loaded:

INFO: check(): (line:142) Enc Bitrate avg:        0.915168 Mbit/s
INFO: check(): (line:142) SOF to CR time Δ avg:   16.284512 ms
INFO: check(): (line:142) SOF to CL time Δ avg:   16.291129 ms
INFO: check(): (line:142) SOF to Comb time Δ avg: 19.074037 ms
INFO: check(): (line:142) SOF to Enc time Δ avg:  31.038793 ms
INFO: check(): (line:142) Enc Bitrate avg:        0.883072 Mbit/s
INFO: check(): (line:142) SOF to CL time Δ avg:   16.291371 ms
INFO: check(): (line:142) SOF to CR time Δ avg:   16.284982 ms
INFO: check(): (line:142) SOF to Comb time Δ avg: 19.094147 ms
INFO: check(): (line:142) SOF to Enc time Δ avg:  31.215960 ms

Decoding pipeline without nvidia_drm loaded:

INFO: check(): (line:142) SOF to RTP time Δ avg:   31.850010 ms
INFO: check(): (line:142) SOF to DEC time Δ avg:   35.983689 ms
INFO: check(): (line:142) SOF to RTP time Δ avg:   32.445725 ms
INFO: check(): (line:142) SOF to DEC time Δ avg:   36.111481 ms

Really stable and good latencies. And then, exactly the same code after sudo modprobe nvidia_drm modeset=1:

Encoding pipeline:

INFO: check(): (line:142) Enc Bitrate avg:        0.892184 Mbit/s
INFO: check(): (line:142) SOF to CR time Δ avg:   16.401402 ms
INFO: check(): (line:142) SOF to CL time Δ avg:   16.402641 ms
INFO: check(): (line:142) SOF to Comb time Δ avg: 19.313533 ms
INFO: check(): (line:142) SOF to Enc time Δ avg:  38.346204 ms
INFO: check(): (line:142) Enc Bitrate avg:        0.893680 Mbit/s
INFO: check(): (line:142) SOF to CR time Δ avg:   16.403160 ms
INFO: check(): (line:142) SOF to CL time Δ avg:   16.399329 ms
INFO: check(): (line:142) SOF to Comb time Δ avg: 19.306112 ms
INFO: check(): (line:142) SOF to Enc time Δ avg:  38.424531 ms

Decoding Pipeline:

INFO: check(): (line:142) SOF to RTP time Δ avg:   187.890345 ms
INFO: check(): (line:142) SOF to DEC time Δ avg:   343.569704 ms
INFO: check(): (line:142) SOF to RTP time Δ avg:   189.611606 ms
INFO: check(): (line:142) SOF to DEC time Δ avg:   346.370584 ms

What is happening here? Please really try to help; I think there is a big problem here. How can the nvidia_drm driver influence the encoder and decoder so dramatically?

If I still need to test how it behaves when the driver is loaded at boot time, please explain to me how to do this correctly. The approach I tried did not work.

Hi,
We will set up a developer kit and run the 00 and 01 samples to see if we can reproduce the latency. Will update.

Hi,

I have tried to verify it myself on our board, because we have no devkit available. I tried it with the 01_video_encode sample:

Before $ sudo modprobe nvidia_drm modeset=1:

$ sudo ./video_encode ../../data/Video/sample_outdoor_car_1080p_10fps_420.yuv 1920 1080 H265 test.h265 --stats --max-perf -hpt 1

Creating Encoder in blocking mode 
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 8 
===== NvVideo: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 8 
892744264
842091865
NvVideo: H265 : Profile : 1 
NVMEDIA: Need to set EMC bandwidth : 846000 
NvVideo: bBlitMode is set to TRUE 
Could not read complete frame from input file
File read complete.
Got 0 size buffer in capture 
----------- Element = enc0 -----------
Total Profiling time = 1.46959
Average FPS = 402.153
Total units processed = 592
Average latency(usec) = 22979
Minimum latency(usec) = 8646
Maximum latency(usec) = 25620
-------------------------------------
App run was successful

After $ sudo modprobe nvidia_drm modeset=1:

sudo ./video_encode ../../data/Video/sample_outdoor_car_1080p_10fps_420.yuv 1920 1080 H265 test.h265 --stats --max-perf -hpt 1
Creating Encoder in blocking mode 
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 8 
===== NvVideo: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 8 
892744264
842091865
NvVideo: H265 : Profile : 1 
NVMEDIA: Need to set EMC bandwidth : 846000 
NvVideo: bBlitMode is set to TRUE 
Could not read complete frame from input file
File read complete.
Got 0 size buffer in capture 
----------- Element = enc0 -----------
Total Profiling time = 9.82935
Average FPS = 60.126
Total units processed = 592
Average latency(usec) = 165420
Minimum latency(usec) = 119712
Maximum latency(usec) = 243946
-------------------------------------
App run was successful

It seems the same problem appears with your samples on my board.

I hope to hear from you soon, we are struggling for days now with this problem.

Hi,
The issue is not seen while DRM is actively running: while the 08 sample is running, the frame rate is not capped to 60 fps in the 00 and 01 samples. So if you are not running DRM, please do

$ sudo rmmod nvidia-drm

And modprobe nvidia-drm when you would like to run DRM.

Hi,

First of all: what sense does it make that NVENC/NVDEC are capped at 60 fps when the nvidia_drm driver is loaded? What if I use DRM at some point in my program, and then?

Please give some explanation. Why is it capped at 60 fps when nvidia_drm is loaded but not used?

Second: even if I try your samples the way you did, there is still a significant performance/latency difference.

01_video_encode sample before loading nvidia_drm:

$ sudo ./video_encode ../../data/Video/sample_outdoor_car_1080p_10fps_420.yuv 1920 1080 H265 test.h265 --stats --max-perf -hpt 1
Creating Encoder in blocking mode 
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 8 
===== NvVideo: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 8 
892744264
842091865
NvVideo: H265 : Profile : 1 
NVMEDIA: Need to set EMC bandwidth : 846000 
NvVideo: bBlitMode is set to TRUE 
Could not read complete frame from input file
File read complete.
Got 0 size buffer in capture 
----------- Element = enc0 -----------
Total Profiling time = 1.46709
Average FPS = 402.839
Total units processed = 592
Average latency(usec) = 23086
Minimum latency(usec) = 9088
Maximum latency(usec) = 25644
-------------------------------------
App run was successful

01_video_encode after loading nvidia_drm, while executing 08_video_dec_drm:

Creating Encoder in blocking mode 
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 8 
===== NvVideo: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 8 
892744264
842091865
NvVideo: H265 : Profile : 1 
NVMEDIA: Need to set EMC bandwidth : 846000 
NvVideo: bBlitMode is set to TRUE 
Could not read complete frame from input file
File read complete.
Got 0 size buffer in capture 
----------- Element = enc0 -----------
Total Profiling time = 2.31926
Average FPS = 254.823
Total units processed = 592
Average latency(usec) = 37415
Minimum latency(usec) = 9093
Maximum latency(usec) = 52814
-------------------------------------
App run was successful

Not capped, but 14 ms more latency. I'm sure you will get the same results if you look closely.

Hi,
Calling drmModeRmFB() on every rendered frame impacts encoding performance. Please try the mode of reusing the FB:

diff --git a/multimedia_api/ll_samples/samples/common/classes/NvDrmRenderer.cpp b/multimedia_api/ll_samples/samples/common/classes/NvDrmRenderer.cpp
index 489ceaf..8cba23a 100644
--- a/multimedia_api/ll_samples/samples/common/classes/NvDrmRenderer.cpp
+++ b/multimedia_api/ll_samples/samples/common/classes/NvDrmRenderer.cpp
@@ -825,7 +825,7 @@ NvDrmRenderer::renderInternal(int fd)
      * We get new FDs from camera consumer. Don't do mapping until
      * we can resolve that.
      */
-//    map_list.insert(std::make_pair(fd, fb));
+    map_list.insert(std::make_pair(fd, fb));
   }
 
   if (last_render_time.tv_sec != 0)
@@ -882,9 +882,6 @@ NvDrmRenderer::renderInternal(int fd)
     drmUtilCloseGemBo (drm_fd,bo_handles[i]);
   }
 
-   if(last_fb)
-    drmModeRmFB(drm_fd, last_fb);
-
   last_fb = fb;
 
   profiler.finishProcessing(0, frame_is_late);

Hi,

Now the fps on the encoder looks better when executing both simultaneously, but the rendering on the screen stutters like crazy; it's unusable.

That solves one problem, but creates another. Also, I still don’t understand why loading the DRM driver affects the latency of the encoder/decoder/NvBufSurfTransform. Could you please explain that?

Hi,
Please apply the patch and try 60fps:

01_video_encode$ ./video_encode ~/1080.yuv 1920 1080 H265 a.h265 --stats
08_video_dec_drm$ ./video_dec_drm ../../data/Video/sample_outdoor_car_1080p_10fps.h264 H264 --disable-ui -fps 60

See if this setup is good for your use-case.
In DRM, when an FB is created and released on every rendered frame, the rendering takes higher priority and may impact other hardware tasks. To get better hardware encoding throughput, you need to re-use the FB so that hardware resources remain available for the other hardware tasks.

Hi,

I have done exactly as you said, but the displayed video stream still stutters with the patch applied. Without the patch it is normal.