DeepStream 9.0: delayed second nvv4l2h264enc fails with NVMM appsrc buffers (output-io-mode=auto)

• Hardware Platform (Jetson / GPU)

Jetson AGX Thor (ARM64) and x86 with an NVIDIA GeForce RTX 4060 Laptop GPU. The issue reproduces on both: on Thor with DeepStream 8.0 and 9.0, and on x86 with DeepStream 9.0.

• DeepStream Version

Reproduced:

DeepStream 9.0 (nvcr.io/nvidia/deepstream:9.0-triton-multiarch)

Comparison baseline:

DeepStream 7.1 (nvcr.io/nvidia/deepstream:7.1-triton-multiarch) on x86, where the issue is not reproduced.

Additional Jetson check:

DeepStream 8.0 (nvcr.io/nvidia/deepstream:8.0-triton-multiarch) on Jetson Thor also reproduces the issue.

• JetPack Version (valid for Jetson only)

Jetson Linux R38.2.2

/etc/nv_tegra_release:


# R38 (release), REVISION: 2.2, GCID: 42205042, BOARD: generic, EABI: aarch64, DATE: Thu Sep 25 22:47:11 UTC 2025

• TensorRT Version

The TensorRT version bundled with each DeepStream container.

• NVIDIA GPU Driver Version (valid for GPU only)

x86 + dGPU: 590.48.01

Jetson Thor: 580.00

• Issue Type (questions, new requirements, bugs)

Bug

• Relation to previous NVIDIA forum issue

This issue appears related to the previously reported DeepStream 8.0 NVMM appsrc encoder bug:

NVIDIA support confirmed that previous issue and later stated that it was fixed in DeepStream 9.0.

The current reproducer uses a different trigger. The previous issue covered a single encoder fed from external NVMM buffers. This new issue covers the same NVMM appsrc encoder layer, but with a delayed secondary encoder started while a primary encoder is already running.

So the DeepStream 9.0 fix appears to cover the original single-encoder case, but the delayed secondary encoder case still fails with output-io-mode=auto.

• Impact

This is a production-impacting regression for pipelines that rely on NVIDIA hardware encoding and NVMM buffers for maximum throughput.

The failing path is not project-specific: the minimal repro is only appsrc -> queue -> nvv4l2h264enc -> fakesink, with NVMM buffers coming from a DeepStream-owned pool.

The mmap setting is useful as a diagnostic workaround, but it is not a performance-neutral replacement for output-io-mode=auto in a high-throughput multi-session system. It changes the encoder IO path and can affect GPU/CPU memory movement, latency, and session density. We need to understand whether auto selecting the failing path is expected behavior, a DeepStream regression, or a driver/V4L2 issue.

• How to reproduce the issue?

We prepared a minimal standalone reproducer (nvmm_appsrc_dual_record_enc_repro.cpp + Makefile) with no project/business logic, no input file, no parser, no muxer, and no output video file.

The reproducer creates two independent encoder branches and validates only the GStreamer bus result.

Source flow:


videotestsrc is-live=true

-> nvvideoconvert

-> video/x-raw(memory:NVMM),format=NV12

-> appsink

-> external DeepStream NVMM pool (gst_nvds_buffer_pool_new)

Each encoder branch:


appsrc

-> queue

-> nvv4l2h264enc

-> fakesink
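
For readers without the attachment, one encoder branch can be written as a gst_parse_launch-style description. This is only a sketch: the helper name and the "-src"/"-enc" element-name suffixes are assumptions made for illustration (the "secondary-recorder-enc" name is taken from the failure log), and the attached reproducer may construct its elements differently.

```cpp
#include <string>

// Builds a gst_parse_launch-style description for one encoder branch:
// appsrc -> queue -> nvv4l2h264enc -> fakesink.
// Assumption: `branch` and the "-src"/"-enc" suffixes are illustrative names.
// `io_mode` is passed through verbatim, for example "auto" or "mmap".
std::string encoder_branch_description(const std::string& branch,
                                       const std::string& io_mode) {
    return "appsrc name=" + branch + "-src ! queue ! nvv4l2h264enc name=" +
           branch + "-enc output-io-mode=" + io_mode + " ! fakesink";
}
```

Calling encoder_branch_description("secondary-recorder", "mmap") gives the branch used in the workaround/control case; passing "auto" gives the failing configuration.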

Important behavior:

  • Primary encoder starts immediately with output-io-mode=auto.

  • Secondary encoder starts later at source buffer 8.

  • Secondary encoder receives one already-captured NVMM buffer from the ring.

  • Default repro thresholds are intentionally small:

    • ring-size=8

    • secondary-start-buffer=8

    • secondary-preroll-buffers=1
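
The delayed-start schedule described above can be modelled without GStreamer or NVMM. The sketch below is a toy model and not the reproducer's code: it assumes source buffers are numbered from 1 and that the preroll burst is taken from the newest ring entries, including the buffer that triggers the secondary start. All names in it are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

struct PushCounts {
    std::size_t primary = 0;
    std::size_t secondary = 0;
};

// Toy model of the delayed-start schedule; it only counts pushes.
// Assumption: the preroll burst replays the newest ring entries, including
// the source buffer that triggers the secondary start.
PushCounts simulate_delayed_start(std::size_t source_buffers,
                                  std::size_t ring_size,
                                  std::size_t secondary_start_buffer,
                                  std::size_t secondary_preroll_buffers) {
    std::deque<std::size_t> ring;  // buffer numbers, newest at the back
    PushCounts counts;
    bool secondary_running = false;

    for (std::size_t n = 1; n <= source_buffers; ++n) {
        ring.push_back(n);
        if (ring.size() > ring_size) ring.pop_front();  // drop the oldest entry

        ++counts.primary;  // the primary encoder sees every source buffer

        if (secondary_running) {
            ++counts.secondary;  // live buffer after the delayed start
        } else if (n == secondary_start_buffer) {
            secondary_running = true;
            // Preroll burst: replay already-captured ring entries.
            counts.secondary += std::min(secondary_preroll_buffers, ring.size());
        }
    }
    return counts;
}
```

Under these assumptions, simulate_delayed_start(420, 8, 8, 1) yields 420 primary and 413 secondary pushes, which is consistent with the "Finished." line of the mmap control log.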

Run command (DeepStream 9.0, failure case):


docker build --build-arg DS_VERSION=9.0 -t nvmm-appsrc-dual-enc-repro:9.0 .

docker run --rm --gpus all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility \
  -e GST_DEBUG="v4l2*:6,nvv4l2*:6" \
  nvmm-appsrc-dual-enc-repro:9.0

Run command (DeepStream 9.0, workaround/control):


docker run --rm --gpus all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility \
  -e GST_DEBUG="v4l2*:6,nvv4l2*:6" \
  nvmm-appsrc-dual-enc-repro:9.0 \
  --secondary-output-io-mode mmap

Run command (DeepStream 7.1, x86 comparison baseline):


docker build --build-arg DS_VERSION=7.1 -t nvmm-appsrc-dual-enc-repro:7.1 .

docker run --rm --gpus all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility \
  -e GST_DEBUG="v4l2*:6,nvv4l2*:6" \
  nvmm-appsrc-dual-enc-repro:7.1

On Jetson Thor, DeepStream libraries such as libnvbufsurface.so can be symlinks to Tegra driver libraries that are mounted only by the NVIDIA container runtime, so they are not available during docker build. If docker build cannot link the reproducer on Jetson, build and run it inside the runtime container instead:


docker run --rm --runtime nvidia --gpus all \
  --entrypoint bash \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility \
  -e GST_DEBUG="v4l2*:6,nvv4l2*:6" \
  -v "$PWD":/work \
  -w /work \
  nvcr.io/nvidia/deepstream:9.0-triton-multiarch \
  -lc 'make clean all DS_SDK_ROOT=/opt/nvidia/deepstream/deepstream && ./nvmm_appsrc_dual_record_enc_repro'

Observed behavior:

  • x86 + DS 7.1 + auto: succeeds.

  • x86 + DS 9.0 + auto: fails.

  • x86 + DS 9.0 + secondary mmap: succeeds.

  • Jetson Thor + DS 8.0 + auto: fails.

  • Jetson Thor + DS 9.0 + auto: fails.

  • Jetson Thor + DS 9.0 + secondary mmap: succeeds.

Typical failure log:


Starting delayed secondary recorder: source-buffer=8, configured-start-buffer=8, preroll-burst=1, ring-fill=8, secondary-output-io-mode=auto

gstv4l2allocator.c:1398 gst_v4l2_allocator_qbuf:<secondary-recorder-enc:pool:sink:allocator> failed queueing buffer 3: Device or resource busy

ERROR from secondary-recorder-enc: Failed to process frame.

DEBUG: gstv4l2videoenc.c(1950): gst_v4l2_video_enc_handle_frame (): /GstPipeline:secondary-recorder-pipeline/nvv4l2h264enc:secondary-recorder-enc:

Maybe be due to not enough memory or failing driver
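
When the matrix of cases is run unattended, the failure signature in this log can be picked out mechanically. The helper below is a minimal sketch, assuming the message text stays exactly as shown above; the function name is hypothetical and it is not part of the attached reproducer.

```cpp
#include <regex>
#include <string>

// Returns the V4L2 buffer index from a line containing
// "failed queueing buffer N: Device or resource busy",
// or -1 when the line does not carry that signature.
int busy_qbuf_index(const std::string& log_line) {
    static const std::regex pattern(
        R"(failed queueing buffer (\d+): Device or resource busy)");
    std::smatch match;
    if (std::regex_search(log_line, match, pattern)) {
        return std::stoi(match[1].str());
    }
    return -1;
}
```

Feeding each line of the GST_DEBUG output through busy_qbuf_index and stopping at the first non-negative result is enough to separate the failing auto runs from the mmap control runs, as long as the wording of the message does not change between releases.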

Control success log with secondary output-io-mode=mmap:


Starting delayed secondary recorder: source-buffer=8, configured-start-buffer=8, preroll-burst=1, ring-fill=8, secondary-output-io-mode=mmap

EOS from primary-recorder-pipeline (1/2)

EOS from secondary-recorder-pipeline (2/2)

Finished. source-buffers=420 primary-pushed=420 secondary-pushed=413

Requested clarification:

Could NVIDIA confirm whether this is a known change or regression in nvv4l2h264enc or the underlying V4L2 IO-mode handling for delayed secondary encoder startup with NVMM buffers coming from appsrc?

Could NVIDIA also confirm whether this is a separate bug from the previously fixed DeepStream 8.0 NVMM appsrc encoder issue, or an uncovered delayed-secondary-encoder variant of the same IO-mode problem?

If this behavior is expected, what is the recommended high-performance IO-mode configuration for this NVMM appsrc use case? In particular, should output-io-mode=auto be avoided for delayed secondary encoders fed from external NVMM buffers?

Attached artifacts:

  • Standalone reproducer source code.

  • Makefile.

  • Dockerfile.

  • Full GST_DEBUG=v4l2*:6,nvv4l2*:6 logs for the main cases.

  • Environment dumps:

    • out/env_x86_ds71.txt

    • out/env_x86_ds90.txt

    • out/thor/env_thor_ds90.txt

nvmm_appsrc_dual_record_enc_repro.zip (47.1 KB)

Running natively on Thor with /opt/nvidia/deepstream/deepstream-9.0. The two nvv4l2h264enc instances did not like sharing NVMM backing memory. Replacing the shared buffers with an NVMM copy per recorder seems to fix the secondary-recorder failure.

nvmm_appsrc_dual_record_enc_repro.cpp.txt (29.5 KB)

Thanks for checking this.

Your change confirms that the failure is tied to sharing the same NVMM backing memory between encoder instances. However, copying NVMM per recorder changes the memory/IO model and is not a performance-neutral replacement for our production delayed-recording path.

Could NVIDIA confirm whether simultaneous use of the same NVMM backing memory by two nvv4l2h264enc instances is unsupported by design? If yes, is this documented, and what is the recommended high-throughput zero-copy or near-zero-copy pattern for delayed secondary recording?

Also, the original repro works on DeepStream 7.1 and fails on 8.0/9.0 with output-io-mode=auto, while mmap avoids the issue. So we still need clarification whether this is expected behavior, an auto IO-mode selection issue, or a regression in V4L2/DMABUF handling.