• Hardware Platform (Jetson / GPU)
Jetson AGX Thor (ARM64) and NVIDIA GeForce RTX 4060 Laptop GPU (x86). The issue reproduces on both platforms with DeepStream 9.0, and on Jetson Thor also with DeepStream 8.0.
• DeepStream Version
Reproduced:
DeepStream 9.0 (nvcr.io/nvidia/deepstream:9.0-triton-multiarch)
Comparison baseline:
DeepStream 7.1 (nvcr.io/nvidia/deepstream:7.1-triton-multiarch) on x86, where the issue is not reproduced.
Additional Jetson check:
DeepStream 8.0 (nvcr.io/nvidia/deepstream:8.0-triton-multiarch) on Jetson Thor also reproduces the issue.
• JetPack Version (valid for Jetson only)
Jetson Linux R38.2.2
/etc/nv_tegra_release:
# R38 (release), REVISION: 2.2, GCID: 42205042, BOARD: generic, EABI: aarch64, DATE: Thu Sep 25 22:47:11 UTC 2025
• TensorRT Version
The TensorRT version bundled with each DeepStream container.
• NVIDIA GPU Driver Version (valid for GPU only)
x86 + dGPU: 590.48.01
Jetson Thor: 580.00
• Issue Type (questions, new requirements, bugs)
Bug
• Relation to previous NVIDIA forum issue
This issue appears related to the previously reported DeepStream 8.0 NVMM appsrc encoder bug:
NVIDIA support confirmed that previous issue and later stated that it was fixed in DeepStream 9.0.
The current reproducer uses a different trigger. The previous issue covered a single encoder fed from external NVMM buffers. This new issue covers the same NVMM appsrc encoder layer, but with a delayed secondary encoder that starts while a primary encoder is already running.
So the DeepStream 9.0 fix appears to cover the original single-encoder case, but the delayed secondary encoder case still fails with output-io-mode=auto.
• Impact
This is a production-impacting regression for pipelines that rely on NVIDIA hardware encoding and NVMM buffers for maximum throughput.
The failing path is not project-specific: the minimal repro is only appsrc -> queue -> nvv4l2h264enc -> fakesink, with NVMM buffers coming from a DeepStream-owned pool.
The mmap setting is useful as a diagnostic workaround, but it is not a performance-neutral replacement for output-io-mode=auto in a high-throughput multi-session system. It changes the encoder IO path and can affect GPU/CPU memory movement, latency, and session density. We need to understand whether auto selecting the failing path is expected behavior, a DeepStream regression, or a driver/V4L2 issue.
• How to reproduce the issue?
We prepared a minimal standalone reproducer (nvmm_appsrc_dual_record_enc_repro.cpp + Makefile) with no project/business logic, no input file, no parser, no muxer, and no output video file.
The reproducer creates two independent encoder branches and validates only the GStreamer bus result.
Source flow:
videotestsrc is-live=true
-> nvvideoconvert
-> video/x-raw(memory:NVMM),format=NV12
-> appsink
-> external DeepStream NVMM pool (gst_nvds_buffer_pool_new)
Each encoder branch:
appsrc
-> queue
-> nvv4l2h264enc
-> fakesink
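For readers who want to reconstruct the branch without the attached source, here is a minimal sketch of the same topology as a gst_parse_launch()-style description string. The helper name and its parameters are hypothetical, and the attached reproducer may build the elements programmatically instead. The element naming follows the "secondary-recorder-enc" name seen in the failure log. The sketch only builds the string and makes no GStreamer calls.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not taken from the reproducer): returns a
// gst_parse_launch()-style description for one encoder branch.
// `branch` is a name prefix such as "secondary-recorder";
// `io_mode` is the value passed to output-io-mode ("auto" or "mmap" in this report).
std::string encoder_branch_description(const std::string& branch,
                                       const std::string& io_mode) {
    return "appsrc name=" + branch + "-src"
           " ! queue"
           " ! nvv4l2h264enc name=" + branch + "-enc output-io-mode=" + io_mode +
           " ! fakesink name=" + branch + "-sink";
}
```

In an application, the returned string could be handed to gst_parse_launch(), and the appsrc would then be looked up by name so that NVMM buffers can be pushed into it.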
Important behavior:
- Primary encoder starts immediately with output-io-mode=auto.
- Secondary encoder starts later at source buffer 8.
- Secondary encoder receives one already-captured NVMM buffer from the ring.
- Default repro thresholds are intentionally small:
  - ring-size=8
  - secondary-start-buffer=8
  - secondary-preroll-buffers=1
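The delayed-start accounting can be sketched as a small model with no GStreamer or NVMM calls. This is a simplified illustration, not the reproducer's code. It assumes that the secondary encoder starts when source buffer secondary-start-buffer arrives, that the preroll burst is taken from the newest ring entries (including the buffer that triggers the start), and that every later source buffer goes to both encoders. The struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

struct PushCounts {
    std::size_t primary = 0;    // buffers pushed to the primary encoder
    std::size_t secondary = 0;  // buffers pushed to the secondary encoder
};

// Hypothetical model of the delayed secondary start (assumptions in the lead-in).
PushCounts simulate_delayed_secondary(std::size_t source_buffers,
                                      std::size_t ring_size,
                                      std::size_t secondary_start_buffer,
                                      std::size_t secondary_preroll_buffers) {
    std::deque<std::size_t> ring;  // most recent buffer ids, oldest first
    PushCounts counts;
    bool secondary_running = false;

    for (std::size_t id = 1; id <= source_buffers; ++id) {
        ring.push_back(id);
        if (ring.size() > ring_size) ring.pop_front();

        ++counts.primary;  // the primary encoder sees every source buffer

        if (secondary_running) {
            ++counts.secondary;  // live buffer after startup
        } else if (id >= secondary_start_buffer) {
            // Startup: replay the newest preroll entries held in the ring.
            counts.secondary += std::min(secondary_preroll_buffers, ring.size());
            secondary_running = true;
        }
    }
    return counts;
}
```

Under these assumptions, the default thresholds with 420 source buffers give 420 primary and 413 secondary pushes, which is consistent with the "Finished." line of the mmap control run.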
Run command (DeepStream 9.0, failure case):
docker build --build-arg DS_VERSION=9.0 -t nvmm-appsrc-dual-enc-repro:9.0 .
docker run --rm --gpus all \
-e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility \
-e GST_DEBUG="v4l2*:6,nvv4l2*:6" \
nvmm-appsrc-dual-enc-repro:9.0
Run command (DeepStream 9.0, workaround/control):
docker run --rm --gpus all \
-e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility \
-e GST_DEBUG="v4l2*:6,nvv4l2*:6" \
nvmm-appsrc-dual-enc-repro:9.0 \
--secondary-output-io-mode mmap
Run command (DeepStream 7.1, x86 comparison baseline):
docker build --build-arg DS_VERSION=7.1 -t nvmm-appsrc-dual-enc-repro:7.1 .
docker run --rm --gpus all \
-e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility \
-e GST_DEBUG="v4l2*:6,nvv4l2*:6" \
nvmm-appsrc-dual-enc-repro:7.1
On Jetson Thor, DeepStream libraries such as libnvbufsurface.so can be symlinks to Tegra driver libraries that are mounted only by the NVIDIA container runtime. If docker build cannot link the reproducer on Jetson, build and run it inside the runtime container:
docker run --rm --runtime nvidia --gpus all \
--entrypoint bash \
-e NVIDIA_DRIVER_CAPABILITIES=compute,video,utility \
-e GST_DEBUG="v4l2*:6,nvv4l2*:6" \
-v "$PWD":/work \
-w /work \
nvcr.io/nvidia/deepstream:9.0-triton-multiarch \
-lc 'make clean all DS_SDK_ROOT=/opt/nvidia/deepstream/deepstream && ./nvmm_appsrc_dual_record_enc_repro'
Observed behavior:
- x86 + DS 7.1 + auto: succeeds.
- x86 + DS 9.0 + auto: fails.
- x86 + DS 9.0 + secondary mmap: succeeds.
- Jetson Thor + DS 8.0 + auto: fails.
- Jetson Thor + DS 9.0 + auto: fails.
- Jetson Thor + DS 9.0 + secondary mmap: succeeds.
Typical failure log:
Starting delayed secondary recorder: source-buffer=8, configured-start-buffer=8, preroll-burst=1, ring-fill=8, secondary-output-io-mode=auto
gstv4l2allocator.c:1398 gst_v4l2_allocator_qbuf:<secondary-recorder-enc:pool:sink:allocator> failed queueing buffer 3: Device or resource busy
ERROR from secondary-recorder-enc: Failed to process frame.
DEBUG: gstv4l2videoenc.c(1950): gst_v4l2_video_enc_handle_frame (): /GstPipeline:secondary-recorder-pipeline/nvv4l2h264enc:secondary-recorder-enc:
Maybe be due to not enough memory or failing driver
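When comparing many GST_DEBUG logs across the test matrix, it can help to detect this failure signature automatically. Below is a minimal sketch that extracts the buffer index and error text from a "failed queueing buffer" line in the format shown above. The struct and function names are hypothetical, and the parser assumes that exact message layout.

```cpp
#include <cassert>
#include <string>

struct QbufFailure {
    bool matched = false;   // true if the line carries the failure signature
    int buffer_index = -1;  // index printed after "failed queueing buffer"
    std::string reason;     // text after the colon, e.g. "Device or resource busy"
};

// Hypothetical log helper: parses "... failed queueing buffer <N>: <reason>".
QbufFailure parse_qbuf_failure(const std::string& line) {
    QbufFailure result;
    const std::string marker = "failed queueing buffer ";
    const std::string::size_type pos = line.find(marker);
    if (pos == std::string::npos) return result;

    const std::string::size_type index_begin = pos + marker.size();
    const std::string::size_type colon = line.find(':', index_begin);
    if (colon == std::string::npos || colon == index_begin) return result;

    const std::string digits = line.substr(index_begin, colon - index_begin);
    // Reject non-numeric or implausibly long indices before converting.
    if (digits.size() > 9 ||
        digits.find_first_not_of("0123456789") != std::string::npos) {
        return result;
    }

    result.matched = true;
    result.buffer_index = std::stoi(digits);
    const std::string::size_type reason_begin = line.find_first_not_of(' ', colon + 1);
    if (reason_begin != std::string::npos) result.reason = line.substr(reason_begin);
    return result;
}
```

A reason of "Device or resource busy" is the standard text for EBUSY, so grouping logs by the parsed reason separates this failure from other queueing errors.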
Control success log with secondary output-io-mode=mmap:
Starting delayed secondary recorder: source-buffer=8, configured-start-buffer=8, preroll-burst=1, ring-fill=8, secondary-output-io-mode=mmap
EOS from primary-recorder-pipeline (1/2)
EOS from secondary-recorder-pipeline (2/2)
Finished. source-buffers=420 primary-pushed=420 secondary-pushed=413
Requested clarification:
Could NVIDIA confirm whether this is a known change or regression in nvv4l2h264enc or the underlying V4L2 IO-mode handling for delayed secondary encoder startup with NVMM buffers coming from appsrc?
Could NVIDIA also confirm whether this is a separate bug from the previously fixed DeepStream 8.0 NVMM appsrc encoder issue, or an uncovered delayed-secondary-encoder variant of the same IO-mode problem?
If this behavior is expected, what is the recommended high-performance IO-mode configuration for this NVMM appsrc use case? In particular, should output-io-mode=auto be avoided for delayed secondary encoders fed from external NVMM buffers?
Attached artifacts:
- Standalone reproducer source code.
- Makefile.
- Dockerfile.
- Full GST_DEBUG=v4l2*:6,nvv4l2*:6 logs for the main cases.
Environment dumps:
- out/env_x86_ds71.txt
- out/env_x86_ds90.txt
- out/thor/env_thor_ds90.txt
nvmm_appsrc_dual_record_enc_repro.zip (47.1 KB)