Hello NVIDIA Jetson Community,
We are evaluating Jetson AGX Orin for a low-latency video encoding (H.264 / H.265) application and have a specific question about the NVENC hardware encoder’s capabilities.
Use Case
We have an FPGA connected to Jetson AGX Orin via PCIe that captures and streams video data. The FPGA delivers pixel data line-by-line (scanline-by-scanline) over PCIe — meaning the complete frame is not immediately available on the Orin side but arrives progressively as the sensor reads out.
Minimizing end-to-end encode latency is critical. Ideally, we would like the NVENC encoder to begin encoding as soon as enough scanlines (e.g., a CTU row) have been DMA’d from the FPGA into Orin’s memory, rather than waiting for the entire frame to be fully transferred and assembled.
Architecture Overview
[Sensor] → [FPGA] ------ PCIe DMA (line-by-line) -------> [Jetson AGX Orin memory] → [NVENC] → [encoded bitstream]
The FPGA transfers scanlines (or groups of scanlines) into Orin’s DRAM via PCIe DMA as they become available. Currently, we must wait for the full frame to be assembled in a buffer before submitting it to the encoder, which adds up to one full frame period of latency.
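To put rough numbers on the latency at stake, here is a minimal sketch. The 2160-line frame, 60 fps rate, and 64-line CTU row are illustrative assumptions rather than our actual sensor parameters, the function names are hypothetical, and the model assumes scanlines arrive at a uniform rate across the frame period (blanking is ignored).

```python
def frame_period_ms(fps: float) -> float:
    """Duration of one frame period in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def lines_ready_latency_ms(fps: float, total_lines: int, lines_needed: int) -> float:
    """Time from start of readout until `lines_needed` scanlines have arrived.

    Assumes a uniform line rate over the whole frame period (no blanking),
    which is a simplification of real sensor timing.
    """
    if not 0 < lines_needed <= total_lines:
        raise ValueError("lines_needed must be in 1..total_lines")
    return frame_period_ms(fps) * lines_needed / total_lines


if __name__ == "__main__":
    fps = 60.0          # assumed frame rate
    height = 2160       # assumed frame height in scanlines
    ctu_lines = 64      # assumed CTU row height (HEVC allows up to 64x64 CTUs)

    full_frame = lines_ready_latency_ms(fps, height, height)
    first_ctu_row = lines_ready_latency_ms(fps, height, ctu_lines)
    print(f"wait for full frame:    {full_frame:.2f} ms")
    print(f"wait for first CTU row: {first_ctu_row:.2f} ms")
```

Under these assumptions the gap between the two figures is the latency we are hoping to recover by starting the encode before the frame is complete.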
Questions
- Sub-frame input support: Does the NVENC hardware block on Jetson AGX Orin support any mode where encoding can begin on partial frame data (e.g., a set of scanlines, CTU rows, or slices) before the entire frame is available in the buffer?
- API-level support: Is there any mechanism (via V4L2, NvMedia, or another low-level API) for feeding scanlines or groups of lines incrementally to the encoder, so it can start processing CTU/macroblock rows in parallel with the ongoing PCIe DMA transfer from the FPGA?
- Frame-based confirmation: Or is the encoder strictly frame-based, requiring a fully populated image buffer to be queued (e.g., via V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE) before encoding begins?
- PCIe DMA considerations: Are there any recommended DMA buffer layouts or memory allocation strategies (e.g., NvBuf, GBM, dma-buf) that could help minimize latency when ingesting video data from an FPGA over PCIe and passing it to NVENC?
- Alternative approaches: If sub-frame encoding is not supported, what is the recommended approach to minimize encode latency in this pipeline? For example:
  - Encoding smaller “pseudo-frames” (horizontal strips) independently?
  - Using slice-based encoding with one slice per CTU row?
  - Any NVIDIA-recommended low-latency pipeline architecture for progressive PCIe input?
- Future support: Is sub-frame or line-level encoding planned for any future JetPack / L4T release?
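To make the horizontal-strip alternative above concrete, here is a minimal sketch of how a frame might be partitioned into CTU-aligned strips that could each be submitted as a separate small frame. The function name, the 64-line CTU height, and the four-CTU-rows-per-strip grouping are hypothetical choices for illustration, and nothing here calls an actual NVIDIA API.

```python
from typing import List, Tuple


def strip_layout(frame_height: int,
                 ctu_size: int = 64,
                 ctu_rows_per_strip: int = 4) -> List[Tuple[int, int]]:
    """Split a frame into horizontal strips aligned to CTU-row boundaries.

    Returns a list of (first_scanline, line_count) tuples. Every strip
    starts on a multiple of `ctu_size`; the last strip may be shorter
    and might need padding before it is handed to an encoder.
    """
    if frame_height <= 0 or ctu_size <= 0 or ctu_rows_per_strip <= 0:
        raise ValueError("all arguments must be positive")

    strip_height = ctu_size * ctu_rows_per_strip
    strips: List[Tuple[int, int]] = []
    start = 0
    while start < frame_height:
        lines = min(strip_height, frame_height - start)
        strips.append((start, lines))
        start += lines
    return strips


if __name__ == "__main__":
    # Assumed 2160-line frame; each tuple is one candidate "pseudo-frame".
    for first_line, count in strip_layout(2160):
        print(f"strip at line {first_line:4d}, {count:3d} lines")
```

Our concern with this approach is that independently encoded strips cannot predict across strip boundaries, so we would expect some loss of compression efficiency and possibly visible seams. That is why we would prefer a native sub-frame or slice-level input mode if one exists.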
What We Have Already Reviewed
- Jetson Linux Multimedia API documentation
- V4L2 video encoder samples (01_video_encode)
- NVIDIA Video Codec SDK documentation
- GStreamer encoder plugin documentation (nvv4l2h264enc/nvv4l2h265enc)
- PCIe endpoint and DMA documentation for Jetson AGX Orin
We did not find any references to sub-frame or line-level encoding in these resources, but we wanted to confirm whether this capability exists at the hardware level or is accessible through a lower-level interface.
Any insights, documentation references, or suggested workarounds would be greatly appreciated. We are open to using low-level APIs if they enable this capability. Thank you in advance!
Motti