MMAP samples: How to interconnect the samples provided using MMAP

How do we connect the samples provided using MMAP?

For example, how should we connect (using MMAP):

00_video_decode  {MMAP}  07_video_convert

I used a FIFO to receive the YUV output of 00_video_decode and send it to 07_video_convert, and it works, but not as fast as connecting the two components over GPU memory in GStreamer. So my question is: how do you interconnect the MMAP samples provided? (CLI and C++)

FYI: I have spent several hours reviewing the documentation available in PDF, online, and inside the SD card image folders that contain the samples, yet I cannot find anything that explains this MMAP process (neither from the CLI tools perspective nor from the C++ classes perspective).

My assumption is that the samples provided have a mechanism (MMAP) that allows us to interconnect those samples directly and efficiently using the GPU pipes, and that these interconnections can use and pass metadata such as timestamps and, hopefully, other important video attributes such as color space, frame rate, aspect ratio, latency, etc. (02_video_encoder suggests this with the existence of --copy-timestamp.)

We suggest using the NvBuffer APIs in nvbuf_utils.h. You can call NvBufferTransform() to perform the same functions as NvVideoConverter. You can check USE_NVBUF_TRANSFORM_API in 00_video_decode.

Thanks for the suggestion to use another method besides what I’m asking about.

However, are you saying that the 00_video_decode and 07_video_convert sample programs cannot be interconnected using “MMAP”?

The help text of those samples reports having control over the output/capture buffers (V4L2_MEMORY_MMAP): -v4l2-memory-out-plane, -v4l2-memory-cap-plane, --input-nvpl, --input-nvbl, --output-nvpl, --output-nvbl, --input-raw, --output-raw and --copy-timestamp (decoder and encoder programs). Are those options not intended or related to mapping the file output into GPU memory pipes, so the next program can efficiently read directly from those GPU-level buffers?


Yes, it can be interconnected. Similar code is in tegra_multimedia_api\samples\backend. However, some users and customers have given feedback that it is a bit complex, so we added the NvBuffer APIs, which use the same hardware engine. In the long run, we may deprecate NvVideoConverter (still under discussion).

There are NvBuffer APIs to access the buffers through CUDA. You may refer to the code below:

// Create EGLImage from dmabuf fd
    ctx->egl_image = NvEGLImageFromFd(ctx->egl_display, buffer->planes[0].fd);
    if (ctx->egl_image == NULL)
    {
        fprintf(stderr, "Error while mapping dmabuf fd (0x%X) to EGLImage\n",
                buffer->planes[0].fd);
        return false;
    }

    // Run the algorithm on the EGLImage across the GPU's many cores

    // Destroy EGLImage
    NvDestroyEGLImage(ctx->egl_display, ctx->egl_image);
    ctx->egl_image = NULL;

This is writing a new program. (I know this is an option.)

I’m asking whether the 00_video_decode program can send its YUV output directly (at the GPU level) to the 07_video_convert program using Unix pipes or a FIFO, by using some combination of the poorly documented program options -v4l2-memory-out-plane, -v4l2-memory-cap-plane, --input-nvpl, --input-nvbl, --output-nvpl, --output-nvbl, --input-raw, --output-raw.

As mentioned, I used a Unix FIFO ($ mkfifo): 00_video_decode can write to the FIFO and 07_video_convert can read from it, but I don’t believe it is using a direct GPU connection (what I assume MMAP (V4L2_MEMORY_MMAP) to be, as in mapping the file output into GPU memory pipes). Instead, I believe it is sending the entire raw YUV data over the FIFO, as I see the CPUs using a lot of compute power.

I hope this time the topic is clear.

We don’t have experience using Unix FIFOs. From some online searching, it seems to be a form of communication between processes. This is not supported.

Below is a related post:
The request has been reviewed and evaluated but as of now is still not supported.

So, just as you said, your use case requires writing a new program.
I know you dislike the suggestion to contact us, but I have tried my best to propose possible solutions. If you require further functionality, please consider becoming a direct customer and cooperating closely with us.

Hi Dane,

Let me clarify my position as a customer, as I believe this topic is critically important for NVIDIA.

When you say “we” in your previous emails, I assume that you have the authority to respond on behalf of all of NVIDIA; that I’m talking directly to NVIDIA; and that the responses I received are the official and definitive answers to our technical questions and concerns.

“Direct customer” and “customer” are the same thing. When I paid for your product, I became a direct customer; when NVIDIA created a developer registration system where I have to log in, I became “direct”; when NVIDIA created this forum and opened it to its customers, it created a direct customer channel. In other words, redirecting our calls to “Sales” makes no sense. To me it is just a form of passive aggressiveness that should not be directed at existing direct customers. (If NVIDIA has internal process or communication issues, those should be handled internally.)

So, to be clear, I don’t dislike contacting NVIDIA; as you can see, from my perspective I’m in direct contact with NVIDIA over this direct channel. What I dislike is the lack of proper documentation, the waste of time and energy, and ultimately the chasing I have to do due to the lack of direct responses and accountability from NVIDIA to understand, support, resolve, and get things done.

Pushing customers on this channel back to Sales to get “direct” and “closer” with NVIDIA is absurd, especially when the sale was already completed. I bought and own a Jetson Nano, and all I need is to make it work for my own business as soon as possible, as technology is perishable. NVIDIA’s Jetson family of products are serious machines. The Nano is not another Raspberry Pi 4; this is not just a hobby. It is in another league, intended for a more serious audience looking to build autonomous machines, and the software on it must be rock solid, as we are talking about “autonomous” machines.

In addition, contacting Sales to add “new features” or functionality should be an unacceptable response. Based on other responses I found in this forum, it is just a simple excuse to deflect the work, to deny the root problem of the lack of software, support, and knowledge about NVIDIA’s own products, and to delay acknowledging that the software for this product should be labeled Alpha and was released prematurely.

Dear Dane, I love the Jetson Nano, and yet I’m somewhat disappointed to hear that NVIDIA is not as passionate as some of its customers. Please don’t take this personally, but look at the responses made by “NVIDIA” that have already upset other passionate customers, while the software has not progressed as expected for several quarters. For NVIDIA, the term “3D” should never mean “Deny, Delay, Deflect,” and I’m concerned that this incredible new product is going to enter Zombie mode prematurely through the side effects of mediocrity, ignorance, and indifference, which, by the way, can easily be fixed with proper training, leadership, and the right accountability process. So please don’t allow this to happen, and lead the way: Sales should not be the answer, and “we have no experience,” “no longer supported,” or “open source” should never be NVIDIA’s official answer. I’m ready to collaborate and make things happen. I live in Silicon Valley, and if you (or any of your leadership) want to meet in person, I can meet at NVIDIA, as I’m ready to follow your lead.

Thanks for choosing Jetson Nano. We will review the request. For more information: is a Unix FIFO an inter-process communication mechanism? Do the 01 and 07 samples have to run in two processes?


BTW: This is the way I’m expecting to be answered (if somehow you feel the need for more time to respond):

“I” don’t have experience with Unix, and in particular with using a FIFO and MMAP at that level. Let me check with the team what the intention is behind all those program arguments you mentioned (-v4l2-memory-out-plane, -v4l2-memory-cap-plane, --input-nvpl, --input-nvbl, --output-nvpl, --output-nvbl, --input-raw, --output-raw).

On Unix, a FIFO is like a fake file. You first create the “fake file” using:

$ mkfifo <virtual filename>

Then you make the first program write to this “file” while, concurrently, another program reads from it. This happens at the POSIX level, so no bytes are written to the place where the FIFO resides.

From Wikipedia: In computing, mmap(2) is a POSIX-compliant Unix system call that maps files or devices into memory. It is a method of memory-mapped file I/O. It implements demand paging because file contents are not read from disk directly and initially do not use physical RAM at all.

So it sounds like those arguments on 00_video_decode, 07_xxx, and 01_xxx (-v4l2-memory-out-plane, -v4l2-memory-cap-plane, --input-nvpl, --input-nvbl, --output-nvpl, --output-nvbl, --input-raw, --output-raw) are intended to do this kind of pipelining.

The problem is that those flags are not properly documented anywhere…

It sounds to me as if you’re just talking past each other.

mmap() is a Linux system call that maps some I/O device or file into readable memory for the currently running process.
That’s it. That’s all it does.

mmap(), itself, does not allow hardware A to be chained to hardware B.

You could, if you wanted to, mmap() a capture device, then mmap() an encoding device, and then use memcpy() to transfer the data between the buffers of device A and the buffers of device B. This would be how to “use mmap()” to “chain the devices.”

Thus, the answer to the question, as asked: “how do I chain devices using mmap(),” is simply “use memcpy().” You would also have to put the code from both of the samples into a single program, because mmap(), by definition, only establishes a memory mapping for a single process. There is no inherent synchronization between processes in memory segments created through mmap(), and you cannot use mmap() on its own as an inter-process communication mechanism.

This is not a super slow way of doing it, but it’s also not a super fast way, because copying the data between the buffers is generally an unnecessary operation. Generally, device B can read the buffers output by device A without any copy at all.

The “use NvBuffer API” suggestion by NVIDIA skips from “mmap() will work but isn’t optimal” straight to “here is the optimal way to chain devices, and it doesn’t involve user-level mmap() copying.”

Now, if “chain examples by using mmap()” means something else, such as “implement inter-process communication using mmap() and some other synchronization primitive” then that should be part of the question. Because that hasn’t been part of the question, I assume that the answer “copy the code from sample 00 and sample 07 into one program, run them both in mmap() buffer mode, and use memcpy()” is the direct answer to the literal question asked.

I assumed that this was the core function of those MMAP sample programs: to interconnect / map the hardware with the file I/O system, offering (yet poorly documented) options such as -v4l2-memory-out-plane, -v4l2-memory-cap-plane, --input-nvpl, --input-nvbl, --output-nvpl, --output-nvbl, --input-raw, --output-raw.

What I’m hearing is that those examples use MMAP internally, but were not designed as Unix/POSIX components that can be chained or piped using file I/O mapping to obtain a simple yet high-performance system (the kind of thing GStreamer does).

And that options such as --copy-timestamp on the decoder and encoder samples are also there by accident, not by a design that suggests the programs can be chained.

Is my understanding correct?

The programs are not designed to be chained as-is, that is correct. They are examples showing how to use the NVIDIA APIs. Because there are many ways to access the NVIDIA hardware, the examples have different options, so that you can experiment with how the code works for each of them. (I.e., the intention is that you read this code, figure out which parts you want, and then write your own code.)

gstreamer is able to chain modules because they are all loaded into the same host process (typically as loadable shared libraries.)

Got it…

Thanks for the clarification and information.

Many thanks for the clarification from @snarky.

Hi Rudolph,
Below is the software flow of 00_video_decode:

h264 frames -> NvVideoDecoder -> NV12/YUV420s in block linear format -> NvBufferTransform() -> NV12/YUV420s in pitch linear format -> dump to file or render to screen

In your use case, if you don’t require ‘NV12/YUV420s in pitch linear format’, you can modify the format (for example, to RGBA):

h264 frames -> NvVideoDecoder -> NV12/YUV420s in block linear format -> NvBufferTransform() -> RGBAs in pitch linear format -> dump to file or render to screen

If you also need NV12/YUV420s, you can call one more NvBufferTransform():

h264 frames -> NvVideoDecoder -> NV12/YUV420s in block linear format -> NvBufferTransform() -> NV12/YUV420s in pitch linear format -> NvBufferTransform() -> RGBAs in pitch linear format -> dump to file or render to screen

Got it. Thanks!

Pardon my ignorance: what does “pitch” mean in the context of memory/GPUs?

“Pitch” is the distance between each row of pixels in a bitmap.
Or, for planar images, it may be the distance between each plane (it depends on the context in which the word is used).

Got it… so it is a mapping between the video memory and the GPU memory using rows as the main unit. Kind of like what happens in video, where the frame is usually 1088 pixels tall but the ‘useful’ region is 1080 vertical lines.