NVMM and Gstreamer

elektito · January 21, 2019, 3:42pm

I have two questions regarding NVMM memory:

First of all, what exactly is NVMM, in technical terms? What does copying between normal memory and NVMM memory involve? I am particularly interested in whether copies go over a bus or are plain memory-to-memory copies. Also, how does it relate to CUDA memory? Are they the same thing? A reference describing the internals of the TX2 architecture and how its subsystems communicate would greatly help me understand performance issues.

Second, is it possible for me to write a GStreamer element that outputs directly into NVMM memory? If so, how? Is there sample code available?

DaneLLL · January 22, 2019, 2:23am

Hi,
NVMM buffers are DMA buffers, which can be passed between HW components.
We provide NvBuffer in tegra_multimedia_api. You can access it via the APIs defined in nvbuf_utils.h.
Please install the samples via JetPack and refer to
https://developer.nvidia.com/embedded/dlc/l4t-multimedia-api-reference-28-2-ga

elektito · January 22, 2019, 8:48am

By DMA buffers you mean hardware memory mapped into our address space? Or normal memory, made available to hardware for DMA? Is there a bus involved?

I’ll be sure to take a look at those examples. Since you didn’t say it explicitly: are you confirming that it is possible to write GStreamer plugins that work with NVMM memory and can interface with NVIDIA elements like nvvidconv?

None of the examples seem to be related to GStreamer, however. Do you know of any samples that interface with GStreamer?

elektito · January 22, 2019, 10:23am

Another thing I forgot to ask about is CUDA. Is there a way to cheaply send data from CUDA to NVMM? My use case involves some processing best done in CUDA. Copying data from CUDA back to normal memory is rather expensive, since these are uncompressed RGB frames. Instead, I’d rather send them directly to the hardware encoder and copy the data back after it’s encoded.

DaneLLL · January 23, 2019, 3:41am

Hi,
In gstreamer, you can access CUDA via nvivafilter. Two posts for your reference:
https://devtalk.nvidia.com/default/topic/963123/jetson-tx1/video-mapping-on-jetson-tx1/post/4979740/#4979740
https://devtalk.nvidia.com/default/topic/978438/jetson-tx1/optimizing-access-to-image-data-acquired-with-nvcamerasrc/post/5026998/#5026998

Please also try tegra_multimedia_api. You may refer to the sample below:

tegra_multimedia_api\samples\03_video_cuda_enc

elektito · January 23, 2019, 8:28am

Let me explain my exact situation, so you can see why none of this has solved my problem yet. I have a USB 3.0 camera that outputs 4K data at around 30 fps. We already have a working GStreamer pipeline with a CSI-2 camera, which we intend to replace with the USB camera.

The USB camera outputs Bayer data, so we need to debayer first. nvivafilter cannot do this, since it does not accept Bayer formats. I first tried debayering with CUDA (with help from the NVIDIA NPP library). That works, but not at the frame rate we need. Profiling showed that copying the Bayer data to CUDA memory takes about 12 ms per frame, which is acceptable, but copying the RGB data back to normal memory takes around 40 ms. At 30 fps the budget is about 33 ms per frame, so these two copies alone (about 52 ms) make 30 fps impossible.
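To put these numbers in perspective, here is a small C sketch of the per-frame arithmetic. It assumes 8-bit Bayer samples (1 byte per pixel; the camera’s actual bit depth isn’t stated in the thread) and packed 24-bit RGB after debayering. The function names are hypothetical helpers for illustration, not part of any NVIDIA API.

```c
#include <stddef.h>

/* Hypothetical helper: bytes in one frame of packed pixels. */
size_t frame_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Hypothetical helper: sustained data rate in bytes per second. */
double stream_bytes_per_sec(size_t frame_size, double fps)
{
    return (double)frame_size * fps;
}

/* Time available per frame, in milliseconds, at a given frame rate. */
double frame_budget_ms(double fps)
{
    return 1000.0 / fps;
}

/* Returns 1 if the upload and download copies alone fit in one frame
 * period, 0 otherwise. Debayer kernel and encoder time are ignored,
 * so a 1 here is necessary but not sufficient for real-time output. */
int copies_fit_budget(double upload_ms, double download_ms, double fps)
{
    return upload_ms + download_ms <= frame_budget_ms(fps);
}
```

Under these assumptions a 3840x2160 Bayer frame is 8,294,400 bytes (about 249 MB/s at 30 fps), while the RGB frame is three times larger at 24,883,200 bytes, which is roughly in line with the download taking a little over three times as long as the upload.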

So the only logical solution left is to eliminate some of those memcpy calls. If we could somehow pass the CUDA buffers directly to the hardware encoder in the pipeline, this could work. Otherwise, the copies make the whole thing infeasible.

I am looking at the sample code above, though I still haven’t found anything GStreamer-related. I am also reading the source code of the gst-omx and gst-jpeg plugins but haven’t found anything yet.

DaneLLL · January 23, 2019, 10:37am

Hi,
The GStreamer implementation may not be able to handle this case. Please try tegra_multimedia_api.

You can create an NvBuffer in RGBA, put the debayered data into it via CUDA, convert it to YUV420 via the NvBuffer APIs, and send it to NvVideoEncoder to get an H.264 stream.

HW engines do not support 24-bit RGB, so you have to use 32-bit RGBA or BGRx.
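Because the hardware path needs 32-bit pixels, a debayer step that produces packed 24-bit RGB has to write a fourth byte per pixel. The C function below is a CPU-side sketch of that expansion, assuming an R, G, B, A byte order and an opaque alpha of 0xFF (real NvBuffer color formats define their own byte order). In the flow described above, the CUDA kernel would write RGBA into the NvBuffer directly instead of running a separate pass like this. The destination row pitch is a parameter because hardware buffers are often padded so that each row is wider than width * 4 bytes; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: expand packed 24-bit RGB (3 bytes per pixel,
 * rows tightly packed) into 32-bit RGBA with an opaque alpha byte.
 * dst_pitch is the byte stride between destination rows and must be
 * at least width * 4; padding bytes at the end of each row are left
 * untouched. */
void rgb24_to_rgba32(const uint8_t *src, size_t width, size_t height,
                     uint8_t *dst, size_t dst_pitch)
{
    for (size_t y = 0; y < height; ++y) {
        const uint8_t *s = src + y * width * 3;
        uint8_t *d = dst + y * dst_pitch;
        for (size_t x = 0; x < width; ++x) {
            d[4 * x + 0] = s[3 * x + 0]; /* R */
            d[4 * x + 1] = s[3 * x + 1]; /* G */
            d[4 * x + 2] = s[3 * x + 2]; /* B */
            d[4 * x + 3] = 0xFF;         /* A: fully opaque */
        }
    }
}
```

Writing one extra byte per pixel costs about a third more memory traffic than 24-bit RGB, which is the price of a format the hardware engines accept without another conversion.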

You may refer to the samples below and adapt them to your case:

tegra_multimedia_api\samples\12_camera_v4l2_cuda

https://devtalk.nvidia.com/default/topic/1031967/jetson-tx2/tegra_multimedia_api-dq-buffer-from-encoder-output_plane-can-not-completed/post/5251268/#5251268

elektito · January 23, 2019, 4:22pm

This sounds promising, though it inspires more questions!

The addLabels function, called by HandleEGLImage, is where the actual CUDA processing happens, and that is what I need to replace with my own functionality. Am I right?
If we can indeed create NVMM buffers, isn’t there a way to push them out of a GStreamer element? It is obviously possible, since NVIDIA’s plugins do it, but can’t we do it too? Perhaps by memory-mapping the NvBuffer, wrapping it in a GstBuffer and pushing it out?

DaneLLL · January 24, 2019, 3:32am

Yes, the code is at tegra_multimedia_api\samples\common\algorithm\cuda

It is not supported. The solution is to send the NvBuffers to NvVideoEncoder and get an H.264 stream. The H.264 stream can be wrapped in a GstBuffer and sent to a GStreamer element.

elektito · January 24, 2019, 4:25pm

Looks like a reasonable alternative. I’m trying it now, although I can’t run that sample program directly (I have no V4L2-compatible camera), so I’m building a derivative program that I can run. The cuGraphicsEGLRegisterImage function (called by Handle_EGLImage) is returning CUDA_ERROR_INVALID_VALUE, which is not even documented as a possible error. Do you have any idea what that could be about? I’ve checked the arguments in gdb and they seem to have reasonable values.

In any case, I’ll report back here if this works as I think it should.

DaneLLL · January 25, 2019, 1:51am

Hi,
Please connect a USB camera and run 12_camera_v4l2_cuda first, to make sure it runs fine before you adapt it.

The call flow is to create the NvBuffer first, and then pass its fd to NvEGLImageFromFd().

elektito · January 25, 2019, 2:03pm

I managed to find a webcam and try the original sample. I believe I found the source of the error: I was using NvBufferColorFormat_XRGB32 as the color format of the output NvBuffer, which for some reason doesn’t work. Changing the color format to NvBufferColorFormat_ARGB32 fixed that particular issue. This could be better documented, however, since the docs for cuGraphicsEGLRegisterImage do not even mention that this error code is possible.

Anyway, I haven’t finished my tests, but I believe this can be done as you’ve described. I’m not sure which of your answers I should “accept”, since together they answered my question as a whole. The idea that I can wrap the output of the encoder in a GstBuffer and push it out of my GStreamer element was particularly helpful. Thanks for all the help.

videlo · March 15, 2019, 3:12pm

Hi all,

I am also interested in understanding NVMM, as @elektito asked at the beginning of this thread (the discussion then drifted to another subject without really addressing this one).

I am capturing from a CSI camera with GStreamer and retrieving the frames into an OpenCV app using appsink. Below is a very basic pipeline that serves as a prototype for my tests. It is inspired by information found here and there on this forum.

gst-launch-1.0 nvcamerasrc sensor-id=0 ! 'video/x-raw(memory:NVMM), width=(int)3840, height=(int)2160, format=(string)I420, framerate=(fraction)15/1' ! nvvidconv ! 'video/x-raw, format=(string)BGRx' ! videoconvert ! 'video/x-raw, format=(string)BGR' ! appsink
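For reference, the videoconvert stage at the end of this pipeline turns 4-byte BGRx pixels into the 3-byte BGR that OpenCV expects. The C function below is a minimal sketch of that repacking, not videoconvert’s actual implementation; it assumes a source row pitch that may be larger than width * 4 and writes a tightly packed BGR destination. It shows that this step is one more full CPU pass over every frame, in addition to nvvidconv’s conversion from NVMM into CPU memory. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: repack BGRx (4 bytes per pixel, padding byte
 * ignored) into packed BGR. src_pitch is the byte stride between
 * source rows; dst rows are width * 3 bytes with no padding. */
void bgrx_to_bgr(const uint8_t *src, size_t src_pitch,
                 size_t width, size_t height, uint8_t *dst)
{
    for (size_t y = 0; y < height; ++y) {
        const uint8_t *s = src + y * src_pitch;
        uint8_t *d = dst + y * width * 3;
        for (size_t x = 0; x < width; ++x) {
            d[3 * x + 0] = s[4 * x + 0]; /* B */
            d[3 * x + 1] = s[4 * x + 1]; /* G */
            d[3 * x + 2] = s[4 * x + 2]; /* R */
            /* s[4 * x + 3] is the unused x byte and is dropped */
        }
    }
}
```

At 3840x2160 this reads about 33 MB and writes about 25 MB per frame on the CPU, which is worth keeping in mind when budgeting memory bandwidth for later CUDA stages.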

I would like to revive the questions @elektito asked, i.e.:

when I transfer data from NVMM to CPU memory (i.e. at nvvidconv), is an actual copy performed?
when reading from or writing to NVMM, which bus is involved?
is it hardware memory, or normal memory made available to hardware components?

My objective is to avoid as many memory copies as possible and to anticipate bus/ISP/VI access conflicts between this kind of pipeline and future downstream processing such as CUDA.

Thanks in advance for any info/pointers to info you might share!

tejaswinig · March 16, 2019, 11:05am

VIDIOC_S_FMT: failed: Device or resource busy /dev/video0 --stream-to=ov5693.raw
VIDIOC_REQBUFS: failed: Device or resource busy

How can I solve this?

DaneLLL · March 18, 2019, 2:31am

Hi tejaswinig, please make a new post with more information, such as your sensor type or brand. Is it a USB camera, a YUV sensor, or a Bayer sensor? From e-con or Leopard?

Hi videlo,
We have several posts about OpenCV + gstreamer/tegra_multimedia_api. Please check:
https://devtalk.nvidia.com/default/topic/1024245/jetson-tx2/opencv-3-3-and-integrated-camera-problems-/post/5210735/#5210735
https://devtalk.nvidia.com/default/topic/1047563/jetson-tx2/libargus-eglstream-to-nvivafilter/post/5319890/#5319890
https://devtalk.nvidia.com/default/topic/1037863/jetson-tx2/argus-and-opencv/post/5273400/#5273400

If your video processing can be done with cv::gpuMat, it can be done with zero memcpy.

elektito · March 18, 2019, 10:03am

Hi videlo,

Let me try to answer your questions with what I have found out so far.

As far as I have seen, transferring buffers from NVMM memory to normal memory always involves a copy, and so does transferring them to CUDA memory. I infer this from the performance I have observed. Copying to and from CUDA memory also involves a copy, so I haven’t seen anything that looks like zero-copy between any pair of normal memory, NVMM and CUDA memory, at least using the GStreamer API.

NVMM memory is, as DaneLLL answered before, a set of DMA buffers. As far as I can tell, it’s just normal memory mapped to be usable by the hardware encoders, decoders and converters. This should mean that copies go through the memory bus with no extra overhead involved. The odd thing is that the same is true for CUDA memory (the TX2 has no dedicated GPU memory), yet copying to and from CUDA memory is more costly than copying normal memory. I haven’t found out why.

All of the above is from experience, so I might be wrong on some points. Take it with a pinch of salt!

In your case, I would suggest taking a look at Tegra Multimedia API and Argus. It would allow you to map NVMM memory and directly access it. I haven’t used it to receive data from camera (my use case involved a non-NVMM-enabled camera and the hardware encoder) but as far as I have seen it is possible. Take a look at the 09_camera_jpeg_capture example in Tegra Multimedia API. It shows how to acquire frames from the camera. After that, you can map the memory with NvBufferMemMap and just use it for whatever purpose you have in mind.

videlo · March 18, 2019, 10:43am

Hi,

DaneLLL,
Thank you for these pointers. I will check them out. I was focusing on a GStreamer acquisition solution, which is why I had not paid much attention to the Tegra Multimedia API until now.

elektito,
Thank you very much for these insights!

Isn’t it curious that we must perform a copy if the memory is just mapped? Shouldn’t the memory be shared and accessible to everyone? (Or is that what you mean by “mapping NVMM memory and directly accessing it”? I have not checked yet.)

Specifically, I would like to retrieve data from appsink directly in NVMM memory. As far as I have tested, this does not work. I chose a GStreamer pipeline because of the potential savings in coding time from not using the API. Are you saying that the API is more flexible and complete than the proprietary NVIDIA GStreamer elements (such as nvcamerasrc, nvvidconv…) alone?

elektito · March 18, 2019, 11:13am

I believe the copy, in the case of NVMM, is a byproduct of how nvvidconv works: it is expected to make a copy from NVMM to normal memory. I assume the NVMM buffers cannot be kept indefinitely, as there is a limited number of them. Why copies are also needed with CUDA memory I cannot explain, but that’s not what you were asking about.
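The explanation above rests on the assumption that NVMM buffers come from a small, fixed pool. Under that assumption, the generic C sketch below shows why a consumer cannot simply hold on to them: once every buffer is checked out, the producer has nothing to write the next frame into until one is released, so copying the frame out and releasing the buffer right away avoids the stall. POOL_SIZE, buffer_pool and both functions are hypothetical and do not describe how nvvidconv or the other NVIDIA plugins actually manage their buffers.

```c
#include <stdbool.h>

/* Hypothetical pool size; real elements choose their own buffer counts. */
#define POOL_SIZE 4

typedef struct {
    bool in_use[POOL_SIZE];
} buffer_pool;

/* Hand out the first free buffer index, or -1 when every buffer is
 * still held downstream (a real producer would block or drop a frame). */
int pool_acquire(buffer_pool *pool)
{
    for (int i = 0; i < POOL_SIZE; ++i) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Return a buffer to the pool so the producer can reuse it. */
void pool_release(buffer_pool *pool, int index)
{
    if (index >= 0 && index < POOL_SIZE) {
        pool->in_use[index] = false;
    }
}
```

With four buffers, a fifth pool_acquire() fails until pool_release() is called on one of the outstanding buffers.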

As for flexibility, the Multimedia APIs definitely give you more flexibility than the GStreamer APIs. They are not as convenient, but you might not have a choice depending on your use case. One bonus I got from using the MM APIs was much more insight into how things work under the hood of the GStreamer elements I had been using.
