Creating a GStreamer source that publishes to NVMM

Hello,

We have a CUDA-based SDK that produces buffers in GPU memory (via cudaMallocPitch, for example). We would like to hook it up to GStreamer on the Xavier. To do this, I would like to wrap our SDK in a GStreamer source. We also have the possibility of passing these buffers out via an EGL Stream.

I have tried the method described here, and it works:
https://github.com/DaneLLL/gstreamer_eglstreamsrc

However, the method above requires a custom launcher program that creates the EGL Producer and EGL Stream explicitly, which prevents me from passing a pure launch string to the stock gst-launch-1.0 program. Furthermore, I’m not very comfortable with the producer code’s delays that ensure the Producer connects to the stream after the Consumer - it feels like a race condition.

(Ultimately, I would like my data to be available via RTSP, and gst-rtsp-server’s default factory requires a pure launch string, or that I create my own RtspMediaFactory subclass).

For this reason, I have decided to try to implement my own GStreamer source. I want to keep my module’s output in GPU memory for efficiency reasons, so I think I need it to support the “memory:NVMM” property (i.e. “video/x-raw(memory:NVMM)”).

I have created a GstBaseSrc subclass, but I don’t know how to implement the _create function for a module that publishes GPU buffers. I am not sure if there is some sort of pre-established standard to follow to implement the “memory:NVMM” capability or what to use to allocate GPU buffers in such a way that downstream GStreamer modules will know that the buffer is in GPU memory. Is it as simple as calling cudaMallocPitch?
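To make the question concrete, here is the rough shape of what I have so far (all my_src_* names are my own, and the allocation below is a plain system-memory placeholder; replacing it correctly is exactly what I am asking about):

```c
/* Rough shape of what I have so far.  my_src_create() is the vfunc a
 * GstPushSrc subclass must provide; the allocation below is a plain
 * system-memory placeholder -- what to put there so the buffer is
 * recognized as video/x-raw(memory:NVMM) is exactly my question. */
#include <gst/gst.h>
#include <gst/base/gstpushsrc.h>

static GstFlowReturn
my_src_create (GstPushSrc *psrc, GstBuffer **buf)
{
  gsize size = 1920 * 1080 * 4;   /* one RGBA frame, fixed caps assumed */

  /* Placeholder allocation -- NOT NVMM.  Is cudaMallocPitch() enough
   * here, or does NVMM need a dedicated allocator/wrapper? */
  GstBuffer *b = gst_buffer_new_allocate (NULL, size, NULL);
  if (b == NULL)
    return GST_FLOW_ERROR;

  /* ...produce the frame from our SDK into the buffer here... */

  *buf = b;
  return GST_FLOW_OK;
}
```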

Does anyone know of an example of publicly available source code for a GStreamer video source showing how to handle the “memory:NVMM” property correctly and call the appropriate allocation function?

It would be great if we could see how libnveglstreamsrc.so was implemented, for example, but I don’t know if it is publicly available.

Hi,
It looks like tegra_multimedia_api can be a good solution for moving to Xavier. We have samples installed in /usr/src/tegra_multimedia_api. Documents are at
https://docs.nvidia.com/jetson/archives/l4t-multimedia-archived/l4t-multimedia-3231/index.html

You can allocate an NvBuffer and use the following API to get an EGLImage:

EGLImageKHR NvEGLImageFromFd (EGLDisplay display, int dmabuf_fd);

And get a CUDA pointer by calling

status = cuGraphicsEGLRegisterImage(&pResource, image,
            CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
status = cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);

So cudaMalloc2D() should be replaced with NvBufferCreateEx(), and the above functions called to get the CUDA pointer.
The usage is demonstrated in multiple samples. Please take a look and give it a try.
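Putting the above calls together, a minimal sketch might look like this (assuming a Jetson with nvbuf_utils/EGL available and a current CUDA driver-API context; all error checking is omitted):

```c
/* Sketch only: allocate an NVMM buffer and map it to a CUDA pointer.
 * Assumes nvbuf_utils/EGL are available and a CUDA (driver API)
 * context is current; error checking is omitted for brevity. */
#include <stdint.h>
#include <cuda.h>
#include <cudaEGL.h>
#include <nvbuf_utils.h>

CUdeviceptr
nvmm_frame_as_cuda_ptr (EGLDisplay display, int width, int height,
                        int *out_dmabuf_fd)
{
  /* 1. NvBuffer instead of cudaMallocPitch() */
  int dmabuf_fd = -1;
  NvBufferCreateParams params = {0};
  params.width       = width;
  params.height      = height;
  params.layout      = NvBufferLayout_Pitch;
  params.colorFormat = NvBufferColorFormat_ABGR32;   /* RGBA */
  params.payloadType = NvBufferPayload_SurfArray;
  params.nvbuf_tag   = NvBufferTag_NONE;
  NvBufferCreateEx (&dmabuf_fd, &params);

  /* 2. dmabuf fd -> EGLImage */
  EGLImageKHR image = NvEGLImageFromFd (display, dmabuf_fd);

  /* 3. EGLImage -> CUDA pointer via the CUDA/EGL interop */
  CUgraphicsResource resource = NULL;
  CUeglFrame egl_frame;
  cuGraphicsEGLRegisterImage (&resource, image,
                              CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
  cuGraphicsResourceGetMappedEglFrame (&egl_frame, resource, 0, 0);

  *out_dmabuf_fd = dmabuf_fd;
  return (CUdeviceptr) (uintptr_t) egl_frame.frame.pPitch[0];
}
```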

Thanks Dane,

I may not have explained correctly where I am having trouble. I am fairly familiar with the tegra_multimedia_api, but I don’t think it contains an example for the specific issue I am describing.

I already have working code that converts my SDK’s CUDA output into an EGL image and we then connect it to the nveglstreamsrc gstreamer component as per your eglstreamsrc_example code (and thank you very kindly for that, by the way). I have the CUDA-to-EGLStream process figured out.

My issue is really with the GStreamer code at this point. To implement a video source element, I need to subclass GstBaseSrc or GstPushSrc. To subclass these classes, I need to implement a function that GStreamer calls to create buffers (my_class_create()), but it’s not clear to me what downstream blocks are expecting in terms of a memory format.

I found the public_sources.tbz2 file on this forum this afternoon, and it seems to contain a JPEG decoder that acts as a GStreamer source, but it goes through a series of indirections and I lost track of where the memory actually gets allocated. I’ll continue sifting through that code to see if I can figure anything out.

Ideally, it would be really nice if the source code for libnveglstreamsrc.so or something similar was available somewhere as an example. Do you know if it is publicly available? It would be a great example to follow.

Also, my apologies, I mentioned cudaMalloc2D in my initial post, but I meant cudaMallocPitch (now corrected).

Is there any reason why you suggest that I should use NvBufferCreateEx() instead of cudaMallocPitch()? cudaMallocPitch works fine for me at the moment, and I can wrap the resulting PitchedPtr in an EGLImage and submit it to the EGLStream via the CUDA interop functions.
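For reference, the working path I describe is roughly the following (a sketch only; 'conn' is assumed to be an already-connected CUeglStreamConnection, and the Consumer must have attached first):

```c
/* Sketch only: present a cudaMallocPitch() frame to an EGLStream via
 * the CUDA producer interop.  'conn' is assumed to be an already
 * connected CUeglStreamConnection (the Consumer attaches first). */
#include <cuda.h>
#include <cudaEGL.h>
#include <cuda_runtime.h>

void
present_pitched_frame (CUeglStreamConnection *conn, int width, int height)
{
  void  *dev_ptr = NULL;
  size_t pitch   = 0;
  cudaMallocPitch (&dev_ptr, &pitch, (size_t) width * 4 /* RGBA */, height);

  /* ...the SDK writes the frame into dev_ptr here... */

  CUeglFrame frame = {0};
  frame.frame.pPitch[0] = dev_ptr;
  frame.width           = width;
  frame.height          = height;
  frame.depth           = 1;
  frame.pitch           = (unsigned int) pitch;
  frame.planeCount      = 1;
  frame.numChannels     = 4;
  frame.frameType       = CU_EGL_FRAME_TYPE_PITCH;
  frame.eglColorFormat  = CU_EGL_COLOR_FORMAT_ABGR;
  frame.cuFormat        = CU_AD_FORMAT_UNSIGNED_INT8;

  cuEGLStreamProducerPresentFrame (conn, frame, NULL);
}
```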

Am I missing out on a better way of doing things?

Hi,
Please note that libnveglstreamsrc.so is not open source.
We have verified the functionality of nveglstreamsrc on r28.2.1. Please refer to following post:
https://devtalk.nvidia.com/default/topic/1044444/jetson-tx1/problem-with-nveglstreamsrc/post/5300639/#5300639

One concern with using nveglstreamsrc is the accuracy of timestamps. When the system is busy, there can be minor drift caused by the producer-consumer communication. There is a post discussing it:
https://devtalk.nvidia.com/default/topic/1042097/jetson-tx2/creating-a-working-nveglstreamsrc-gt-omxh264enc-gstreamer-pipeline/post/5287243/#5287243
If it is not a concern in your use case, it should be good to run nveglstreamsrc.

Thanks for the warnings regarding the timestamp accuracy. I’ll keep an eye on that.

What I am really looking for is an example of a GStreamer source that publishes to video/x-raw(memory:NVMM) so I could learn from it. I didn’t really expect the source code to be open, but I asked just in case because sometimes I miss things.

My problem with nveglstreamsrc is that my producer needs to run in its own thread to feed the EGL stream. To do this, I need to start this thread after nveglstreamsrc is connected to the stream (I’m sure you already know this, but I’ll mention it for the sake of future readers: the Consumer must connect first, then the Producer).

In your example, this is easy because your code controls when the pipeline is started.

Unfortunately, the GStreamer RTSP-server infrastructure defers the creation of the pipeline until a client connects to it. There’s no place where I can intercept the “start stream” event to start the producer as well.

The workaround I am trying right now is to subclass the rtsp_media_factory class. It’s a class that constructs the GStreamer pipeline from a launch string on connection. I extend that behaviour to patch the ‘display’ and ‘eglStream’ properties of ‘egl_src’ as in your example, and then launch the producer thread. The producer thread then queries the stream until it detects that the client has connected.
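For future readers, I believe the same patching can also be done without a full subclass by hooking the factory’s “media-configure” signal. Here is a sketch (the my_* names and start_producer_thread() are placeholders for my own code, and the property names should be verified with gst-inspect-1.0):

```c
/* Sketch only: patch egl_src and start the producer when the RTSP
 * pipeline is constructed for a client.  my_egl_display, my_egl_stream
 * and start_producer_thread() stand in for the application's own code. */
#include <gst/gst.h>
#include <gst/rtsp-server/rtsp-server.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

extern EGLDisplay   my_egl_display;
extern EGLStreamKHR my_egl_stream;
extern void start_producer_thread (void);

static void
on_media_configure (GstRTSPMediaFactory *factory, GstRTSPMedia *media,
                    gpointer user_data)
{
  GstElement *pipeline = gst_rtsp_media_get_element (media);
  GstElement *egl_src  = gst_bin_get_by_name (GST_BIN (pipeline), "egl_src");

  /* Property names as in the eglstreamsrc example; verify with
   * gst-inspect-1.0 nveglstreamsrc. */
  g_object_set (egl_src, "display", my_egl_display, NULL);
  g_object_set (egl_src, "eglstream", my_egl_stream, NULL);

  /* The producer thread should still poll the stream state and wait
   * until the Consumer (egl_src) has actually connected. */
  start_producer_thread ();

  gst_object_unref (egl_src);
  gst_object_unref (pipeline);
}

/* At setup time:
 *   g_signal_connect (factory, "media-configure",
 *                     G_CALLBACK (on_media_configure), NULL);
 */
```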

Still, this is all a hack. It would be much cleaner to write my own GStreamer source that knows how to speak to other (memory:NVMM)-aware elements. Hence my question about sample code of any kind.

Hi,
Attach a sample that demonstrates

appsrc ! video/x-raw(memory:NVMM),format=RGBA,width=1920,height=1080 ! nvvidconv ! nvoverlaysink

Ensure you have installed tegra_multimedia_api (nvbuf_utils) and CUDA through SDKManager, and run the build command:

$ CUDA_VER=10.0 make

If you integrate it with test-appsrc, you should get a complete RTSP server.

This is another solution, but I am not sure if it is better than your current implementation. FYR.
appsrc_nvmm.zip (3.07 KB)

Thanks Dane.

This looks like an excellent approach for what we need – I had missed appsrc while looking through the documentation. I’ll look into it soon, I just need to put out a fire here first.

Thanks!

Ok, I’m looking at it now and I have a few questions.

If I understand correctly, feed_function() is the function that represents what needs to happen every time our SDK produces a new image.

If that is the case, it would appear that a new NvBufferCreate call is made every time our SDK produces a new image.

How are memory leaks avoided in this situation, i.e. do these buffers get released?
Also, I feel that there is an efficiency concern: is it not costly to allocate a new buffer each time we produce a new image?

Hi,
You can modify the sample to have a fixed number of buffers and re-use them. Mark a buffer as in-use when calling feed_function(), and clear the in-use flag when notify_to_destroy() is called.
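A minimal sketch of such a pool (assuming the sample wraps each pre-allocated NvBuffer fd into a GstBuffer and notify_to_destroy() fires when GStreamer releases it):

```c
/* Sketch only: a small fixed pool of NvBuffer dmabuf fds, allocated
 * once with NvBufferCreateEx() at startup and recycled instead of
 * calling NvBufferCreate per frame. */
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4

typedef struct {
  int  dmabuf_fd;   /* from NvBufferCreateEx(), allocated once */
  bool in_use;
} PoolEntry;

static PoolEntry pool[POOL_SIZE];

/* Called from feed_function(): grab a free buffer and mark it busy. */
static PoolEntry *
pool_acquire (void)
{
  for (int i = 0; i < POOL_SIZE; i++) {
    if (!pool[i].in_use) {
      pool[i].in_use = true;
      return &pool[i];
    }
  }
  return NULL;   /* all buffers in flight; drop or wait */
}

/* Called from notify_to_destroy() when GStreamer releases the
 * GstBuffer: just clear the flag, do NOT free the NvBuffer. */
static void
pool_release (PoolEntry *e)
{
  e->in_use = false;
}
```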

Ah I see. Yes this makes more sense. I still have a question regarding this:

Am I correct in understanding that in the RTSP server situation, I would have to defer the feed_function loop until a client connects? Right now I’m doing this by subclassing the RTSP media factory and modifying it slightly to launch the producer thread (similar to your feed-function loop) when the pipeline is created. It’s pretty hacky though: I don’t have a way to be notified when the client disconnects or when the pipeline is destroyed.

Hi,
In running test-appsrc, the server is idle while there are no clients connected. It seems you need the server to be always on. Our implementation enables hardware acceleration in running GStreamer pipelines, but some of the needed functions are in the GStreamer framework itself and we lack experience with them. I suggest you go to
http://gstreamer-devel.966125.n4.nabble.com/

You may simulate your use case with

videotestsrc ! x264enc

and see if users on the gstreamer forum can give you a proper suggestion. Once you have a working pipeline fitting the use case, it should work just fine after replacing it with

appsrc ! video/x-raw(memory:NVMM),format=RGBA,width=1920,height=1080 ! nvvidconv ! nvv4l2h264enc