Creating a GStreamer source that publishes to NVMM

Hello,

We have a CUDA-based SDK that produces buffers in GPU memory (via cudaMallocPitch, for example). We would like to hook it up to GStreamer on the Xavier. To do this, I would like to wrap our SDK in a GStreamer source. We also have the possibility of passing these buffers out via an EGL Stream.

I have tried the method described here, and it works:
https://github.com/DaneLLL/gstreamer_eglstreamsrc

However, the method above requires a custom launcher program that creates the EGL Producer and EGL Stream explicitly, which prevents me from passing a pure launch string to the stock gst-launch-1.0 program. Furthermore, I’m not very comfortable with the producer code’s delays that ensure the Producer connects to the stream after the Consumer - it feels like a race condition.

(Ultimately, I would like my data to be available via RTSP, and gst-rtsp-server’s default factory requires a pure launch string, or that I create my own RtspMediaFactory subclass).

For this reason, I have decided to try to implement my own GStreamer source. I want to keep my module’s output in GPU memory for efficiency reasons, so I think I need it to support the “memory:NVMM” property (i.e. “video/x-raw(memory:NVMM)”).

I have created a GstBaseSrc subclass, but I don’t know how to implement the _create function for a module that publishes GPU buffers. I am not sure if there is some sort of pre-established standard to follow to implement the “memory:NVMM” capability or what to use to allocate GPU buffers in such a way that downstream GStreamer modules will know that the buffer is in GPU memory. Is it as simple as calling cudaMallocPitch?
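To make the question concrete, here is the rough shape of what I have so far (all my_src_* names are my own, and the allocation below is a plain system-memory placeholder; replacing it correctly is exactly what I am asking about):

```c
/* Rough shape of what I have so far.  my_src_create() is the vfunc a
 * GstPushSrc subclass must provide; the allocation below is a plain
 * system-memory placeholder -- what to put there so the buffer is
 * recognized as video/x-raw(memory:NVMM) is exactly my question. */
#include <gst/gst.h>
#include <gst/base/gstpushsrc.h>

static GstFlowReturn
my_src_create (GstPushSrc *psrc, GstBuffer **buf)
{
  gsize size = 1920 * 1080 * 4;   /* one RGBA frame, fixed caps assumed */

  /* Placeholder allocation -- NOT NVMM.  Is cudaMallocPitch() enough
   * here, or does NVMM need a dedicated allocator/wrapper? */
  GstBuffer *b = gst_buffer_new_allocate (NULL, size, NULL);
  if (b == NULL)
    return GST_FLOW_ERROR;

  /* ...produce the frame from our SDK into the buffer here... */

  *buf = b;
  return GST_FLOW_OK;
}
```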

Does anyone know of an example of publicly available source code for a GStreamer video source showing how to handle the “memory:NVMM” property correctly and call the appropriate allocation function?

It would be great if we could see how libnveglstreamsrc.so was implemented, for example, but I don’t know if it is publicly available.

Hi,
It looks like tegra_multimedia_api can be a good solution for moving to Xavier. We have samples installed in /usr/src/tegra_multimedia_api. Documents are at
https://docs.nvidia.com/jetson/archives/l4t-multimedia-archived/l4t-multimedia-3231/index.html

You can allocate an NvBuffer and use the following API to get an EGLImage:

EGLImageKHR NvEGLImageFromFd (EGLDisplay display, int dmabuf_fd);

And get a CUDA pointer by calling

status = cuGraphicsEGLRegisterImage(&pResource, image,
            CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
status = cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);

So cudaMalloc2D() should be replaced with NvBufferCreateEx(), and the above functions called to get the CUDA pointer.
The usage is demonstrated in multiple samples. Please take a look and give it a try.
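Putting the above calls together, a minimal sketch might look like this (assuming a Jetson with nvbuf_utils/EGL available and a current CUDA driver-API context; all error checking is omitted):

```c
/* Sketch only: allocate an NVMM buffer and map it to a CUDA pointer.
 * Assumes nvbuf_utils/EGL are available and a CUDA (driver API)
 * context is current; error checking is omitted for brevity. */
#include <stdint.h>
#include <cuda.h>
#include <cudaEGL.h>
#include <nvbuf_utils.h>

CUdeviceptr
nvmm_frame_as_cuda_ptr (EGLDisplay display, int width, int height,
                        int *out_dmabuf_fd)
{
  /* 1. NvBuffer instead of cudaMallocPitch() */
  int dmabuf_fd = -1;
  NvBufferCreateParams params = {0};
  params.width       = width;
  params.height      = height;
  params.layout      = NvBufferLayout_Pitch;
  params.colorFormat = NvBufferColorFormat_ABGR32;   /* RGBA */
  params.payloadType = NvBufferPayload_SurfArray;
  params.nvbuf_tag   = NvBufferTag_NONE;
  NvBufferCreateEx (&dmabuf_fd, &params);

  /* 2. dmabuf fd -> EGLImage */
  EGLImageKHR image = NvEGLImageFromFd (display, dmabuf_fd);

  /* 3. EGLImage -> CUDA pointer via the CUDA/EGL interop */
  CUgraphicsResource resource = NULL;
  CUeglFrame egl_frame;
  cuGraphicsEGLRegisterImage (&resource, image,
                              CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
  cuGraphicsResourceGetMappedEglFrame (&egl_frame, resource, 0, 0);

  *out_dmabuf_fd = dmabuf_fd;
  return (CUdeviceptr) (uintptr_t) egl_frame.frame.pPitch[0];
}
```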

Thanks Dane,

I may not have explained correctly where I am having trouble. I am fairly familiar with the tegra_multimedia_api, but I don’t think it contains an example for the specific issue I am describing.

I already have working code that converts my SDK’s CUDA output into an EGL image and we then connect it to the nveglstreamsrc gstreamer component as per your eglstreamsrc_example code (and thank you very kindly for that, by the way). I have the CUDA-to-EGLStream process figured out.

My issue is really with the GStreamer code at this point. To implement a video source element, I need to subclass GstBaseSrc or GstPushSrc. To subclass these classes, I need to implement a function that GStreamer calls to create buffers (my_class_create()), but it’s not clear to me what downstream blocks are expecting in terms of a memory format.

I found the public_sources.tbz2 file on this forum this afternoon, and it seems to contain a JPEG decoder that acts as a GStreamer source, but it goes through a series of indirections and I lost track of where the memory actually gets allocated. I’ll continue sifting through that code to see if I can figure anything out.

Ideally, it would be really nice if the source code for libnveglstreamsrc.so or something similar was available somewhere as an example. Do you know if it is publicly available? It would be a great example to follow.

Also, my apologies, I mentioned cudaMalloc2D in my initial post, but I meant cudaMallocPitch (now corrected).

Is there any reason why you suggest that I should use NvBufferCreateEx() instead of cudaMallocPitch()? cudaMallocPitch works fine for me at the moment, and I can wrap the resulting PitchedPtr in an EGLImage and submit it to the EGLStream via the CUDA interop functions.
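For reference, the working path I describe is roughly the following (a sketch only; 'conn' is assumed to be an already-connected CUeglStreamConnection, and the Consumer must have attached first):

```c
/* Sketch only: present a cudaMallocPitch() frame to an EGLStream via
 * the CUDA producer interop.  'conn' is assumed to be an already
 * connected CUeglStreamConnection (the Consumer attaches first). */
#include <cuda.h>
#include <cudaEGL.h>
#include <cuda_runtime.h>

void
present_pitched_frame (CUeglStreamConnection *conn, int width, int height)
{
  void  *dev_ptr = NULL;
  size_t pitch   = 0;
  cudaMallocPitch (&dev_ptr, &pitch, (size_t) width * 4 /* RGBA */, height);

  /* ...the SDK writes the frame into dev_ptr here... */

  CUeglFrame frame = {0};
  frame.frame.pPitch[0] = dev_ptr;
  frame.width           = width;
  frame.height          = height;
  frame.depth           = 1;
  frame.pitch           = (unsigned int) pitch;
  frame.planeCount      = 1;
  frame.numChannels     = 4;
  frame.frameType       = CU_EGL_FRAME_TYPE_PITCH;
  frame.eglColorFormat  = CU_EGL_COLOR_FORMAT_ABGR;
  frame.cuFormat        = CU_AD_FORMAT_UNSIGNED_INT8;

  cuEGLStreamProducerPresentFrame (conn, frame, NULL);
}
```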

Am I missing out on a better way of doing things?

Hi,
Please note that libnveglstreamsrc.so is not open source.
We have verified the functionality of nveglstreamsrc on r28.2.1. Please refer to following post:
https://devtalk.nvidia.com/default/topic/1044444/jetson-tx1/problem-with-nveglstreamsrc/post/5300639/#5300639

One concern with using nveglstreamsrc is the accuracy of timestamps. When the system is busy, there can be minor drift caused by the producer-consumer communication. There is a post discussing it:
https://devtalk.nvidia.com/default/topic/1042097/jetson-tx2/creating-a-working-nveglstreamsrc-gt-omxh264enc-gstreamer-pipeline/post/5287243/#5287243
If it is not a concern in your use case, it should be good to run nveglstreamsrc.

Thanks for the warnings regarding the timestamp accuracy. I’ll keep an eye on that.

What I am really looking for is an example of a GStreamer source that publishes to video/x-raw(memory:NVMM) so I could learn from it. I didn’t really expect the source code to be open, but I asked just in case because sometimes I miss things.

My problem with nveglstreamsrc is that my producer needs to run in its own thread to feed the EGL stream. To do this, I need to start this thread after nveglstreamsrc is connected to the stream (I’m sure you already know this, but I’ll mention it for the sake of future readers: the Consumer must connect first, then the Producer).

In your example, this is easy because your code controls when the pipeline is started.

Unfortunately, the GStreamer RTSP-server infrastructure defers the creation of the pipeline until a client connects to it. There’s no place where I can intercept the “start stream” event to start the producer as well.

The workaround I am trying right now is to subclass the rtsp_media_factory class. It’s a class that constructs the GStreamer pipeline from a launch string on connection. I extend that behaviour to patch the ‘display’ and ‘eglStream’ properties of ‘egl_src’ as in your example, and then launch the producer thread. The producer thread then queries the stream until it detects that the client has connected.
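For future readers, I believe the same patching can also be done without a full subclass by hooking the factory’s “media-configure” signal. Here is a sketch (the my_* names and start_producer_thread() are placeholders for my own code, and the property names should be verified with gst-inspect-1.0):

```c
/* Sketch only: patch egl_src and start the producer when the RTSP
 * pipeline is constructed for a client.  my_egl_display, my_egl_stream
 * and start_producer_thread() stand in for the application's own code. */
#include <gst/gst.h>
#include <gst/rtsp-server/rtsp-server.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>

extern EGLDisplay   my_egl_display;
extern EGLStreamKHR my_egl_stream;
extern void start_producer_thread (void);

static void
on_media_configure (GstRTSPMediaFactory *factory, GstRTSPMedia *media,
                    gpointer user_data)
{
  GstElement *pipeline = gst_rtsp_media_get_element (media);
  GstElement *egl_src  = gst_bin_get_by_name (GST_BIN (pipeline), "egl_src");

  /* Property names as in the eglstreamsrc example; verify with
   * gst-inspect-1.0 nveglstreamsrc. */
  g_object_set (egl_src, "display", my_egl_display, NULL);
  g_object_set (egl_src, "eglstream", my_egl_stream, NULL);

  /* The producer thread should still poll the stream state and wait
   * until the Consumer (egl_src) has actually connected. */
  start_producer_thread ();

  gst_object_unref (egl_src);
  gst_object_unref (pipeline);
}

/* At setup time:
 *   g_signal_connect (factory, "media-configure",
 *                     G_CALLBACK (on_media_configure), NULL);
 */
```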

Still, this is all a hack. It would be much cleaner to write my own GStreamer source that knows how to speak to other (memory:NVMM)-aware elements. Hence my question about sample code of any kind.

Hi,
Attach a sample that demonstrates

appsrc ! video/x-raw(memory:NVMM),format=RGBA,width=1920,height=1080 ! nvvidconv ! nvoverlaysink

Ensure you have installed tegra_multimedia_api (nvbuf_utils) and CUDA through SDKManager, and run the build command:

$ CUDA_VER=10.0 make

If you integrate it with test-appsrc, you should get a complete RTSP server.

This is another solution, but I am not sure if it is better than your current implementation. FYR.
appsrc_nvmm.zip (3.07 KB)

Thanks Dane.

This looks like an excellent approach for what we need – I had missed appsrc while looking through the documentation. I’ll look into it soon, I just need to put out a fire here first.

Thanks!

Ok, I’m looking at it now and I have a few questions.

If I understand correctly, feed_function() is the function that represents what needs to happen every time our SDK produces a new image.

If that is the case, it would appear that a new NvBufferCreate call is made every time our SDK produces a new image.

How are memory leaks avoided in this situation, i.e. do these buffers get released?
Also, I feel that there is an efficiency concern: is it not costly to allocate a new buffer each time we produce a new image?

Hi,
You can modify the sample to have a fixed number of buffers and re-use them. Mark a buffer as in-use when calling feed_function(), and clear the in-use flag when notify_to_destroy() is called.
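A minimal sketch of such a pool (assuming the sample wraps each pre-allocated NvBuffer fd into a GstBuffer and notify_to_destroy() fires when GStreamer releases it):

```c
/* Sketch only: a small fixed pool of NvBuffer dmabuf fds, allocated
 * once with NvBufferCreateEx() at startup and recycled instead of
 * calling NvBufferCreate per frame. */
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4

typedef struct {
  int  dmabuf_fd;   /* from NvBufferCreateEx(), allocated once */
  bool in_use;
} PoolEntry;

static PoolEntry pool[POOL_SIZE];

/* Called from feed_function(): grab a free buffer and mark it busy. */
static PoolEntry *
pool_acquire (void)
{
  for (int i = 0; i < POOL_SIZE; i++) {
    if (!pool[i].in_use) {
      pool[i].in_use = true;
      return &pool[i];
    }
  }
  return NULL;   /* all buffers in flight; drop or wait */
}

/* Called from notify_to_destroy() when GStreamer releases the
 * GstBuffer: just clear the flag, do NOT free the NvBuffer. */
static void
pool_release (PoolEntry *e)
{
  e->in_use = false;
}
```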

Ah I see. Yes this makes more sense. I still have a question regarding this:

Am I correct in understanding that in the RTSP server situation, I would have to defer the feed_function loop until a client connects? Right now I’m doing this by subclassing the RTSP media factory and modifying it slightly to launch the producer thread (similar to your feed-function loop) when the pipeline is created. It’s pretty hacky though: I don’t have a way to be notified when the client disconnects or when the pipeline is destroyed.

Hi,
In running test-appsrc, the server is idle while there are no clients connected. It seems you need the server to be always on. Our implementation enables hardware acceleration in running GStreamer pipelines, but some of the needed functions are in the GStreamer framework itself and we lack experience with them. I suggest you go to
http://gstreamer-devel.966125.n4.nabble.com/

You may simulate your use case with

videotestsrc ! x264enc

and see if users on the gstreamer forum can give you a proper suggestion. Once you have a working pipeline fitting the use case, it should work just fine after replacing it with

appsrc ! video/x-raw(memory:NVMM),format=RGBA,width=1920,height=1080 ! nvvidconv ! nvv4l2h264enc