GStreamer AppSrc with Frames in GPU memory space (Caps: video/x-raw(memory:NVMM))

I have frames in GPU memory and I am trying to push them into GStreamer for nvvidconv and omxh264enc without copying them to CPU space first. To simplify the discussion, I have created a simple program where the appSrc creates grayscale frames and feeds them to nvvidconv (which converts to I420), then omxh264enc, h264parse, qtmux, and filesink. If I allocate the frames in CPU space and fill them with values (white, black, etc.), everything works correctly.

However, when I set my appSrc with the following caps: “video/x-raw(memory:NVMM), format=(string)GRAY8, width=(int)2048, height=(int)1536, framerate=(fraction)30/1”, allocate GPU memory (using cudaMalloc, nppiMalloc_8u_C1, cv::gpu::Mat, etc.), fill the memory (using nppsSet_8u, cudaMemcpy2D, etc.), and then create a GstBuffer (using gst_buffer_new_wrapped_full and the GPU memory pointer), I get a message for every frame: “NVMAP_IOC_WRITE failed: Bad address”

I have increased GST_DEBUG to include errors and warnings, but I do not see any obvious errors. Even when I raise the level to DEBUG, I cannot get any information about which element is sending the error message (likely nvvidconv).

I am basically looking to do the exact opposite of this guy:
https://devtalk.nvidia.com/default/topic/934346/?comment=4910104

I have looked at the NVXIO FrameSource and I see how they can pull NVMM frames out of Gstreamer, but the NVXIO Render doesn’t seem to have an example where frames are pushed into Gstreamer from GPU space (only CPU).

I have no doubt I am missing something obvious about GStreamer access to GPU memory (specifically in nvvidconv) or about how to create a correct GstBuffer for GPU memory, but I cannot find any information on how to work with NVMM memory. Are there any examples of an appSrc that allocates NVMM frames for nvvidconv?

Thanks,

  • Mark

The inner details of NVMM are not generally released at this time. Since the TX1 has fully shared memory between CPU and GPU, how about trying zero-copy / mapped memory? See http://arrayfire.com/zero-copy-on-tegra-k1/ for an example. Then just pass the pointer you get from this to your appsrc element as normal. With unified memory, the CPU and GPU pointers resolve to the same address, so either works on the TX1, and doing so avoids the redundant cudaMemcpy calls you are trying to eliminate.

With the method described above, a YUV I420 colorspace conversion kernel like the one in the following link may allow appsrc to be chained directly to the encoder: https://github.com/dusty-nv/jetson-inference/blob/master/cuda/cudaYUV-YV12.cu

Hello Dusty,

Thanks for the feedback; testing with zero-copy was on my “todo” list, so I will try to get that test in today. I also appreciate the YV12 CUDA source. I was planning on writing my own kernel to speed up the RGB to YV12 conversion, but it looks like you beat me to that one too.

Thanks again!

  • Mark

I have an OpenVX pipeline, and I’d like to send the output of it, with zero-copy, to gstreamer.

What is the current best approach? (FYI: the GitHub link above returns a 404.)

Ideally, it would be great if there was a sub-class of nvxio::Render that performed this.

Hi,

Please use our VisionWorks libraries to get an optimized implementation for OpenVX.
As of VisionWorks-1.6, the camera source is public, and you can adapt it for your use case.

The source code is at ‘[VisionWorks-1.6-Samples]/nvxio/src/NVX/FrameSource/GStreamer’.
Thanks.

Thanks! I’ve also been looking at the following in the VisionWorks sample code, and it seems to match my use case well:

‘[VisionWorks-1.6-Samples]/nvxio/src/NVX/Render/GStreamer’

Yesterday I found the following thread, which demonstrates how to feed frames to an appsrc node in a GStreamer pipeline. I think that with that approach, all I would need to do is get the frame out of the vx_image object using nvxuCopyImage at the end of my pipeline?

https://devtalk.nvidia.com/default/topic/1024356/jetson-tx2/solved-harware-accelerated-video-encoding-from-c-buffer-with-gstreamer-omxh264enc-and-filesink/post/5211704/#5211704

Hi,

VisionWorks uses GStreamer as its backend framework to open the camera.
As a result, you don’t need to copy the image data from VisionWorks to GStreamer.

It’s recommended to check the source shared in comment #5.
Try applying some modifications for your use case to the GStreamer pipeline inside VisionWorks.

Thanks.

Thanks, AastaLLL! But my use case is the exact opposite of those examples. I don’t need to capture FROM gstreamer, as your referenced examples do. Rather, I need to output TO gstreamer.

In other words, I need an appsrc example, not an appsink example.

I have since found that there is no way, from an OpenVX pipeline on the Jetson platform, to use the hardware H.264 or H.265 encoder without first extracting a complete frame.

Instead, it appears I will have to use the NVIDIA non-portable “VPI” a.k.a. “nvxcu::” APIs. The VPI example consistent with my needs is in the ‘[VisionWorks-1.6-Samples]/nvxio/src/NVX/Render/GStreamer’ directory.

Low latency is EXTREMELY important in our product space and another full frame buffer in the pipeline may rule out the platform for our use.

Hi,

There is no sample available for vx_image to appsrc.
It’s recommended to check our opengl_interop sample, which converts a vx_image to OpenGL, for reference:
[i]-----------------------------------

VisionWorks API
Samples and Demos
Sample Applications
OpenGL Interop Sample App
-----------------------------------[/i]

Thanks.

Hi Michael and mleonhardt,

I have the same problem as you guys; more concretely, let’s say I have the following data struct and image:

struct CudaYuv420pImg {
    Npp8u* pImg;    // points to the beginning of the image (also the Y plane)
    Npp8u* pBuf[3]; // point to the beginnings of the Y, U, V planes
    NppiSize sz;
    ...
};
CudaYuv420pImg I;

and I use

cudaMalloc((void **)&I.pImg, I.sz.width * I.sz.height * 3 / 2);

to do the memory allocation.

My question is: did you guys figure out how to do the zero-copy memory assignment for a GstBuffer pointer, such as

GstBuffer *buffer;

from image I?

An example of using such a buffer is given in need_data() in test-appsrc.c: https://github.com/GStreamer/gst-rtsp-server/blob/master/examples/test-appsrc.c

Thanks,