GStreamer AppSrc with Frames in GPU memory space (Caps: video/x-raw(memory:NVMM))

I have frames in GPU memory and I am trying to push them into GStreamer for nvvidconv and omxh264enc without copying them to CPU space first. To simplify the discussion, I have created a simple program where the appSrc creates grayscale frames and feeds them to nvvidconv (which converts to I420), then omxh264enc, h264parse, qtmux, and filesink. If I allocate the frames in CPU space and fill them with values (white, black, etc.), everything works correctly.

However, when I set my appSrc with the following caps: “video/x-raw(memory:NVMM), format=(string)GRAY8, width=(int)2048, height=(int)1536, framerate=(fraction)30/1”, allocate GPU memory (using cudaMalloc, nppiMalloc_8u_C1, cv::gpu::Mat, etc.), fill the memory (using nppsSet_8u, cudaMemcpy2D, etc.), and then create a GstBuffer (using gst_buffer_new_wrapped_full and the GPU memory pointer), I get a message for every frame: “NVMAP_IOC_WRITE failed: Bad address”

I have increased GST_DEBUG to include errors and warnings, but I do not see any obvious errors. Even when I raise the level to DEBUG, I cannot get any information about which element is sending the error message (likely nvvidconv).

I am basically looking to do the exact opposite of this guy:
https://devtalk.nvidia.com/default/topic/934346/?comment=4910104

I have looked at the NVXIO FrameSource and I see how they can pull NVMM frames out of Gstreamer, but the NVXIO Render doesn’t seem to have an example where frames are pushed into Gstreamer from GPU space (only CPU).

I have no doubt I am missing something obvious about GStreamer access to GPU memory (specifically in nvvidconv) or about how to create a correct GstBuffer for GPU memory, but I cannot find any information on how to work with NVMM memory. Are there any examples of an appSrc that allocates NVMM frames for nvvidconv?

Thanks,

  • Mark

The inner details of NVMM are not generally released at this time. Since the TX1 has fully shared memory between CPU and GPU, how about trying zero-copy / mapped memory? See http://arrayfire.com/zero-copy-on-tegra-k1/ for an example. Then just pass the pointer you get from this to your appsrc element as normal. With unified memory, the CPU and GPU pointers resolve to the same address, so either works on the TX1, and doing so avoids the redundant cudaMemcpy calls you are trying to eliminate.

With the method described above, a YUV I420 colorspace conversion kernel like the one in the following link may allow appsrc to be chained directly to the encoder: https://github.com/dusty-nv/jetson-inference/blob/master/cuda/cudaYUV-YV12.cu

Hello Dusty,

Thanks for the feedback; testing with zero-copy was on my “todo” list, so I will try to get that test in today. I also appreciate the YV12 CUDA source. I was planning on writing my own kernel to speed up the RGB to YV12 conversion, but it looks like you beat me to that one too.

Thanks again!

  • Mark

I have an OpenVX pipeline, and I’d like to send the output of it, with zero-copy, to gstreamer.

What is the current best approach? (FYI: the GitHub link above returns a 404.)

Ideally, it would be great if there was a sub-class of nvxio::Render that performed this.

Hi,

Please use our VisionWorks libraries to get an optimized implementation for OpenVX.
As of VisionWorks-1.6, the camera source is public, and you can adapt it for your use case.

The source code is at ‘[VisionWorks-1.6-Samples]/nvxio/src/NVX/FrameSource/GStreamer’.
Thanks.

Thanks! I’ve also been looking at the following in the VisionWorks sample code, and it seems to match my use case well:

‘[VisionWorks-1.6-Samples]/nvxio/src/NVX/Render/GStreamer’

Yesterday I found the following thread, which demonstrates how to feed frames to an appsrc node in a GStreamer pipeline. I think that with that approach, all I would need to do is get the frame out of the vx_image object using nvxuCopyImage at the end of my pipeline?

https://devtalk.nvidia.com/default/topic/1024356/jetson-tx2/solved-harware-accelerated-video-encoding-from-c-buffer-with-gstreamer-omxh264enc-and-filesink/post/5211704/#5211704

Hi,

VisionWorks uses GStreamer as its backend framework to open the camera.
As a result, you don’t need to copy the image data from VisionWorks to GStreamer.

It’s recommended to check the source shared in comment #5.
Try applying some modifications for your use case to the GStreamer pipeline inside VisionWorks.

Thanks.

Thanks, AastaLLL! But my use case is the exact opposite of those examples. I don’t need to capture FROM gstreamer, as your referenced examples do. Rather, I need to output TO gstreamer.

In other words, I need an appsrc example, not an appsink example.

I have since found that there is no way, from an OpenVX pipeline on the Jetson platform, to use the hardware H.264 or H.265 encoder without first extracting a complete frame.

Instead, it appears I will have to use the NVIDIA non-portable “VPI” a.k.a. “nvxcu::” APIs. The VPI example consistent with my needs is in the ‘[VisionWorks-1.6-Samples]/nvxio/src/NVX/Render/GStreamer’ directory.

Low latency is EXTREMELY important in our product space and another full frame buffer in the pipeline may rule out the platform for our use.

Hi,

There is no sample available for vx_image to appsrc.
It’s recommended to check our opengl_interop sample, which converts a vx_image to OpenGL, for reference:
[i]-----------------------------------

VisionWorks API
Samples and Demos
Sample Applications
OpenGL Interop Sample App
-----------------------------------[/i]

Thanks.

Hi Michael and mleonhardt,

I have the same problem as you guys; more concretely, let’s say I have the following data struct and image:

struct CudaYuv420pImg {
    Npp8u* pImg;    // points to the beginning of the image (also the Y plane)
    Npp8u* pBuf[3]; // point to the beginnings of the Y, U, V planes
    NppiSize sz;
    ...
};
CudaYuv420pImg I;

and I use

cudaMalloc((void **)&I.pImg, I.sz.width * I.sz.height * 3 / 2);

to do the memory allocation.

My question is: did you guys figure out how to do the zero-copy memory assignment for a GstBuffer pointer, such as

GstBuffer *buffer;

from image I?

An example of using such a buffer is given in need_data() in test-appsrc.c: https://github.com/GStreamer/gst-rtsp-server/blob/master/examples/test-appsrc.c

Thanks,