Send an OpenCV GpuMat to a GStreamer pipeline without a memory copy?

My app has a GStreamer pipeline with appsrc and the OMX encoder, and a GpuMat produced by some OpenCV processing.
After reading some topics I found a solution using NvBuffer and mmap, but mmap works with CPU memory. So I tried the following steps:

  1. Create NvBuffer
  2. Get fd and call mmap(…)
  3. Create Mat with pointer from mmap(…)
  4. GpuMat.download(Mat)
  5. gst_buffer_new_wrapped_full and some magic with inmem->allocator->mem_type = "nvcam"

This works, but it still requires copying memory from GPU to CPU.
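For reference, steps 1-5 above look roughly like this. This is only a sketch, assuming the nvbuf_utils flavour of the MM APIs (NvBufferCreate/NvBufferGetParams); exact enum names and struct fields vary between L4T releases, and most error handling and cleanup is omitted:

```cpp
#include <sys/mman.h>

#include <nvbuf_utils.h>          // NvBufferCreate, NvBufferGetParams (L4T MM APIs)
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <gst/gst.h>

// Sketch of the mmap-based path described above (steps 1-5).
// The GPU -> CPU copy happens in GpuMat::download().
GstBuffer *gpuMatToGstBufferWithCopy(const cv::cuda::GpuMat &gpu,
                                     int width, int height)
{
    // 1. Create an NvBuffer (pitch-linear RGBA here, as an example).
    int fd = -1;
    if (NvBufferCreate(&fd, width, height, NvBufferLayout_Pitch,
                       NvBufferColorFormat_ARGB32) != 0)
        return NULL;

    NvBufferParams params;
    NvBufferGetParams(fd, &params);

    // 2. Map the dmabuf fd into CPU address space.
    void *cpu = mmap(NULL, params.psize[0], PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, params.offset[0]);
    if (cpu == MAP_FAILED)
        return NULL;

    // 3.-4. Wrap the mapping in a cv::Mat header (no allocation) and
    // download the GpuMat into it -- this is the GPU -> CPU copy.
    cv::Mat wrapped(height, width, CV_8UC4, cpu, params.pitch[0]);
    gpu.download(wrapped);
    munmap(cpu, params.psize[0]);

    // 5. Wrap the NvBuffer handle in a GstBuffer without another copy
    // and mark the memory type so downstream elements treat it as NVMM.
    GstBuffer *buffer = gst_buffer_new_wrapped_full(GstMemoryFlags(0),
                                                    params.nv_buffer,
                                                    params.nv_buffer_size, 0,
                                                    params.nv_buffer_size,
                                                    NULL, NULL);
    GstMemory *inmem = gst_buffer_peek_memory(buffer, 0);
    inmem->allocator->mem_type = "nvcam";
    return buffer;
}
```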

I also played with NvEGLImageFromFd and mapEGLImage2Float, but cuGraphicsEGLRegisterImage crashed with error code 999. In general I’m not sure this can solve my problem, because the documentation is very sparse.

What is the best way to send a GpuMat to GStreamer?
Thanks.

This may not be straightforward for your case, but you may have a look at what is done in the nvivafilter plugin.
Some info may be found here: https://devtalk.nvidia.com/default/topic/1022543/jetson-tx2/gstreamer-nvmm-lt-gt-opencv-gpumat/post/5208232/#5208232
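As an illustration, nvivafilter sits in a pure-GStreamer pipeline and hands each NVMM frame to a user-supplied CUDA library. The exact pipeline below is a sketch (the source element and caps depend on your setup); libnvsample_cudaprocess.so is the sample library shipped with L4T:

```shell
gst-launch-1.0 nvcamerasrc ! 'video/x-raw(memory:NVMM), format=NV12' ! \
    nvivafilter cuda-process=true customer-lib-name=libnvsample_cudaprocess.so ! \
    'video/x-raw(memory:NVMM), format=RGBA' ! nvoverlaysink
```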

Hi smarttowel0,
For gstreamer, please try @Honey_Patouceul’s suggestion.

You may also try NvVideoEncoder in the MM APIs.

I looked at the code from @Honey_Patouceul’s link and tried to write a converter from GpuMat to GstBuffer:

GstBuffer *DmaBuffer::toGstBuffer(const cv::cuda::GpuMat &mat)
{
    EGLImageKHR image = NvEGLImageFromFd(m_eglDisplay, m_fd);
    CUresult status;
    CUeglFrame eglFrame;
    CUgraphicsResource pResource = NULL;
    
    cudaFree(0); // make sure a CUDA context exists on the calling thread
    
    status = cuGraphicsEGLRegisterImage(&pResource, image, CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
    if (status != CUDA_SUCCESS) {
        printf("cuGraphicsEGLRegisterImage failed: %d\n", status);
        NvDestroyEGLImage(m_eglDisplay, image);
        return NULL;
    }
    
    status = cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);
    if (status != CUDA_SUCCESS) {
        printf("cuGraphicsResourceGetMappedEglFrame failed\n");
    }
    
    status = cuCtxSynchronize();
    if (status != CUDA_SUCCESS) {
        printf("cuCtxSynchronize failed\n");
    }
    
    if (eglFrame.frameType == CU_EGL_FRAME_TYPE_PITCH) {
        if (eglFrame.eglColorFormat == CU_EGL_COLOR_FORMAT_RGBA) {
            // Wrap the mapped EGL frame in a GpuMat header (no allocation)
            // and do a device-to-device copy from the source GpuMat.
            cv::cuda::GpuMat mapped(cv::Size(eglFrame.width, eglFrame.height), CV_8UC4,
                                    eglFrame.frame.pPitch[0]);
            mat.copyTo(mapped);
        } else {
            printf("Invalid EGL color format for OpenCV\n");
        }
    } else {
        printf("Invalid frame type for OpenCV\n");
    }
    
    status = cuCtxSynchronize();
    if (status != CUDA_SUCCESS) {
        printf("cuCtxSynchronize failed after copy\n");
    }
    
    status = cuGraphicsUnregisterResource(pResource);
    if (status != CUDA_SUCCESS) {
        printf("cuGraphicsUnregisterResource failed: %d\n", status);
    }
    
    // Wrap the underlying NvBuffer in a GstBuffer without copying.
    GstBuffer *buffer = gst_buffer_new_wrapped_full(GstMemoryFlags(0),
                                                    m_params.nv_buffer,
                                                    m_params.nv_buffer_size, 0,
                                                    m_params.nv_buffer_size,
                                                    NULL, NULL);
    
    GstMemory *inmem = gst_buffer_peek_memory(buffer, 0);
    inmem->allocator->mem_type = "nvcam";
    
    NvDestroyEGLImage(m_eglDisplay, image);
    
    return buffer;
}

I call this method from the appsrc “need-data” signal, and sometimes it gets stuck at the cuGraphicsEGLRegisterImage call. It looks like a thread synchronization issue, but I can’t understand what I’m doing wrong. The source code of nvivafilter could help me, but it’s a closed-source plugin :(
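For completeness, the need-data side is roughly this (a sketch; MyPipeline is a hypothetical owner struct holding the DmaBuffer converter from the code above and the latest processed frame, and timestamping is omitted):

```cpp
#include <gst/gst.h>
#include <gst/app/gstappsrc.h>
#include <opencv2/core/cuda.hpp>

// Hypothetical state object; DmaBuffer is the converter class above.
struct MyPipeline {
    DmaBuffer *dmaBuffer;
    cv::cuda::GpuMat latestFrame;
};

// "need-data" handler: convert the current GpuMat and push it downstream.
static void onNeedData(GstElement *appsrc, guint /*length*/, gpointer user_data)
{
    auto *self = static_cast<MyPipeline *>(user_data);
    GstBuffer *buffer = self->dmaBuffer->toGstBuffer(self->latestFrame);
    if (buffer)
        gst_app_src_push_buffer(GST_APP_SRC(appsrc), buffer); // takes ownership
}

// Wiring it up:
// g_signal_connect(appsrc, "need-data", G_CALLBACK(onNeedData), &state);
```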

Hi smarttowel0,
Please share code we can use to reproduce the issue. Are you on r28.1?

Yes, I have a Jetson TX2 with L4T r28.1.
Sorry, I can’t provide full code to reproduce this issue, but I use the code from my previous post on each “need-data” signal.
In general, my program has 3 separate threads:

  1. Capture an H.264 RTSP stream with nvxio
  2. Do some processing with GpuMat
  3. Stream the GpuMat over the network with a GStreamer RTSP pipeline (appsrc -> nvvidconv -> omxh264enc -> rtph264pay)

The problem manifests in two ways:

  1. The video stream works fine for the first frames, then it starts dropping frames, and finally I get stuck at the cuGraphicsEGLRegisterImage call.
  2. Alternatively, the program can crash with the message:
pthread_mutex_lock.c:349: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind != PTHREAD_MUTEX_RECURSIVE_NP)' failed.

Backtrace:

#0  0x0000007fb4933528 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x0000007fb49349e0 in __GI_abort () at abort.c:89
#2  0x0000007fb492cc04 in __assert_fail_base (fmt=0x7fb4a19240 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x7fb7e6b9f8 "INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind != PTHREAD_MUTEX_RECURSIVE_NP)", file=file@entry=0x7fb7e6bd20 "pthread_mutex_lock.c", line=line@entry=349, 
    function=function@entry=0x7fb7e6bb40 <__PRETTY_FUNCTION__.9092> "__pthread_mutex_lock_full") at assert.c:92
#3  0x0000007fb492ccac in __GI___assert_fail (
    assertion=assertion@entry=0x7fb7e6b9f8 "INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind != PTHREAD_MUTEX_RECURSIVE_NP)", file=file@entry=0x7fb7e6bd20 "pthread_mutex_lock.c", line=line@entry=349, 
    function=function@entry=0x7fb7e6bb40 <__PRETTY_FUNCTION__.9092> "__pthread_mutex_lock_full") at assert.c:101
#4  0x0000007fb7e616e8 in __pthread_mutex_lock_full (mutex=0xae0dd0) at pthread_mutex_lock.c:347
#5  0x0000007fb7e617fc in __GI___pthread_mutex_lock (mutex=<optimized out>) at pthread_mutex_lock.c:73
#6  0x0000007fb5e3bdbc in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#7  0x0000007fb5e3bdec in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#8  0x0000007fb5d5a108 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#9  0x0000007fb5e8ae38 in cuGraphicsUnregisterResource () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#10 0x0000007fb5e1e838 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#11 0x0000007fb5e1dfa0 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#12 0x0000007f9b2d1c1c in ?? () from /usr/lib/aarch64-linux-gnu/tegra-egl/libEGL_nvidia.so.0
#13 0x0000007f9b2d08c4 in ?? () from /usr/lib/aarch64-linux-gnu/tegra-egl/libEGL_nvidia.so.0
#14 0x0000007f9b2d0d0c in ?? () from /usr/lib/aarch64-linux-gnu/tegra-egl/libEGL_nvidia.so.0
#15 0x0000007f9b2d0fcc in ?? () from /usr/lib/aarch64-linux-gnu/tegra-egl/libEGL_nvidia.so.0
#16 0x0000007f9b2d346c in ?? () from /usr/lib/aarch64-linux-gnu/tegra-egl/libEGL_nvidia.so.0
#17 0x0000007f9b260034 in ?? () from /usr/lib/aarch64-linux-gnu/tegra-egl/libEGL_nvidia.so.0
#18 0x0000007f8cb1ae20 in ?? () from /usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstnvvideosink.so
#19 0x0000007fa9cf8c6c in ?? () from /usr/lib/aarch64-linux-gnu/libgstbase-1.0.so.0
#20 0x0000007f80158f48 in ?? ()

It looks like an EGL-related bug, maybe something about synchronization of EGL calls?

Hi smarttowel0,
From the current information we cannot reproduce the issue.

If it gets stuck at cuGraphicsEGLRegisterImage(), maybe cuGraphicsUnregisterResource() is not called?

I uploaded a minimal working example to my Google Drive: https://drive.google.com/file/d/1QHJDhYmg1LNTNkNHC76jVQD9334CMtRW/view?usp=sharing.

I can reproduce these issues with this example. Sometimes it crashes, sometimes it gets stuck, and sometimes it works without problems.

Hi smarttowel0,
Your case runs VisionWorks + OpenCV + gstreamer. Can it be done with VisionWorks + gstreamer, or OpenCV + gstreamer, alone? There may be contention when running VisionWorks and OpenCV together.

Hi smarttowel0,
NvBuffer from the MM APIs and gstreamer’s video/x-raw(memory:NVMM) are different things. We do not support NvBuffer in gstreamer on r28.1.

In some cases using OpenCV is more convenient than VisionWorks, but I use VisionWorks for highly optimized solutions.

I did not fully understand: can I send a GpuMat to a gstreamer pipeline via appsrc with minimal overhead? If yes, how? Could you provide a code sample?

Thanks

Hi smarttowel0,
You cannot send a GpuMat to a gstreamer pipeline via appsrc.

For a full gstreamer pipeline, you can access video/x-raw(memory:NVMM) in nvivafilter as shown in
https://devtalk.nvidia.com/default/topic/1022543/jetson-tx2/gstreamer-nvmm-lt-gt-opencv-gpumat/post/5204389/#5204389

You can also refer to
https://devtalk.nvidia.com/default/topic/1028387/jetson-tx1/gst-encoding-pipeline-with-frame-processing-using-cuda-and-libargus/post/5232036/#5232036
It demonstrates appsrc (Argus + NvVideoEncoder) -> h264parse -> qtmux -> filesink. In your case you could implement an appsrc that uses NvBuffer + NvVideoEncoder.