I have a GStreamer pipeline with appsrc and the OMX encoder in my app, and a GpuMat after some OpenCV processing.
After reading some topics I found a solution with NvBuffer and mmap, but mmap works with CPU memory. So I tried these steps:
Create NvBuffer
Get fd and call mmap(…)
Create Mat with pointer from mmap(…)
GpuMat.download(Mat)
gst_buffer_new_wrapped_full and some magic with inmem->allocator->mem_type = "nvcam"
It works fine, but it still needs a memory copy from GPU to CPU.
I also played with NvEGLImageFromFd and mapEGLImage2Float, but cuGraphicsEGLRegisterImage crashed with error code 999. In general, I'm not sure it can solve my problem, because the documentation is very poor.
What is the best way to send a GpuMat to GStreamer?
Thanks.
I looked at the code from @Honey_Patouceul's link and tried to write a converter from GpuMat to GstBuffer:
GstBuffer *DmaBuffer::toGstBuffer(const cv::cuda::GpuMat &mat)
{
    // Import the dmabuf fd as an EGLImage so CUDA can map it.
    EGLImageKHR image = NvEGLImageFromFd(m_eglDisplay, m_fd);
    CUresult status;
    CUeglFrame eglFrame;
    CUgraphicsResource pResource = NULL;

    cudaFree(0); // make sure a CUDA context exists on this thread

    status = cuGraphicsEGLRegisterImage(&pResource, image,
                                        CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
    if (status != CUDA_SUCCESS) {
        printf("cuGraphicsEGLRegisterImage failed: %d\n", status);
        NvDestroyEGLImage(m_eglDisplay, image);
        return NULL;
    }

    status = cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);
    if (status != CUDA_SUCCESS) {
        printf("cuGraphicsResourceGetMappedEglFrame failed: %d\n", status);
    }

    status = cuCtxSynchronize();
    if (status != CUDA_SUCCESS) {
        printf("cuCtxSynchronize failed: %d\n", status);
    }

    if (eglFrame.frameType == CU_EGL_FRAME_TYPE_PITCH) {
        if (eglFrame.eglColorFormat == CU_EGL_COLOR_FORMAT_RGBA) {
            // Wrap the mapped frame; pass the pitch so the rows line up.
            cv::cuda::GpuMat mapped(eglFrame.height, eglFrame.width, CV_8UC4,
                                    eglFrame.frame.pPitch[0], eglFrame.pitch);
            mat.copyTo(mapped);
        } else {
            printf("Invalid eglColorFormat for OpenCV\n");
        }
    } else {
        printf("Invalid frame type for OpenCV\n");
    }

    status = cuCtxSynchronize();
    if (status != CUDA_SUCCESS) {
        printf("cuCtxSynchronize failed after copy: %d\n", status);
    }

    status = cuGraphicsUnregisterResource(pResource);
    if (status != CUDA_SUCCESS) {
        printf("cuGraphicsUnregisterResource failed: %d\n", status);
    }

    GstBuffer *buffer = gst_buffer_new_wrapped_full(GstMemoryFlags(0),
                                                    m_params.nv_buffer,
                                                    m_params.nv_buffer_size, 0,
                                                    m_params.nv_buffer_size,
                                                    NULL, NULL);
    GstMemory *inmem = gst_buffer_peek_memory(buffer, 0);
    inmem->allocator->mem_type = "nvcam";

    NvDestroyEGLImage(m_eglDisplay, image);
    return buffer;
}
I call this method from the appsrc "need-data" signal, and sometimes it gets stuck at the cuGraphicsEGLRegisterImage call. It looks like a thread synchronization issue, but I can't understand what I'm doing wrong. The source code of nvivafilter could help me, but it's a closed-source plugin :(
Yes, I have a Jetson TX2 with L4T r28.1.
Sorry, I can't provide full code to reproduce this issue, but I run the code from my previous post on each "need-data" signal.
In general, my program has 3 separate threads:
Capture an H.264 RTSP stream with nvxio
Do some processing on the GpuMat
Stream the GpuMat over the network with a GStreamer RTSP pipeline (appsrc → nvvidconv → omxh264enc → rtph264pay)
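The hand-off between the processing thread and the appsrc thread is often done with a small bounded queue, so the "need-data" handler never blocks the producer. A minimal sketch in standard C++ (no GStreamer types; the FrameQueue class and the byte-vector payload are my own stand-ins):

```cpp
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical bounded queue between the processing thread (producer) and the
// appsrc "need-data" handler (consumer). Dropping the oldest frame when full
// keeps the producer from blocking on a slow encoder.
class FrameQueue {
public:
    explicit FrameQueue(size_t capacity) : m_capacity(capacity) {}

    void push(std::vector<uint8_t> frame) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_frames.size() >= m_capacity)
            m_frames.pop_front();               // drop oldest instead of blocking
        m_frames.push_back(std::move(frame));
        m_cond.notify_one();
    }

    std::vector<uint8_t> pop() {                // blocks until a frame arrives
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return !m_frames.empty(); });
        std::vector<uint8_t> frame = std::move(m_frames.front());
        m_frames.pop_front();
        return frame;
    }

    size_t size() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_frames.size();
    }

private:
    size_t m_capacity;
    std::deque<std::vector<uint8_t>> m_frames;
    std::mutex m_mutex;
    std::condition_variable m_cond;
};
```

With a queue like this, "need-data" only pops an already-prepared buffer, and all CUDA/EGL work stays in the processing thread.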
This problem manifests in two ways:
On the first frames my video stream works fine; then the stream starts dropping frames, and finally it gets stuck at the cuGraphicsEGLRegisterImage call.
Alternatively, my program can crash with this message:
#0 0x0000007fb4933528 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x0000007fb49349e0 in __GI_abort () at abort.c:89
#2 0x0000007fb492cc04 in __assert_fail_base (fmt=0x7fb4a19240 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x7fb7e6b9f8 "INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind != PTHREAD_MUTEX_RECURSIVE_NP)", file=file@entry=0x7fb7e6bd20 "pthread_mutex_lock.c", line=line@entry=349,
function=function@entry=0x7fb7e6bb40 <__PRETTY_FUNCTION__.9092> "__pthread_mutex_lock_full") at assert.c:92
#3 0x0000007fb492ccac in __GI___assert_fail (
assertion=assertion@entry=0x7fb7e6b9f8 "INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind != PTHREAD_MUTEX_RECURSIVE_NP)", file=file@entry=0x7fb7e6bd20 "pthread_mutex_lock.c", line=line@entry=349,
function=function@entry=0x7fb7e6bb40 <__PRETTY_FUNCTION__.9092> "__pthread_mutex_lock_full") at assert.c:101
#4 0x0000007fb7e616e8 in __pthread_mutex_lock_full (mutex=0xae0dd0) at pthread_mutex_lock.c:347
#5 0x0000007fb7e617fc in __GI___pthread_mutex_lock (mutex=<optimized out>) at pthread_mutex_lock.c:73
#6 0x0000007fb5e3bdbc in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#7 0x0000007fb5e3bdec in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#8 0x0000007fb5d5a108 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#9 0x0000007fb5e8ae38 in cuGraphicsUnregisterResource () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#10 0x0000007fb5e1e838 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#11 0x0000007fb5e1dfa0 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#12 0x0000007f9b2d1c1c in ?? () from /usr/lib/aarch64-linux-gnu/tegra-egl/libEGL_nvidia.so.0
#13 0x0000007f9b2d08c4 in ?? () from /usr/lib/aarch64-linux-gnu/tegra-egl/libEGL_nvidia.so.0
#14 0x0000007f9b2d0d0c in ?? () from /usr/lib/aarch64-linux-gnu/tegra-egl/libEGL_nvidia.so.0
#15 0x0000007f9b2d0fcc in ?? () from /usr/lib/aarch64-linux-gnu/tegra-egl/libEGL_nvidia.so.0
#16 0x0000007f9b2d346c in ?? () from /usr/lib/aarch64-linux-gnu/tegra-egl/libEGL_nvidia.so.0
#17 0x0000007f9b260034 in ?? () from /usr/lib/aarch64-linux-gnu/tegra-egl/libEGL_nvidia.so.0
#18 0x0000007f8cb1ae20 in ?? () from /usr/lib/aarch64-linux-gnu/gstreamer-1.0/libgstnvvideosink.so
#19 0x0000007fa9cf8c6c in ?? () from /usr/lib/aarch64-linux-gnu/libgstbase-1.0.so.0
#20 0x0000007f80158f48 in ?? ()
It looks like an EGL-related bug, maybe something about synchronizing EGL calls?
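Since both this code (via "need-data") and nvvideosink touch EGL from different threads, one common workaround is to serialize all EGL/CUDA interop calls onto a single dedicated thread. A minimal dispatcher sketch in standard C++ (the class name and API are my own; on Jetson the queued task would wrap the NvEGLImageFromFd / cuGraphicsEGLRegisterImage / copy / unregister sequence):

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical dispatcher that funnels all EGL/CUDA interop calls onto one
// dedicated thread, so cuGraphicsEGLRegisterImage never races against EGL
// use in other threads.
class InteropThread {
public:
    InteropThread() : m_worker([this] { run(); }) {}

    ~InteropThread() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cond.notify_one();
        m_worker.join();
    }

    // Runs `task` on the interop thread and blocks until it completes.
    void call(std::function<void()> task) {
        std::packaged_task<void()> wrapped(std::move(task));
        std::future<void> done = wrapped.get_future();
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.push(std::move(wrapped));
        }
        m_cond.notify_one();
        done.wait();
    }

private:
    void run() {
        for (;;) {
            std::packaged_task<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_cond.wait(lock, [this] { return m_stop || !m_tasks.empty(); });
                if (m_stop && m_tasks.empty())
                    return;
                task = std::move(m_tasks.front());
                m_tasks.pop();
            }
            task(); // on Jetson: register EGLImage, copy GpuMat, unregister
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_cond;
    std::queue<std::packaged_task<void()>> m_tasks;
    bool m_stop = false;
    std::thread m_worker;
};
```

The "need-data" handler would then do something like interop.call([&]{ buffer = toGstBuffer(mat); }); so every interop call runs on the same thread in order.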
Hi smarttowel0,
Your case runs VisionWorks + OpenCV + GStreamer. Can it be done with VisionWorks + GStreamer or OpenCV + GStreamer alone? There may be contention between running VisionWorks and OpenCV together.
In some cases OpenCV is more comfortable to use than VisionWorks, but I use VisionWorks for highly optimized solutions.
I still don't fully understand: can I send a GpuMat to a GStreamer pipeline via appsrc with minimal overhead? If yes, how is it possible? Maybe you can provide a code sample?