Can I feed gstreamer with buffers allocated via nppiMalloc or cudaMalloc?

I am trying to improve the performance of my program, which has two gstreamer pipelines at the output.

Can I feed gstreamer with buffers allocated via nppiMalloc or cudaMalloc? Is there any example of it?

Right now I am using zerocopy to feed gstreamer and it works, but I think that adds a lot of memory operations.

Hi,
We suggest you use tegra_multimedia_api + gstreamer.
https://developer.nvidia.com/embedded/dlc/NVIDIA_Tegra_Linux_MultimediaAPIReference
https://developer.nvidia.com/embedded/dlc/l4t-accelerated-gstreamer-guide-32-2

There is a sample patch which runs Argus + NvVideoEncoder + gstreamer. For your case, you may replace Argus with NvBuffers and use the calls below to access them through CUDA:

            /* Map the NvBuffer's dmabuf fd to an EGLImage so CUDA can access it */
            EGLImageKHR egl_image = NULL;
            egl_image = NvEGLImageFromFd(m_eglDisplay, fd);
            if (egl_image == NULL)
            {
                fprintf(stderr, "Error while mapping dmabuf fd (0x%X) to EGLImage\n",
                         fd);
            }
            else
            {
                /* Process the frame with CUDA, then release the EGLImage */
                HandleEGLImage(&egl_image);
                NvDestroyEGLImage(m_eglDisplay, egl_image);
            }

I see. So the idea is to keep gstreamer just for the output (streaming, visualizing…) and do the compression at a lower level.

Will do that, it should definitely help with performance. Thanks for the advice!

Sorry for bringing back this post.

I started doing the compression with tegra_multimedia_api and using gstreamer for the output. It works great if I feed the encoder input planes with data from the CPU.

CUDA -> Frame to CPU -> NvBuffer -> Compression -> Gstreamer

Is there any way to create an NvBuffer directly from a GPU buffer, to avoid the second step?
If I try to use cudaMemcpy to fill the input NvBuffer planes, it returns error 70, which I don’t even see listed.

If I fill them using memcpy instead (with buffers from the CPU), it works fine.

Hi,
The working flow is to create an NvBuffer and get the CUDA pointer from its EGLImage. You can then fill in the data through that CUDA pointer. Code for getting the CUDA pointer appears in several samples, such as 12_camera_v4l2_cuda.