GStreamer pipeline using nvv4l2h264enc to write from shared memory


I have been struggling to find an accelerated gstreamer pipeline that works to write frames from a cv::Mat allocated with cudaMallocHost() to a file.

Currently we are using the following configuration, with some items omitted for brevity:


// Host and device views of the same buffer at uni_frameLargeAddress.
cv::Mat largeFrame(INPUT_FRAME_HEIGHT, INPUT_FRAME_WIDTH, CV_8UC3, uni_frameLargeAddress);
cv::cuda::GpuMat d_largeFrame(INPUT_FRAME_HEIGHT, INPUT_FRAME_WIDTH, CV_8UC3, uni_frameLargeAddress);

// Ingress: CSI camera -> BGR frames in an appsink.
std::string captureString = "nvarguscamerasrc sensor-id=0 ! video/x-raw(memory:NVMM),width="+std::to_string(INPUT_FRAME_WIDTH)+",height="+std::to_string(INPUT_FRAME_HEIGHT)+",framerate=30/1 ! nvvidconv ! video/x-raw,format=BGRx ! videoconvert ! video/x-raw, format=BGR ! appsink";
cv::VideoCapture cap(captureString, cv::CAP_GSTREAMER);

cap >> largeFrame;

// Egress: appsrc -> H.264 -> 2-second MPEG-TS segments.
std::string gstreamer_pipeline = "appsrc ! videoconvert ! video/x-raw, width=" + std::to_string(largeFrame.cols) + ", height=" + std::to_string(largeFrame.rows) +", format=UYVY, framerate="+std::to_string(INPUT_FPS)+"/1 ! videoconvert ! omxh264enc insert-aud=true !  splitmuxsink  muxer=mpegtsmux location=/mnt/sdcard/videos/temp/%09d.ts max-size-time=2000000000";
// "writer" is a placeholder name; the declaration was omitted from the original snippet.
cv::VideoWriter writer(gstreamer_pipeline, cv::CAP_GSTREAMER, -1, 30, cv::Size(largeFrame.cols, largeFrame.rows));

We want to do two things:

  1. Replace the egress pipeline with one that can use hardware to do the encoding.
     • The CPU is taking ~110ms to do the encoding. We would like to be in the sub-30ms range, with 2-4ms being ideal.
     • While the path for the image write says “sdcard,” it is actually on an NVMe SSD.
  2. See if there is a faster way to load images from the camera into memory.
     • Right now the image load takes ~7ms, while our resize operation only takes ~900us.
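
For anyone wanting to reproduce per-stage timings like the ones quoted above, a minimal sketch of a wall-clock timer is shown here. The helper name elapsed_ms is hypothetical and not part of our code base, and it measures only the time spent inside the call it wraps, which may not include work that a pipeline does later on its own threads.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in milliseconds, using a monotonic clock.
template <typename Work>
double elapsed_ms(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

It could be wrapped around a single stage, for example elapsed_ms([&] { cap >> largeFrame; }), and averaged over many frames to smooth out jitter.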

The camera we are using is based on the IMX477 and is connected via CSI.

Any help pointing us in the right direction would be greatly appreciated.

I wanted to update with an interesting data point: encoding a 2MP frame (1920 x 1080) to h264 using the above pipeline takes 1.1ms, but encoding a 12MP frame (4032 x 3040) takes 110ms. Since h264 is roughly n*log(n), I would have expected about a 10x penalty, i.e. ~10-15ms in the encode step. Is there some other factor that could be further degrading performance on the Jetson?
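
To put numbers on that expectation, here is a small sketch of the arithmetic. It assumes the encode cost really does scale as n*log(n) in the pixel count n, which is only my rough model, and the function names are made up for illustration. With 1920 x 1080 and 4032 x 3040 frames, the pixel ratio comes out near 5.9x and the n*log(n) ratio near 6.6x, so 1.1ms would scale to roughly 7ms, while the measured 110ms is a 100x increase.

```cpp
#include <cassert>
#include <cmath>

// Number of pixels in a frame, as a double to avoid integer overflow.
inline double pixel_count(int width, int height) {
    assert(width > 0 && height > 0);
    return static_cast<double>(width) * static_cast<double>(height);
}

// Ratio of n*log(n) costs between a large frame and a small frame.
// The n*log(n) model is an assumption, not a measured property of the encoder.
inline double nlogn_ratio(double n_large, double n_small) {
    return (n_large * std::log(n_large)) / (n_small * std::log(n_small));
}

// Time predicted for the large frame from a measured small-frame time.
inline double predicted_ms(double small_ms, double n_large, double n_small) {
    return small_ms * nlogn_ratio(n_large, n_small);
}
```

Calling predicted_ms(1.1, pixel_count(4032, 3040), pixel_count(1920, 1080)) gives the figure that the measured 110ms should be compared against.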

For using cv::cuda::GpuMat without an extra memory copy, please refer to this sample:
Nano not using GPU with gstreamer/python. Slow FPS, dropped frames - #8 by DaneLLL

This should eliminate much CPU usage. Please take a look and give it a try.

Hi Dane,

Thanks for the comment, but I’m worried my question wasn’t explained well. I have correctly implemented a shared memory region, referenced by both a cv::Mat and a cv::cuda::GpuMat object instance. I am looking for direction on a gstreamer pipeline that takes this shared memory region and allows me to encode it as an h264 frame to a file.

I provided an example pipeline that works for this task in the second-to-last line of my code block:

std::string gstreamer_pipeline = "appsrc ! videoconvert ! video/x-raw, width=" + std::to_string(largeFrame.cols) + ", height=" + std::to_string(largeFrame.rows) +", format=UYVY, framerate="+std::to_string(INPUT_FPS)+"/1 ! videoconvert ! omxh264enc insert-aud=true !  splitmuxsink  muxer=mpegtsmux location=/mnt/sdcard/videos/temp/%09d.ts max-size-time=2000000000";

I am also seeing very poor performance of this pipeline for a large (12MP / 36MB) cv::Mat, worse than O(n*log(n)) (h264) would suggest.

I would love something like
"appsrc ! video/x-raw(memory:NVMM) ! nvv4l2h264enc ! splitmuxsink muxer=mpegtsmux location=/file/to/be/appended/to.ts", but I do not know GStreamer very well.

Do you have any suggestions?
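
To make the request concrete, here is a minimal sketch of how such a pipeline string might be assembled. It is untested on a Jetson and rests on several assumptions: frames still reach appsrc as BGR in system memory (as they would from cv::VideoWriter), nvvidconv is used to move them into NVMM as NV12 before nvv4l2h264enc, and h264parse sits ahead of the muxer. The helper name build_nvmm_encode_pipeline and the exact element order are mine, not something confirmed in this thread. The videoconvert step from BGR to BGRx still runs on the CPU in this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from this thread): builds an egress pipeline
// string that hands system-memory BGR frames to the V4L2 H.264 encoder.
// Assumed element order: appsrc -> videoconvert (BGR to BGRx, on the CPU)
// -> nvvidconv (copy into NVMM as NV12) -> nvv4l2h264enc -> h264parse
// -> splitmuxsink writing MPEG-TS segments of up to 2 seconds each.
std::string build_nvmm_encode_pipeline(int width, int height, int fps,
                                       const std::string& location) {
    assert(width > 0 && height > 0 && fps > 0);
    return "appsrc ! video/x-raw, format=BGR, width=" + std::to_string(width) +
           ", height=" + std::to_string(height) +
           ", framerate=" + std::to_string(fps) + "/1"
           " ! videoconvert ! video/x-raw, format=BGRx"
           " ! nvvidconv ! video/x-raw(memory:NVMM), format=NV12"
           " ! nvv4l2h264enc ! h264parse"
           " ! splitmuxsink muxer=mpegtsmux location=" + location +
           " max-size-time=2000000000";
}
```

The result would replace gstreamer_pipeline in the snippet from my first post, for example build_nvmm_encode_pipeline(largeFrame.cols, largeFrame.rows, INPUT_FPS, "/mnt/sdcard/videos/temp/%09d.ts").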

By using the NvBuffer APIs, you can get an NvBuffer in appsink and send it to appsrc. Please check these samples:
[get NvBuffer in appsink]
How to run RTP Camera in deepstream on Nano - #29 by DaneLLL
[send NvBuffer to appsrc]
Creating a GStreamer source that publishes to NVMM - #7 by DaneLLL
