How can I get frames efficiently from h.264 video (through filesrc or http) for cv2

Hi, I plan to build an object detection application on a Jetson Orin Nano 8GB with an RTSP camera. I don't have an RTSP camera yet, so I am testing with an H.264 MP4 video. I want to get frames for deep learning and machine vision analysis, so I haven't tried DeepStream.

My pipeline looks like:
gst-launch-1.0 filesrc location="/home/mic-711on/trim2.mp4" ! qtdemux ! queue ! h264parse ! nvv4l2decoder ! nvvidconv ! 'video/x-raw(memory:NVMM), format=(string)BGRx' ! nvvidconv ! 'video/x-raw' ! videoconvert ! 'video/x-raw, format=(string)BGR' ! appsink emit-signals=True


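Here is the callback function definition:

import gi
import numpy
gi.require_version("Gst", "1.0")
from gi.repository import Gst

def gst_to_opencv(sample):
    buf = sample.get_buffer()
    caps = sample.get_caps()
    structure = caps.get_structure(0)

    # The pipeline ends in packed BGR, so a frame is height x width x 3 bytes.
    arr = numpy.ndarray(
        (structure.get_value("height"), structure.get_value("width"), 3),
        buffer=buf.extract_dup(0, buf.get_size()),
        dtype=numpy.uint8)
    return arr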

def new_buffer(sink, data):
    global image_arr
    sample = sink.emit("pull-sample")
    # buf = sample.get_buffer()
    # print("Timestamp:", buf.pts)
    arr = gst_to_opencv(sample)
    image_arr = arr
    return Gst.FlowReturn.OK
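
The appsink callback runs on a GStreamer streaming thread, while cv2.imshow reads the frame from the main thread, so the global image_arr is shared between two threads. A minimal sketch of a lock-protected holder that could replace the global, assuming only the newest frame matters. LatestFrame and its put/get methods are hypothetical names for illustration, not part of GStreamer or OpenCV:

```python
import threading


class LatestFrame:
    """Thread-safe holder that keeps only the most recent frame."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self._seq = 0  # how many frames have been stored so far

    def put(self, frame):
        """Store a new frame, overwriting the previous one."""
        with self._lock:
            self._frame = frame
            self._seq += 1

    def get(self, last_seq=0):
        """Return (seq, frame); frame is None if nothing newer than last_seq."""
        with self._lock:
            if self._seq == last_seq:
                return last_seq, None
            return self._seq, self._frame
```

In the callback this would be used as latest.put(gst_to_opencv(sample)), and the display loop would call latest.get(seq) and skip the iteration when the returned frame is None.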

cv2.imshow now runs properly, and next I want to run a YOLOv5-based application. This pipeline works, but it looks ugly, and I assume that using nvvidconv twice plus videoconvert wastes a lot of resources. Is there a more efficient approach?


Please try our DeepStream SDK.
Below is a useful GitHub repository from the community for your reference:


Hi, AastaLLL
I am trying different algorithms now, and I don't think YOLO is strictly necessary. My plan is to figure out how to build an AI project with common methods first; I may try DeepStream later. I would prefer a solution based on GStreamer, because I am a complete beginner at coding with video streams and deep learning. Please bear with me.
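
For a GStreamer-only route, one commonly used simplification is to let a single nvvidconv copy the decoded frame out of NVMM memory and convert it to BGRx in one step, leaving videoconvert only the BGRx-to-BGR repack that OpenCV needs. The sketch below builds such a pipeline description for a file, HTTP, or RTSP source. build_pipeline is a hypothetical helper, and the latency=200, max-buffers=1 and drop=true values are assumptions to tune; it has not been benchmarked on the Orin Nano:

```python
def build_pipeline(uri):
    """Return a pipeline description ending in an appsink named 'sink'."""
    if uri.startswith("rtsp://"):
        # RTSP camera: depayload the RTP packets into an H.264 stream.
        source = f'rtspsrc location="{uri}" latency=200 ! rtph264depay'
    elif uri.startswith(("http://", "https://")):
        # MP4 served over HTTP: demux the container.
        source = f'souphttpsrc location="{uri}" ! qtdemux'
    else:
        # Local MP4 file.
        source = f'filesrc location="{uri}" ! qtdemux'

    stages = [
        source,
        "h264parse",
        "nvv4l2decoder",             # hardware decode, output in NVMM memory
        "nvvidconv",                 # single NVMM -> system memory conversion
        "video/x-raw,format=BGRx",
        "videoconvert",              # CPU repack from BGRx to BGR
        "video/x-raw,format=BGR",
        # Assumption: keep only the newest frame if the consumer is slow.
        "appsink name=sink emit-signals=true max-buffers=1 drop=true",
    ]
    return " ! ".join(stages)


if __name__ == "__main__":
    print(build_pipeline("/home/mic-711on/trim2.mp4"))
```

The returned string can be passed to Gst.parse_launch, and the sink retrieved with get_by_name("sink") to connect the "new-sample" callback. With drop=true a slow consumer skips frames, which suits a live camera but may not be wanted when every frame of a file has to be processed.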

DeepStream SDK is based on GStreamer. Please check the documentation for more information:
NVIDIA Metropolis Documentation

You can install it through SDKManager. After the installation, the package is in


Please try deepstream-app

