How to read video with GStreamer + OpenCV + CUDA

I want to read a video file using the GPU rather than the CPU. Is that possible? I have tried many GStreamer pipelines with no luck. My best result so far is the code below, which mainly uses the CPU, then NVDLA + GPU (I think, but I am not sure).
Please help.

import cv2

camera = cv2.VideoCapture('filesrc location=test/video6.mp4 ! qtdemux ! h264parse ! omxh264dec ! video/x-raw, format=(string)NV12 ! appsink', cv2.CAP_GSTREAMER)
# Other pipelines I tried:
#camera = cv2.VideoCapture("filesrc location= test/video6.mp4 ! qtdemux name=demux ! h264parse ! omxh264dec ! nvivafilter cuda-process=true ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvegltransform ! nveglglessink", cv2.CAP_GSTREAMER)
#camera = cv2.VideoCapture('filesrc location=test/video6.mp4 ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! omxh264dec ! appsink', cv2.CAP_GSTREAMER)
#camera = cv2.VideoCapture('filesrc location=test/video6.mp4 ! qtdemux ! queue ! h264parse ! omxh264dec ! nvvidconv ! video/x-raw,format=BGRx ! queue ! videoconvert ! queue ! video/x-raw, format=BGR ! appsink', cv2.CAP_GSTREAMER)
#camera = cv2.VideoCapture('filesrc location=test/video6.mp4 ! qtdemux ! h264parse ! omxh264dec ! nvvidconv ! video/x-raw, width=1920,height=1080 ! appsink', cv2.CAP_GSTREAMER)
#camera = cv2.VideoCapture('filesrc location=test/video6.mp4 ! qtdemux ! h264parse ! omxh264dec ! nvvidconv ! video/x-raw, format=BGR ! appsink', cv2.CAP_GSTREAMER)
#v4l2src device=/dev/video%d ! video/x-raw, width=%d, height=%d ! videoconvert ! appsink

rval, frame = camera.read()
while rval:
    # The pipeline above delivers NV12 frames, so convert them to BGR for display
    frame = cv2.cvtColor(frame, cv2.COLOR_YUV2BGR_NV12)
    cv2.imshow("img", frame)
    key = cv2.waitKey(20)
    if key == 27:  # exit on ESC
        break
    rval, frame = camera.read()


There is no way to limit CPU usage with this approach. The main problem is not the frame color format but the bandwidth used.

When you decode a frame on the GPU (omxh264dec) and pass it to appsink, you are basically copying the frame from GPU RAM to CPU RAM. By using OpenCV in a while loop, you are asking the CPU to handle a sequence of individual frames (like an MJPEG stream), which causes high CPU usage.
On an x86 CPU this load is lower because the CPU is a lot more powerful.
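To put a rough number on that bandwidth, here is a small Python sketch. The 1920x1080 size comes from one of the pipelines in the question; the 30 fps frame rate is an assumption (the thread does not state it), and the helper names are made up for illustration.

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return int(width * height * bytes_per_pixel)


def stream_rate_mb_s(width, height, bytes_per_pixel, fps):
    """Data rate of an uncompressed stream in MB/s (1 MB = 1e6 bytes)."""
    return frame_bytes(width, height, bytes_per_pixel) * fps / 1e6


WIDTH, HEIGHT = 1920, 1080  # taken from the width/height caps in the question
FPS = 30                    # assumption: the actual frame rate is not stated

# Average bytes per pixel for the formats mentioned in this thread
FORMATS = (("NV12", 1.5), ("BGRx", 4), ("BGR", 3))

for name, bpp in FORMATS:
    size = frame_bytes(WIDTH, HEIGHT, bpp)
    rate = stream_rate_mb_s(WIDTH, HEIGHT, bpp, FPS)
    print(f"{name}: {size} bytes/frame, {rate:.1f} MB/s")
```

Under these assumptions, BGR frames alone come to roughly 186.6 MB/s that the CPU has to receive and touch, before any processing is done on them.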

You can verify this yourself by running:
gst-launch-1.0 filesrc location=test/video6.mp4 ! qtdemux ! h264parse ! omxh264dec ! "video/x-raw, format=(string)NV12" ! nvoverlaysink display-id=0

With this method the frame is not copied from GPU RAM to CPU RAM and is handled entirely by the GPU.

To avoid this issue in object detection systems, NVIDIA developed DeepStream, a toolkit of libraries for executing all calculations on the GPU.

Let me explain in more detail. When you detect objects you need to:

  1. Open the video stream (GStreamer)
  2. Decode it (GStreamer)
  3. Detect objects (Python)
  4. Draw colored boxes over the objects (Python OpenCV)
  5. Display the frames or re-stream them to a media server over RTSP, RTMP… (Python OpenCV)

With your method, steps 3, 4 and 5 are handled by Python, and therefore on the CPU (step 3 could be sped up with hardware acceleration).

With DeepStream, all steps are handled by GStreamer directly on the GPU. This matters especially for step 4 (drawing): it is very easy to do with OpenCV, but NVIDIA developed a GStreamer plugin that does it on the GPU.

So the way to reduce CPU usage is to not use the CPU…

Unlike desktop computers with a discrete NVIDIA card, which use the cuvid library to encode or decode on the GPU, Jetsons have dedicated hardware processors for video encoding (NVENC) and decoding (NVDEC). You should be able to monitor these with tegrastats. Like the ISP and the GPU, these processors operate on NVMM memory, which is DMA-able contiguous memory, distinct from CPU virtual memory although allocated in the same physical memory.

Assuming from your example that you want to read an H264-encoded video from an mp4 container and display it with OpenCV imshow:

  • Reading from the file and container depacketization would be done by the CPU.
  • Decoding the H264 would be done by NVDEC. From GStreamer, this could be done with the omxh264dec plugin, which can output into NVMM memory (GStreamer caps: 'video/x-raw(memory:NVMM)') or standard CPU memory ('video/x-raw'). Note that the omx plugins are being deprecated and replaced by v4l2 plugins. nvv4l2decoder may have better performance, but it outputs only into NVMM memory, so you would have to use the nvvidconv plugin to copy into CPU memory. The decoded output format would be NV12 in recent L4T releases, I420 in old ones.
  • So if you need BGR format in OpenCV, you have to convert NV12 in NVMM memory into BGR frames in CPU memory. There are several ways to do this, but the optimal way would be to use nvvidconv to convert NV12 in NVMM memory into BGRx in CPU memory, then use the videoconvert plugin on the CPU to remove the extra 4th byte and provide BGR format to OpenCV. OpenCV can also read I420 frames, but conversion with OpenCV may be less efficient than with nvvidconv + videoconvert (you may use the latter's n-threads option and/or isolate it with queue(s)). In short, you would try something like:
camera = cv2.VideoCapture("filesrc location=test/video6.mp4 ! qtdemux ! h264parse ! omxh264dec ! nvvidconv ! video/x-raw, format=BGRx ! videoconvert ! video/x-raw, format=BGR ! appsink", cv2.CAP_GSTREAMER)
  • For high resolution × framerate, opencv_videoio may have limitations, but you would hit limits earlier with imshow, which may not be very efficient on Jetson (this may also depend on the OpenCV GUI backend). As an alternative, you may try a video writer that feeds a GStreamer display sink.
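The pipeline above can be wrapped in a small helper so that switching between omxh264dec and nvv4l2decoder is a one-argument change. This is only a sketch: the function names are made up for illustration, it assumes an OpenCV build with GStreamer support, and it assumes that nvv4l2decoder can simply take the place of omxh264dec in front of nvvidconv, as the description above suggests.

```python
def build_decode_pipeline(path, decoder="omxh264dec"):
    """Return a GStreamer pipeline string that delivers BGR frames to appsink.

    Hypothetical helper: only the two decoders named in this thread are accepted.
    """
    if decoder not in ("omxh264dec", "nvv4l2decoder"):
        raise ValueError(f"unsupported decoder: {decoder}")
    elements = [
        f"filesrc location={path}",   # read the file (CPU)
        "qtdemux",                    # mp4 container depacketization (CPU)
        "h264parse",
        decoder,                      # H264 decoding on NVDEC
        "nvvidconv",                  # convert to BGRx and copy into CPU memory
        "video/x-raw, format=BGRx",
        "videoconvert",               # drop the 4th byte on the CPU
        "video/x-raw, format=BGR",
        "appsink",
    ]
    return " ! ".join(elements)


def show_video(path, decoder="omxh264dec"):
    """Decode the file and display it with imshow until EOF or ESC."""
    import cv2  # imported here so the pipeline builder works without OpenCV

    cap = cv2.VideoCapture(build_decode_pipeline(path, decoder), cv2.CAP_GSTREAMER)
    if not cap.isOpened():
        raise RuntimeError("could not open the pipeline (is OpenCV built with GStreamer?)")
    try:
        while True:
            ok, frame = cap.read()
            if not ok:                  # end of stream or read error
                break
            cv2.imshow("img", frame)    # frame is already BGR
            if cv2.waitKey(1) == 27:    # exit on ESC
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Calling show_video("test/video6.mp4") uses the same pipeline string as above, while show_video("test/video6.mp4", decoder="nvv4l2decoder") would try the newer decoder. You can compare CPU load between the two with tegrastats.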