Performance of OpenCV capture with hardware-accelerated decoding on Jetson TX2 (Solved)

I am working on optimizing (in terms of CPU and frame throughput) an inference application on Jetson.
The pseudo code for the application is as follows:

vidcap = cv2.VideoCapture("video.mp4")
ret_val = True
while ret_val:
	ret_val, image = vidcap.read()
	# Some OpenCV preprocessing
	# TensorRT inference (similar to imagenet-console)

When using cv2.VideoCapture with the file name as the argument, I didn’t see the NVDEC string in the tegrastats output. From this I inferred that hardware decoding was not running, so I decided to do the capture through hardware-accelerated decoding. Accordingly, I changed just the capture line of the code to:

vidcap = cv2.VideoCapture("filesrc location=video.mp4 ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! omxh264dec ! videoconvert ! video/x-raw, format=(string)BGR ! appsink ")

and expected the CPU load to decrease. Instead, I saw an increase in CPU load. Graph for CPU load shared at
Could you please let me know what I am missing?
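To compare the two capture variants on equal terms, a small timing harness can help. The following is a minimal sketch, not part of the original application: `measure_capture` is a hypothetical helper name, and it assumes the capture object exposes a `read()` call returning `(ok, frame)` as `cv2.VideoCapture` does. It reports only the CPU time of the Python process itself, whereas tegrastats shows system-wide load.

```python
import time


def measure_capture(read_frame, max_frames=500):
    """Read up to max_frames frames and report throughput and CPU use.

    read_frame is any callable returning (ok, frame), e.g. vidcap.read.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of this process, all threads
    frames = 0
    while frames < max_frames:
        ok, _frame = read_frame()
        if not ok:
            break
        frames += 1
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return {
        "frames": frames,
        "fps": frames / wall if wall > 0 else 0.0,
        "cpu_seconds_per_frame": cpu / frames if frames else 0.0,
    }
```

Calling `measure_capture(vidcap.read)` once per variant (file name versus GStreamer pipeline string) gives comparable numbers. One possible contributor to the higher load, not confirmed in this thread, is that `videoconvert` performs the conversion to BGR in software on the CPU.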

Hi vivekmaran27,
Is your issue about the DeepStream SDK? It does not seem to be related to the DeepStream SDK.


My understanding is that the DeepStream SDK is a collection of GStreamer plugins that can be connected as graphs to build an application.

I assumed omxh264dec is a part of it. Maybe I should rephrase my question.

Existing solution:

  1. Captures and decodes frames using OpenCV
  2. OpenCV preprocessing, which will separate the frame into multiple images, each representing an ROI
  3. Inference on each of the images corresponding to the frame (as in imagenet-console) using TensorRT
  4. Consolidate the inference results
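Steps 2 to 4 above can be sketched as follows. This is only an illustration under stated assumptions: `crop`, `process_frame`, the `(x, y, w, h)` ROI layout, and the `infer` callable are hypothetical stand-ins for the real preprocessing and TensorRT code, and the frame is a plain list of pixel rows so the sketch has no dependencies.

```python
def crop(frame, roi):
    """Cut one ROI out of a frame stored as a list of pixel rows."""
    x, y, w, h = roi  # assumed layout: top-left corner plus width and height
    return [row[x:x + w] for row in frame[y:y + h]]


def process_frame(frame_id, frame, rois, infer):
    """Split a frame into ROI images, infer on each, consolidate per frame."""
    results = []
    for roi_index, roi in enumerate(rois):
        image = crop(frame, roi)
        # infer stands in for the TensorRT call on a single ROI image
        results.append({"roi": roi_index, "label": infer(image)})
    # Keeping frame_id with the results ties every inference back to its frame
    return {"frame": frame_id, "results": results}
```

With frames coming from OpenCV as NumPy arrays, the crop would normally be written as `frame[y:y + h, x:x + w]` instead.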


vidcap = cv2.VideoCapture(???)
ret_val = True
while ret_val:
	ret_val, image = vidcap.read()
	# Some OpenCV preprocessing
	# TensorRT inference (similar to imagenet-console)

What I am trying to do
Make the above solution DeepStream-based

Problem faced
Since the pipeline involves splitting the frame into multiple images (through OpenCV), running inference on each image, and aggregating the results back (associating them with a frame), I am having difficulty fitting this solution to the DeepStream example.


  1. Any thoughts on the DeepStream pipeline to use? I have posted details in
  2. Since I am already using TensorRT for inference, I thought that if I change only the capture part to use DeepStream GStreamer plugins and leave the rest of the solution as is, the CPU load might go down. Can you please tell me the DeepStream pipeline for video capture alone? (I assumed the one I posted in the question was it.)

Hi vivekmaran27,
DeepStream SDK is a pure GStreamer implementation and does not include OpenCV. Please flash your system via JetPack 3.2.1 (3.2 is also fine) and extract the SDK:

The HW decoding engine is the NVIDIA element omxh264dec. Please also refer to the GStreamer user guide:

Thanks for your response.
Can you please point me to the location (in the forum) where I can ask for clarification on the omxh264dec element?