H264 decode performance issues

I have been playing with Dusty’s tutorial code (from GitHub) on my new Xavier NX, but I’m running into some performance issues.

Essentially I’m using this code:

import jetson.inference
import jetson.utils

net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
camera = jetson.utils.videoSource("csi://0")      # '/dev/video0' for V4L2
display = jetson.utils.videoOutput("display://0") # 'my_video.mp4' for file

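while display.IsStreaming():
	img = camera.Capture()          # grab the next frame from the source
	detections = net.Detect(img)    # run detection (overlays boxes on img by default)
	display.Render(img)             # show the frame; without this nothing is drawn
	display.SetStatus("Object Detection | Network {:.0f} FPS".format(net.GetNetworkFPS()))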

I’m using an H.264-encoded video as input and display://0 as output. The video is 1920x1080, 25 fps, 2000 kbps, with no audio. SMPlayer plays it easily with plenty of CPU/GPU power to spare. But when I run object detection on it, the video clearly stutters. Meanwhile, the network FPS in the status bar varies between 80 and 120, which seems to suggest the Jetson could theoretically handle 4 or even 5 of these streams at once. Yet it already stutters with one. I would like smooth playback, and I assume the device should be capable of it. What am I doing wrong here?
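The 80 to 120 FPS in the status bar comes from net.GetNetworkFPS(), which, as far as I understand jetson-inference, is derived from the network’s inference time alone. It does not include decoding, copying, rendering, or waiting for the next frame, so it can be far higher than the rate at which frames actually reach the screen. A minimal sketch for measuring the whole loop by wall clock instead; the class name LoopRateMeter is hypothetical and not part of jetson-utils.

```python
import time
from collections import deque


class LoopRateMeter:
    """Wall-clock rate of a processing loop, averaged over the last `window` ticks."""

    def __init__(self, window=30, clock=time.perf_counter):
        self._clock = clock                  # injectable so it can be tested
        self._stamps = deque(maxlen=window)  # times of the most recent ticks

    def tick(self):
        """Call once per loop iteration; returns the current rate in FPS."""
        self._stamps.append(self._clock())
        return self.fps()

    def fps(self):
        """Iterations per second across the stored window (0.0 until 2 ticks)."""
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0


# Hypothetical use inside the detection loop (needs a Jetson with jetson-utils):
#
# meter = LoopRateMeter()
# while display.IsStreaming():
#     img = camera.Capture()
#     detections = net.Detect(img)
#     display.Render(img)
#     display.SetStatus("Network {:.0f} FPS | Loop {:.0f} FPS".format(
#         net.GetNetworkFPS(), meter.tick()))
```

If the loop rate sits well below 25 while the network rate stays near 100, the time is going somewhere other than inference.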

I don’t think it can be an I/O bottleneck, because the video is loaded from an NVMe SSD.

I might be on to something here. I’ve noticed that even though the input video plays slowly and stutters, the CPU and GPU still have ample headroom. So I don’t think it’s a raw performance issue.
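One way to check whether decoding alone keeps up is to run only the demux/decode part of the logged pipeline, with no inference or display. A minimal sketch, assuming gst-launch-1.0 and the fpsdisplaysink and fakesink elements are available on the device; the function names are hypothetical.

```python
import shlex
import subprocess


def decode_benchmark_cmd(path, decoder="omxh264dec"):
    """Build a gst-launch-1.0 command that only demuxes and decodes `path`.

    fpsdisplaysink wraps fakesink (so nothing is drawn) and, with -v, reports
    the measured frame rate; sync=false lets frames flow as fast as the
    decoder can produce them instead of at the file's nominal 25 fps.
    """
    return [
        "gst-launch-1.0", "-v",
        "filesrc", "location=" + path, "!",
        "qtdemux", "!",
        "queue", "!",
        "h264parse", "!",
        decoder, "!",
        "fpsdisplaysink", "video-sink=fakesink", "text-overlay=false", "sync=false",
    ]


def run_decode_benchmark(path, decoder="omxh264dec"):
    """Run the benchmark; only meaningful on the Jetson itself."""
    subprocess.run(decode_benchmark_cmd(path, decoder), check=True)


if __name__ == "__main__":
    cmd = decode_benchmark_cmd("Samplevideos/DSCF1061.mp4")
    # Print the equivalent shell command (shlex.join needs Python 3.8+).
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Comparing the decode-only rate for omxh264dec and nvv4l2decoder against the rate of the full script should help narrow down which stage limits playback.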

Also, I found this in stdout:

[gstreamer] gstEncoder -- new caps: video/x-raw, width=1920, height=1080, format=(string)I420, framerate=30/1
Framerate set to : 30 at NvxVideoEncoderSetParameter
NvMMLiteOpen : Block : BlockType = 4

This is odd, because the input video is definitely 25 fps. The output also somehow ends up as a 16 fps video.

Is there a way to force the input framerate? I found some options here:

They give some control over dimensions, bitrate, etc., but nothing for framerate?

Can I give this a small bump? I’m still looking for an answer or explanation of why the video plays (and records) stuttery, at a low framerate, while there is plenty of CPU and GPU power left and the network reports 100+ FPS. Is this a known issue?

I’m playing/streaming a 1920x1080, 25 fps H.264 video with a fairly low bitrate, so that can’t be the issue, I suppose? (I hope?)

(screenshot: network and video FPS shown in the status bar/window title)

Please try the package in the DeepStream SDK:


This is a solution based on the DeepStream SDK and should provide better performance. Please give it a try.

More information about DeepStream SDK:
Announcing DeepStream 5.0.1


Thank you! I’ll give it a try soon.

I’ve read most of the DeepStream documentation, and I have seen some of the example implementations. Isn’t it true that Dusty’s “jetson-inference” and “jetson-utils” libraries are based on the same tech?

It seems such a waste of time and energy to re-implement my entire video source/inferencing loop. I assume Dusty’s implementation should be more than adequate.

Isn’t there some easier way to influence construction of the gst pipeline? I think I may have found the culprit, but correct me if I’m wrong. I just create a videosource through:

videoInput = jetson.utils.videoSource(inputStream)

where inputStream is an H.264-encoded MP4 file. I suppose the library does the work of analyzing that file and constructing a suitable pipeline. It outputs the following:

[gstreamer] filesrc location=Samplevideos/DSCF1061.mp4 ! qtdemux ! queue ! h264parse ! omxh264dec ! video/x-raw ! appsink name=mysink

I read in the DeepStream documentation that the pipeline should be built with the “nvv4l2decoder” or “nvdec_h264” decoder to get hardware-accelerated decoding. So why is it using “omxh264dec”? Or is that totally fine?

Is there a (somewhat easy) way for me to influence the pipeline without implementing the entire thing myself (with the help of DeepStream/GStreamer)?
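The launch string from the log can be rebuilt with a different decoder, which is a cheap way to prototype a change before touching the C++ that generates it. A sketch under stated assumptions: from my understanding of the Jetson GStreamer plugins (untested here), nvv4l2decoder hands out NVMM buffers, so nvvidconv is inserted to copy frames into the system-memory video/x-raw caps the appsink expects. The function name and the optional videorate step (the standard GStreamer element that drops or duplicates frames to reach a target rate, relevant to the earlier framerate question) are my own additions, not anything jetson-utils exposes.

```python
def build_decoder_pipeline(path, decoder="nvv4l2decoder", framerate=None):
    """Return a launch string shaped like the one jetson-utils logged:

    filesrc ! qtdemux ! queue ! h264parse ! <decoder> ! video/x-raw ! appsink
    """
    elements = ["filesrc location=" + path, "qtdemux", "queue", "h264parse", decoder]
    if decoder == "nvv4l2decoder":
        # Assumption: nvv4l2decoder outputs NVMM buffers, so nvvidconv copies
        # them into regular system memory before the video/x-raw caps filter.
        elements.append("nvvidconv")
    caps = "video/x-raw"
    if framerate is not None:
        # videorate drops or duplicates frames to match the requested rate.
        elements.append("videorate")
        caps += ",framerate={}/1".format(framerate)
    elements += [caps, "appsink name=mysink"]
    return " ! ".join(elements)


if __name__ == "__main__":
    src = "Samplevideos/DSCF1061.mp4"
    print(build_decoder_pipeline(src, decoder="omxh264dec"))  # the logged pipeline
    print(build_decoder_pipeline(src))                        # nvv4l2decoder variant
    print(build_decoder_pipeline(src, framerate=25))          # pinned to 25 fps
```

To try a variant outside jetson-utils, the printed string can be passed to gst-launch-1.0 with appsink replaced by fakesink.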

Hi @willemvdkletersteeg, omxh264dec is also a HW-accelerated decoder plugin on Jetson; nvv4l2decoder is just newer. You can try changing the pipeline around in jetson-inference/utils/codec/gstDecoder.cpp:


If you change anything in there, re-run make and sudo make install. You can also scale down the input with the --input-width and --input-height command-line arguments. It all gets scaled down to 300x300 for the detection network anyway.
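The two options above can also be passed from a script rather than the shell, through the argv keyword of videoSource (detectnet.py forwards sys.argv that way, as far as I recall). A minimal sketch; the helper names are hypothetical, and the pixel arithmetic only shows how much less data each frame carries after downscaling compared with the 300x300 the network actually sees.

```python
import sys


def video_source_args(width, height):
    """argv-style list carrying jetson-utils' --input-width/--input-height options."""
    # Element 0 mimics sys.argv's program name so the options sit where a
    # command-line parser would normally find them.
    prog = sys.argv[0] if sys.argv else "python"
    return [prog, "--input-width={}".format(width), "--input-height={}".format(height)]


def pixel_ratio(src, dst):
    """How many times more pixels src (w, h) has than dst (w, h)."""
    return (src[0] * src[1]) / (dst[0] * dst[1])


if __name__ == "__main__":
    print(video_source_args(1280, 720)[1:])
    print("1920x1080 vs 1280x720: {:.2f}x".format(pixel_ratio((1920, 1080), (1280, 720))))
    print("1920x1080 vs 300x300: {:.2f}x".format(pixel_ratio((1920, 1080), (300, 300))))
    # On the Jetson (assumes videoSource's argv keyword, as used by detectnet.py):
    # import jetson.utils
    # camera = jetson.utils.videoSource("Samplevideos/DSCF1061.mp4",
    #                                   argv=video_source_args(1280, 720))
```

For a 1080p source, 1280x720 cuts the pixel count by 2.25x, while the 300x300 network input is about 23x smaller than the full frame.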

As DaneLLL pointed out, DeepStream is more optimized for higher-bandwidth multimedia applications. I’m not sure whether the way I’m capturing the data with appsink is the culprit or not.