Copying the data output from the omxh264dec will result in high CPU consumption

I am using the NVIDIA GPU for hardware decoding on a TK1. I found that copying the data output from omxh264dec results in high CPU consumption (about 55% overall), whereas displaying it directly keeps CPU consumption low (about 10% overall). So my question is: how can I copy the data output from omxh264dec without increasing CPU consumption?

The GStreamer pipelines are as below:

1. Copy the data output from omxh264dec:
gst-launch-1.0 -v rtspsrc user-id=admin user-pw=ms1234 location=rtsp://192.168.200.122/main ! rtph264depay ! h264parse ! omxh264dec ! nvvidconv ! 'video/x-raw, format=(string)I420' ! filesink location=./test.yuv

2. Display the data output from omxh264dec directly:
gst-launch-1.0 -v rtspsrc user-id=admin user-pw=ms1234 location=rtsp://192.168.200.122/main ! rtph264depay ! h264parse ! omxh264dec ! nvhdmioverlaysink -e

Thanks a lot!

Hi Kevin,
The output buffer type of omxh264dec is video/x-raw(memory:NVMM). Converting it to video/x-raw requires a memcpy().

When rendering, all buffers stay in video/x-raw(memory:NVMM) and no memcpy() is needed. This is where the difference in CPU consumption comes from.
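
To get a feel for why this copy is not free, here is a rough estimate of how many bytes per second have to be moved once frames leave NVMM memory as I420. The resolution and frame rate used (1920x1080 at 30 fps) are assumptions for illustration only; the thread does not state the actual stream format. The function names are hypothetical, and the calculation assumes tightly packed planes with no row padding.

```python
def i420_frame_bytes(width, height):
    """Bytes in one tightly packed I420 frame.

    I420 stores a full-resolution Y plane followed by U and V planes
    that are subsampled by 2 in both directions.
    """
    y_plane = width * height
    chroma_width = (width + 1) // 2    # round up for odd sizes
    chroma_height = (height + 1) // 2
    return y_plane + 2 * chroma_width * chroma_height


def i420_bytes_per_second(width, height, fps):
    """Raw data rate that must be copied for a given frame size and rate."""
    return i420_frame_bytes(width, height) * fps


if __name__ == "__main__":
    # Assumed stream parameters, not taken from the thread.
    width, height, fps = 1920, 1080, 30
    rate = i420_bytes_per_second(width, height, fps)
    print("{:.1f} MB/s of raw I420 data".format(rate / 1e6))
```

Under these assumptions the CPU has to move on the order of tens of megabytes per second, which is consistent with the copy path costing noticeably more CPU than the overlay path.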

Hi Dane,
Thanks for your reply.

Is it normal for converting video/x-raw(memory:NVMM) to video/x-raw to consume so much CPU (about 55% overall)? I am worried that I am doing it the wrong way and that there is a way to copy the data output from omxh264dec with low CPU consumption.

Thanks a lot!

Hi Kevin,
Here is the result profiled via /home/ubuntu/tegrastats on r21.5
Source: bourne_ultimatum_trailer.zip @ http://www.h264info.com/clips.html

Set CPU to performance mode

echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

gst-launch-1.0 filesrc location=Bourne_trailer.mp4 ! qtdemux ! h264parse ! omxh264dec ! nvvidconv ! 'video/x-raw, format=(string)I420' ! filesink location=test.yuv
RAM 513/1892MB (lfb 9x4MB) cpu [98%,off,off,off]@2065 EMC 10%@924 AVP 0%@204 VDE 120 GR3D 0%@72 EDP limit 0
RAM 514/1892MB (lfb 9x4MB) cpu [94%,off,off,off]@2065 EMC 12%@924 AVP 0%@204 VDE 120 GR3D 0%@252 EDP limit 0
RAM 513/1892MB (lfb 9x4MB) cpu [95%,off,off,off]@2065 EMC 12%@924 AVP 0%@204 VDE 120 GR3D 0%@72 EDP limit 0

gst-launch-1.0 filesrc location=Bourne_Trailer.mp4 ! qtdemux ! h264parse ! omxh264dec ! nvhdmioverlaysink
RAM 498/1892MB (lfb 128x4MB) cpu [4%,off,off,off]@2065 EMC 9%@924 AVP 0%@204 VDE 120 GR3D 0%@72 EDP limit 0
RAM 498/1892MB (lfb 128x4MB) cpu [3%,off,off,off]@2065 EMC 9%@924 AVP 0%@204 VDE 276 GR3D 0%@72 EDP limit 0
RAM 498/1892MB (lfb 128x4MB) cpu [4%,off,off,off]@2065 EMC 9%@924 AVP 0%@204 VDE 444 GR3D 0%@72 EDP limit 0

Dumping buffers out is mostly for debugging. If you have a real use case for dumping buffers to a file, the high CPU consumption is unavoidable.
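
For comparing runs like the two above, a small script can average the CPU load reported by tegrastats. This is a minimal sketch that only handles the line format shown in this thread (r21.5); other releases may print the CPU field differently. The function names are hypothetical, and the sample lines are copied from the output above.

```python
import re

# Matches the "cpu [98%,off,off,off]" field as printed in this thread.
CPU_FIELD = re.compile(r"cpu \[([^\]]*)\]")

FILESINK_RUN = [
    "RAM 513/1892MB (lfb 9x4MB) cpu [98%,off,off,off]@2065 EMC 10%@924 AVP 0%@204 VDE 120 GR3D 0%@72 EDP limit 0",
    "RAM 514/1892MB (lfb 9x4MB) cpu [94%,off,off,off]@2065 EMC 12%@924 AVP 0%@204 VDE 120 GR3D 0%@252 EDP limit 0",
    "RAM 513/1892MB (lfb 9x4MB) cpu [95%,off,off,off]@2065 EMC 12%@924 AVP 0%@204 VDE 120 GR3D 0%@72 EDP limit 0",
]

OVERLAY_RUN = [
    "RAM 498/1892MB (lfb 128x4MB) cpu [4%,off,off,off]@2065 EMC 9%@924 AVP 0%@204 VDE 120 GR3D 0%@72 EDP limit 0",
    "RAM 498/1892MB (lfb 128x4MB) cpu [3%,off,off,off]@2065 EMC 9%@924 AVP 0%@204 VDE 276 GR3D 0%@72 EDP limit 0",
    "RAM 498/1892MB (lfb 128x4MB) cpu [4%,off,off,off]@2065 EMC 9%@924 AVP 0%@204 VDE 444 GR3D 0%@72 EDP limit 0",
]


def cpu_loads(line):
    """Return the load percentages of the online cores in one tegrastats line."""
    match = CPU_FIELD.search(line)
    if match is None:
        return []
    loads = []
    for entry in match.group(1).split(","):
        entry = entry.strip()
        if entry.endswith("%"):        # cores reported as "off" are skipped
            loads.append(int(entry[:-1]))
    return loads


def mean_cpu_load(lines):
    """Average load over all online-core samples, or None if there are none."""
    samples = [load for line in lines for load in cpu_loads(line)]
    if not samples:
        return None
    return sum(samples) / float(len(samples))


if __name__ == "__main__":
    print("filesink: {:.1f}%".format(mean_cpu_load(FILESINK_RUN)))
    print("overlay:  {:.1f}%".format(mean_cpu_load(OVERLAY_RUN)))
```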

Hi Dane,
Thanks for your help.

Setting the CPU to performance mode is very helpful. After doing this, CPU consumption dropped from 55% to 20%, so I think this was the root cause of the high CPU consumption.

PS: In fact, my purpose is not to save the data to a file but to receive the hardware-decoded data in a callback.
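
One common GStreamer way to receive decoded frames in a callback instead of a file is the appsink element with its new-sample signal. The following is a minimal sketch, assuming the GStreamer 1.0 Python bindings (PyGObject) are installed on the board; it has not been verified on a TK1. The helper names (build_pipeline_description, handle_frame, main) are hypothetical. It reuses the elements from the first pipeline in this thread, so the frames are still converted from video/x-raw(memory:NVMM) to video/x-raw and the memcpy() cost described above still applies; this is not a zero-copy path.

```python
def build_pipeline_description(location, user_id, user_pw):
    """Same decode chain as the filesink pipeline, ending in appsink."""
    return (
        "rtspsrc user-id={} user-pw={} location={} ! "
        "rtph264depay ! h264parse ! omxh264dec ! nvvidconv ! "
        "video/x-raw,format=I420 ! "
        "appsink name=sink emit-signals=true max-buffers=2 drop=true"
    ).format(user_id, user_pw, location)


def handle_frame(data):
    """Hypothetical application hook; here it only reports the frame size."""
    return len(data)


def main(location, user_id, user_pw):
    # Imported here so the helpers above can be used without GStreamer.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import GLib, Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(
        build_pipeline_description(location, user_id, user_pw))
    sink = pipeline.get_by_name("sink")

    def on_new_sample(appsink):
        # Called on a streaming thread each time a decoded frame arrives.
        sample = appsink.emit("pull-sample")
        if sample is None:
            return Gst.FlowReturn.ERROR
        buf = sample.get_buffer()
        ok, info = buf.map(Gst.MapFlags.READ)
        if ok:
            try:
                handle_frame(info.data)
            finally:
                buf.unmap(info)
        return Gst.FlowReturn.OK

    sink.connect("new-sample", on_new_sample)
    pipeline.set_state(Gst.State.PLAYING)
    loop = GLib.MainLoop()
    try:
        loop.run()
    except KeyboardInterrupt:
        pass
    finally:
        pipeline.set_state(Gst.State.NULL)
```

On the device this would be started with something like main("rtsp://192.168.200.122/main", "admin", "ms1234"), using the URL and credentials from the first post. Setting max-buffers and drop keeps the queue short so a slow callback drops frames rather than accumulating memory.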

Hello DaneLLL! Sorry to bother you. I want to know: if I set the CPU to performance mode, will there be any problem when I run my Qt program on the TX1?