High CPU usage with omxh264enc on TK1

Hi guys,

I ran a test on the TK1 with exactly the same command you mentioned, and noticed that the CPU usage is about 70% instead of 5%. Is there anything I missed on the TK1 when using omxh264enc? This is a big issue for us: we are trying to use omxh264enc in our application (with appsrc and appsink), where the CPU usage reaches 150%, which is not acceptable!

Thanks for any suggestions!

For reference:
https://devtalk.nvidia.com/default/topic/937850/jetson-tk1/gpu-accelerated-gstreamer-filters-/?offset=5#5023167
“For example, this command does video decoding + encoding purely in hardware (~60 FPS for 1080p with just 5% CPU usage):”

gst-launch-1.0 -e filesrc location=in_1080p25.h264 ! h264parse ! omxh264dec ! queue ! nvvidconv ! 'video/x-raw(memory:NVMM),format=(string)I420' ! omxh264enc bitrate=45000000 insert-sps-pps=true ! 'video/x-h264, stream-format=(string)byte-stream, profile=high' ! h264parse ! filesink location=out.h264

Best Regards.
-zhi

Hi iamsyt,
Here is the test I ran:
Source: bourne_ultimatum_trailer.zip @ http://www.h264info.com/clips.html
Set the CPU to performance mode:

echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

Check with tegrastats:
sudo ~/tegrastats
Run:

ubuntu@tegra-ubuntu:~$ gst-launch-1.0 -e filesrc location=Bourne_Trailer.mp4 ! qtdemux ! h264parse ! omxh264dec ! nvvidconv ! 'video/x-raw(memory:NVMM),format=(string)I420' ! omxh264enc insert-sps-pps=true ! 'video/x-h264, stream-format=(string)byte-stream, profile=high' ! h264parse ! filesink location=out.h264
Setting pipeline to PAUSED ...
Inside NvxLiteH264DecoderLowLatencyInitNvxLiteH264DecoderLowLatencyInit set DPB and MjstreamingPipeline is PREROLLING ...
NvMMLiteOpen : Block : BlockType = 261
TVMR: NvMMLiteTVMRDecBlockOpen: 4937: NvMMLiteBlockOpen
NvMMLiteBlockCreate : Block : BlockType = 261
TVMR: cbBeginSequence: 571: BeginSequence  1920x816, bVPR = 0
TVMR: cbBeginSequence: 813: DecodeBuffers = 2
TVMR: cbBeginSequence: 833: Display Resolution : (1920x816)
TVMR: cbBeginSequence: 834: Display Aspect Ratio : (1920x816)
TVMR: cbBeginSequence: 998: SurfaceLayout = 3
TVMR: cbBeginSequence: 1028: NumOfSurfaces = 6, InteraceStream = 0, InterlaceEnabled = 0, bSecure = 0, MVC = 0 Semiplanar = 1, bReinit = 1
Allocating new output: 1920x816 (x 8), ThumbnailMode = 0
Framerate set to : 24 at NvxVideoEncoderSetParameterNvMMLiteOpen : Block : BlockType = 4
===== MSENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
===== MSENC blits (mode: 1) into tiled surfaces =====
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
TVMR: NvMMLiteTVMRDecDoWork: 4017: NVMMLITE_TVMR: EOS detected
TVMR: TVMRBufferProcessing: 3454: Processing of EOS Done
Got EOS from element "pipeline0".
Execution ended after 0:00:14.118075471
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
TVMR: TVMRFrameDelivery: 3675: Closing TVMR Frame Delivery Thread -------------
TVMR: NvMMDecTVMRDestroyParser: 4153: NvAvpClose
TVMR: NvMMLiteTVMRDecBlockClose: 5092: Done
Setting pipeline to NULL ...
Freeing pipeline ...

tegrastats shows

RAM 169/1892MB (lfb 231x4MB) cpu [24%,off,off,off]@2320 EMC 33%@924 AVP 3%@300 VDE 528 GR3D 0%@12 EDP limit 0
RAM 168/1892MB (lfb 231x4MB) cpu [25%,off,off,off]@2320 EMC 33%@924 AVP 3%@300 VDE 528 GR3D 0%@12 EDP limit 0
RAM 168/1892MB (lfb 230x4MB) cpu [22%,off,off,off]@2320 EMC 31%@924 AVP 3%@300 VDE 528 GR3D 0%@12 EDP limit 0
RAM 168/1892MB (lfb 229x4MB) cpu [22%,off,off,off]@2320 EMC 32%@924 AVP 3%@300 VDE 528 GR3D 0%@12 EDP limit 0

The command performs transcoding. For the 90-second, 24 fps input, it finishes in 14 seconds and uses one CPU core at about 22%.
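To put those numbers in perspective, here is a rough back-of-the-envelope calculation. It takes the approximate 90 s duration and 24 fps from the post and the 14.118 s wall-clock time from the "Execution ended after" line; the function name is only for illustration.

```python
def transcode_speed(duration_s, fps, wall_s):
    """Return (speed-up over real time, effective frames per second)."""
    frames = duration_s * fps           # total frames in the clip
    return duration_s / wall_s, frames / wall_s

# Figures from the run above: ~90 s clip, 24 fps, finished in 14.118 s.
speedup, eff_fps = transcode_speed(duration_s=90, fps=24, wall_s=14.118)
print(f"{speedup:.1f}x real time, about {eff_fps:.0f} frames/s")
# prints: 6.4x real time, about 153 frames/s
```

Since the clip duration is only approximate, treat the result as an order-of-magnitude check rather than a benchmark.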

Please profile your use case with tegrastats. You should see a clear difference in CPU usage between the SW and HW H264 encoders.
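When comparing tegrastats output between runs, it can help to pull out the per-core numbers and add them up, because the load may sit on one core or be spread over four. Below is a minimal sketch that assumes the `cpu [...]@freq` field looks exactly like the lines in this thread; the helper names are made up for illustration, and the value after `@` is assumed to be the clock in MHz.

```python
import re

# Matches e.g. "cpu [24%,off,off,off]@2320" as printed in this thread.
CPU_FIELD = re.compile(r"cpu \[([^\]]*)\]@(\d+)")

def parse_cpu(line):
    """Return (per-core loads in %, clock value) from one tegrastats line.

    Cores reported as "off" are returned as None.
    """
    match = CPU_FIELD.search(line)
    if match is None:
        raise ValueError("no cpu field in line: %r" % line)
    loads = [None if core == "off" else int(core.rstrip("%"))
             for core in match.group(1).split(",")]
    return loads, int(match.group(2))

def total_load(loads):
    """Sum the load of all online cores, in percent of one core."""
    return sum(load for load in loads if load is not None)

line = ("RAM 169/1892MB (lfb 231x4MB) cpu [24%,off,off,off]@2320 "
        "EMC 33%@924 AVP 3%@300 VDE 528 GR3D 0%@12 EDP limit 0")
loads, clock = parse_cpu(line)
print(loads, clock, total_load(loads))
# prints: [24, None, None, None] 2320 24
```

With all four cores online, a line such as `cpu [19%,7%,10%,8%]` adds up to 44% of one core, so the summed figure is the one to compare against a single-core reading.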

Hi DaneLLL,

Thank you for your reply. We ran a test on the TK1 with all CPU cores set to performance mode, using the same GStreamer command and the same video file, but the CPU consumption is higher than what you showed (23%). We are sure no other apps were running at the same time.

sudo gst-launch-1.0 -e filesrc location=Bourne_Trailer.mp4 ! qtdemux ! h264parse ! omxh264dec ! nvvidconv ! 'video/x-raw(memory:NVMM),format=(string)I420' ! omxh264enc insert-sps-pps=true ! 'video/x-h264, stream-format=(string)byte-stream, profile=high' ! h264parse ! filesink location=out.h264
Setting pipeline to PAUSED ...
Inside NvxLiteH264DecoderLowLatencyInitNvxLiteH264DecoderLowLatencyInit set DPB and MjstreamingPipeline is PREROLLING ...
NvMMLiteOpen : Block : BlockType = 261 
TVMR: NvMMLiteTVMRDecBlockOpen: 4937: NvMMLiteBlockOpen 
NvMMLiteBlockCreate : Block : BlockType = 261 
TVMR: cbBeginSequence: 571: BeginSequence  1920x816, bVPR = 0
TVMR: cbBeginSequence: 813: DecodeBuffers = 2 
TVMR: cbBeginSequence: 833: Display Resolution : (1920x816) 
TVMR: cbBeginSequence: 834: Display Aspect Ratio : (1920x816) 
TVMR: cbBeginSequence: 998: SurfaceLayout = 3
TVMR: cbBeginSequence: 1028: NumOfSurfaces = 6, InteraceStream = 0, InterlaceEnabled = 0, bSecure = 0, MVC = 0 Semiplanar = 1, bReinit = 1 
Allocating new output: 1920x816 (x 8), ThumbnailMode = 0
Framerate set to : 24 at NvxVideoEncoderSetParameterNvMMLiteOpen : Block : BlockType = 4 
===== MSENC =====
NvMMLiteBlockCreate : Block : BlockType = 4 
===== MSENC blits (mode: 1) into tiled surfaces =====
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
TVMR: NvMMLiteTVMRDecDoWork: 4017: NVMMLITE_TVMR: EOS detected
TVMR: TVMRBufferProcessing: 3454: Processing of EOS Done
Got EOS from element "pipeline0".
Execution ended after 0:00:17.145813362
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
TVMR: TVMRFrameDelivery: 3675: Closing TVMR Frame Delivery Thread -------------
TVMR: NvMMDecTVMRDestroyParser: 4153: NvAvpClose
TVMR: NvMMLiteTVMRDecBlockClose: 5092: Done 
Setting pipeline to NULL ...
Freeing pipeline ...

tegrastats shows

RAM 486/1892MB (lfb 2x1MB) cpu [19%,7%,10%,8%]@2320 EMC 22%@924 AVP 2%@300 VDE 528 GR3D 0%@852 EDP limit 0
RAM 488/1892MB (lfb 2x1MB) cpu [13%,9%,13%,9%]@2320 EMC 30%@924 AVP 3%@300 VDE 528 GR3D 0%@852 EDP limit 0
RAM 489/1892MB (lfb 2x1MB) cpu [17%,11%,12%,9%]@2320 EMC 29%@924 AVP 3%@300 VDE 528 GR3D 0%@852 EDP limit 0
RAM 490/1892MB (lfb 2x1MB) cpu [16%,9%,8%,9%]@2320 EMC 29%@924 AVP 3%@204 VDE 528 GR3D 0%@852 EDP limit 0
RAM 489/1892MB (lfb 2x1MB) cpu [17%,11%,10%,10%]@2320 EMC 31%@924 AVP 3%@204 VDE 528 GR3D 0%@852 EDP limit 0
RAM 489/1892MB (lfb 2x1MB) cpu [21%,13%,14%,14%]@2320 EMC 29%@924 AVP 3%@300 VDE 528 GR3D 0%@852 EDP limit 0
RAM 490/1892MB (lfb 2x1MB) cpu [15%,11%,9%,11%]@2320 EMC 28%@924 AVP 3%@300 VDE 528 GR3D 0%@852 EDP limit 0
RAM 490/1892MB (lfb 2x1MB) cpu [11%,10%,5%,6%]@2320 EMC 25%@924 AVP 3%@300 VDE 528 GR3D 0%@852 EDP limit 0
RAM 489/1892MB (lfb 2x1MB) cpu [12%,11%,9%,11%]@2320 EMC 26%@924 AVP 3%@300 VDE 528 GR3D 0%@852 EDP limit 0
RAM 490/1892MB (lfb 2x1MB) cpu [6%,9%,8%,11%]@2320 EMC 26%@924 AVP 3%@300 VDE 528 GR3D 0%@852 EDP limit 0
RAM 490/1892MB (lfb 2x1MB) cpu [14%,11%,6%,7%]@2320 EMC 27%@924 AVP 3%@204 VDE 528 GR3D 0%@852 EDP limit 0
RAM 490/1892MB (lfb 2x1MB) cpu [16%,11%,16%,13%]@2320 EMC 28%@924 AVP 3%@300 VDE 528 GR3D 0%@852 EDP limit 0
RAM 491/1892MB (lfb 2x1MB) cpu [15%,12%,6%,9%]@2320 EMC 29%@924 AVP 3%@300 VDE 528 GR3D 0%@852 EDP limit 0
RAM 490/1892MB (lfb 2x1MB) cpu [17%,16%,6%,9%]@2320 EMC 28%@924 AVP 3%@300 VDE 528 GR3D 0%@852 EDP limit 0
RAM 491/1892MB (lfb 2x1MB) cpu [17%,14%,15%,12%]@2320 EMC 27%@924 AVP 3%@300 VDE 528 GR3D 0%@852 EDP limit 0
RAM 492/1892MB (lfb 2x1MB) cpu [20%,10%,15%,10%]@2320 EMC 28%@924 AVP 3%@300 VDE 528 GR3D 0%@852 EDP limit 0
RAM 492/1892MB (lfb 2x1MB) cpu [18%,12%,11%,38%]@2320 EMC 30%@924 AVP 3%@300 VDE 528 GR3D 0%@852 EDP limit 0

We have already enabled all four CPU cores in performance mode, as shown below:

sunz@tegra-ubuntu:/media/hdisk1/sunz/tmp$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor               
performance
sunz@tegra-ubuntu:/media/hdisk1/sunz/tmp$ cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
performance
sunz@tegra-ubuntu:/media/hdisk1/sunz/tmp$ cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
performance
sunz@tegra-ubuntu:/media/hdisk1/sunz/tmp$ cat /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
performance
sunz@tegra-ubuntu:/media/hdisk1/sunz/tmp$
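The same check can be done for every core in one go. This is only a sketch: it assumes the standard Linux cpufreq sysfs layout used in the commands above, and `read_governors` is an illustrative name. Cores that are offline may not expose a `cpufreq` directory, so they are skipped.

```python
import glob
import os

def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each core name (e.g. "cpu0") to its cpufreq governor.

    Cores without a readable scaling_governor file (for example cores
    that are currently offline) are skipped.
    """
    governors = {}
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        core = path.split(os.sep)[-3]   # .../cpuN/cpufreq/scaling_governor
        try:
            with open(path) as f:
                governors[core] = f.read().strip()
        except OSError:
            continue
    return governors

# Print one "cpuN governor" line per core that exposes cpufreq.
for core, governor in read_governors().items():
    print(core, governor)
```

The `cpu_root` parameter exists only so the function can be pointed at a different directory for testing.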

Are there any additional suggestions?

Thanks
-zhi

Hi zhi,
Are you on r21.5? Which processes show high CPU % in ‘top’?

Hi DaneLLL,

The process with the highest CPU usage in top is just gst-launch-1.0, at about 35% at that time. We are also sure that no other app was consuming CPU during the test.

The Linux for Tegra version is R21, as shown below. Is that a problem?

ubuntu@tegra-ubuntu:~$ cat /etc/nv_tegra_release
# R21 (release), REVISION: 4.0, GCID: 5650832, BOARD: ardbeg, EABI: hard, DATE: Thu Jun 25 22:38:59 UTC 2015
7c0e59658a9f05c0f81a92c6c79d35ed0d144c43 */usr/lib/xorg/modules/drivers/nvidia_drv.so
a78016854fa1475c1f5271a43b928492ec5f66d3 */usr/lib/xorg/modules/extensions/libglx.so
3a6ad4641f163aa65b7f453637dfaff511e554a5 */usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite_video.so
28e64e7bc910f313aa6b27aebf30e93b3e37d247 */usr/lib/arm-linux-gnueabihf/tegra/libnvwinsys.so
3afd51954af5246ad7139897f5f856b9c9748d99 */usr/lib/arm-linux-gnueabihf/tegra/libnvomx.so
ae9f05fcbeed357fa8b54c757e3c4babcd83340f */usr/lib/arm-linux-gnueabihf/tegra/libnvtestio.so
88cc04ed62acc205b2b0d415dddaeba5feac8262 */usr/lib/arm-linux-gnueabihf/tegra/libnvmm.so
23e9c8c849d325d906f4bbfe8188998e8f138af4 */usr/lib/arm-linux-gnueabihf/tegra/libnvodm_imager.so
3cfce2f6a06e0e4575fdd617d7509951c8a3f12b */usr/lib/arm-linux-gnueabihf/tegra/libnvtestresults.so
f631876f8ea795e27e3004383ae28cec10bf9f97 */usr/lib/arm-linux-gnueabihf/tegra/libnvapputil.so
b89b4582a1eedb9c75de54f611128c64f8ccf36c */usr/lib/arm-linux-gnueabihf/tegra/libnvmm_utils.so
891d3a2aaf5756e4f8ac8acfa9482eea5fc6e079 */usr/lib/arm-linux-gnueabihf/tegra/libnvrm_graphics.so
358b49b0dfd9f1b90451b0e2b79bbb2dee3f57cb */usr/lib/arm-linux-gnueabihf/tegra/libnvos.so
7c0f4f235082c19f1b256e44c40b59b710f5ce27 */usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite_image.so
3ff5994ae5de33a4387e4fca1a3c416322320f10 */usr/lib/arm-linux-gnueabihf/tegra/libnvddk_vic.so
e96a78d9c3947a980f065d96557f98c70500db17 */usr/lib/arm-linux-gnueabihf/tegra/libnvomxilclient.so
a78016854fa1475c1f5271a43b928492ec5f66d3 */usr/lib/arm-linux-gnueabihf/tegra/libglx.so
89311fb48763f411f1fc65c1169ca15de8f80dd0 */usr/lib/arm-linux-gnueabihf/tegra/libnvmm_camera_v3.so
40b871fb69fcfc0ee63cbc0b91cf5890a942a070 */usr/lib/arm-linux-gnueabihf/tegra/libnvmm_parser.so
46e9b8f63a1cabc9b96062fe22b151ed384a4cd5 */usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite_audio.so
2d3ef61c66c76e11c0d8af5772de846cee24c0b6 */usr/lib/arm-linux-gnueabihf/tegra/libnvodm_query.so
1df7332065fa5265a39c1b13e513760cdccffa7a */usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite_utils.so
42c2a7f14dca326007457084960878dacc93b9e3 */usr/lib/arm-linux-gnueabihf/tegra/libnvtnr.so
f683204db73ca835e0699fe42fcbab610253f9a0 */usr/lib/arm-linux-gnueabihf/tegra/libnvmm_contentpipe.so
39f41d0d08b7bb504d5ecff47ccb2aeffd0d164e */usr/lib/arm-linux-gnueabihf/tegra/libjpeg.so
29d7fa4a53a09838a8823308ecec4c03d6745971 */usr/lib/arm-linux-gnueabihf/tegra/libnvddk_2d_v2.so
055df2fa54b1fc83b57d43182732f65d2b716f2c */usr/lib/arm-linux-gnueabihf/tegra/libnvdc.so
8421f1f137f900793c83cba990eed6db2a0d9395 */usr/lib/arm-linux-gnueabihf/tegra/libtegrav4l2.so
aa982a8b304d8257da53ef32057885d2bbb120bf */usr/lib/arm-linux-gnueabihf/tegra/libnvmmlite.so
1f0d0d76834e8a4f4a6dd5e074367913d0a8c077 */usr/lib/arm-linux-gnueabihf/tegra/libnvparser.so
76037d3a508602b3ba83cd1f3a81c83ea19e5900 */usr/lib/arm-linux-gnueabihf/tegra/libnvsm.so
dd269a0f63bbb906b99090d38bc0bbf6945d5e14 */usr/lib/arm-linux-gnueabihf/tegra/libnvavp.so
227338492f9660d73831312a4645f37300f00f27 */usr/lib/arm-linux-gnueabihf/tegra/libnvtvmr.so
16acd9f15d887e0f7ebc7bc269687ae6f7f1a3e8 */usr/lib/arm-linux-gnueabihf/tegra/libnvrm.so

thanks
-zhi

Hi zhi,
You are on r21.4. Please try r21.5.
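For anyone wondering where "r21.4" comes from: it can be read off the header line of /etc/nv_tegra_release quoted earlier in the thread (`# R21 (release), REVISION: 4.0, ...`). A small sketch follows; it assumes the header keeps that format and that the short form combines the release number with the first digit group of the revision. The function name is illustrative only.

```python
import re

# Header format as quoted in this thread:
# "# R21 (release), REVISION: 4.0, GCID: ..., BOARD: ardbeg, ..."
RELEASE_LINE = re.compile(
    r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*(\d+)\.(\d+)")

def l4t_version(first_line):
    """Turn the header line of /etc/nv_tegra_release into e.g. 'r21.4'."""
    match = RELEASE_LINE.search(first_line)
    if match is None:
        raise ValueError("unrecognised release line: %r" % first_line)
    release, revision = match.group(1), match.group(2)
    return "r%s.%s" % (release, revision)

header = ("# R21 (release), REVISION: 4.0, GCID: 5650832, BOARD: ardbeg, "
          "EABI: hard, DATE: Thu Jun 25 22:38:59 UTC 2015")
print(l4t_version(header))
# prints: r21.4
```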