CPU utilization on transcode

I am testing the scalability of the TX2. I run one transcoding session using the following command:
nvidia@tegra-ubuntu:~/scripts/works/trans$ sh -x ./h264.sh

+ CLIENT_IP=239.255.0.1
+ gst-launch-1.0 filesrc location=/home/nvidia/HD_MP2_06011_TS_ASYN_V1_001.mpeg ! tsdemux name=demux demux. ! mpegvideoparse ! omxmpeg2videodec ! nvvidconv ! video/x-raw(memory:NVMM), format=(string)I420 ! omxh264enc control-rate=2 bitrate=5000000 ! video/x-h264, stream-format=(string)byte-stream ! h264parse ! rtph264pay mtu=1400 ! udpsink host=239.255.0.1 port=5000 sync=false async=false
    Setting pipeline to PAUSED ...
    Inside NvxLiteH264DecoderLowLatencyInitNvxLiteH264DecoderLowLatencyInit set DPB and MjstreamingInside NvxLiteH265DecoderLowLatencyInitNvxLiteH265DecoderLowLatencyInit set DPB and MjstreamingPipeline is PREROLLED ...
    Setting pipeline to PLAYING ...
    New clock: GstSystemClock
    NvMMLiteOpen : Block : BlockType = 267
    TVMR: NvMMLiteTVMRDecBlockOpen: 7818: NvMMLiteBlockOpen
    NvMMLiteBlockCreate : Block : BlockType = 267
    TVMR: cbBeginSequence: 1190: BeginSequence 1280x720, bVPR = 0
    TVMR: LowCorner Frequency = 100000
    TVMR: cbBeginSequence: 1583: DecodeBuffers = 3, pnvsi->eCodec = 1, codec = 5
    TVMR: cbBeginSequence: 1654: Display Resolution : (1280x720)
    TVMR: cbBeginSequence: 1655: Display Aspect Ratio : (1280x720)
    TVMR: cbBeginSequence: 1697: ColorFormat : 5
    TVMR: cbBeginSequence:1711 ColorSpace = NvColorSpace_YCbCr601
    TVMR: cbBeginSequence: 1839: SurfaceLayout = 3
    TVMR: cbBeginSequence: 1936: NumOfSurfaces = 7, InteraceStream = 0, InterlaceEnabled = 0, bSecure = 0, MVC = 0 Semiplanar = 1, bReinit = 1, BitDepthForSurface = 8 LumaBitDepth = 8, ChromaBitDepth = 8, ChromaFormat = 5
    TVMR: cbBeginSequence: 1938: BeginSequence ColorPrimaries = 2, TransferCharacteristics = 2, MatrixCoefficients = 2
    Allocating new output: 1280x720 (x 7), ThumbnailMode = 0
    Framerate set to : 60 at NvxVideoEncoderSetParameterNvMMLiteOpen : Block : BlockType = 4
    ===== MSENC =====
    NvMMLiteBlockCreate : Block : BlockType = 4
    ===== MSENC blits (mode: 1) into tiled surfaces =====
    TVMR: FrameRate = 60
    TVMR: NVDEC LowCorner Freq = (200000 * 1024)
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: FrameRate = 60.000240
    TVMR: NvMMLiteTVMRDecDoWork: 6665: NVMMLITE_TVMR: EOS detected
    TVMR: TVMRBufferProcessing: 5641: Processing of EOS
    TVMR: TVMRBufferProcessing: 5716: Processing of EOS Done
    Got EOS from element "pipeline0".
    Execution ended after 0:00:19.526421381
    Setting pipeline to PAUSED ...
    Setting pipeline to READY ...
    TVMR: TVMRFrameStatusReporting: 6266: Closing TVMR Frame Status Thread -------------
    TVMR: TVMRVPRFloorSizeSettingThread: 6084: Closing TVMRVPRFloorSizeSettingThread -------------
    TVMR: TVMRFrameDelivery: 6116: Closing TVMR Frame Delivery Thread -------------
    TVMR: NvMMLiteTVMRDecBlockClose: 8018: Done
    Setting pipeline to NULL ...
    Freeing pipeline ...

The top command returns:

top - 19:08:30 up 14 min, 4 users, load average: 0.51, 0.25, 0.15
Tasks: 273 total, 1 running, 272 sleeping, 0 stopped, 0 zombie
%Cpu(s): 14.0 us, 2.4 sy, 0.0 ni, 83.3 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
KiB Mem : 8042556 total, 5475960 free, 1736572 used, 830024 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 6152468 avail Mem
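To turn the top summary into a single busy figure, take 100 minus the idle (`id`) value. A minimal POSIX sh/awk sketch follows; the function name `cpu_busy` is hypothetical, and it assumes the procps-ng summary layout shown above (`<value> <label>,`). By default top's `%Cpu(s)` line is an average over all cores, so the result is a share of the whole module, not of a single core.

```shell
#!/bin/sh
# Hypothetical helper: overall CPU busy % = 100 - idle, read from a top
# "%Cpu(s):" summary line. Assumes the procps-ng "<value> <label>," layout.

cpu_busy() {
    awk 'index($0, "%Cpu(s):") == 1 {
        for (i = 2; i <= NF; i++) {
            # the idle value is the field just before the "id," label
            if ($i ~ /^id,?$/) {
                printf "%.1f\n", 100 - $(i - 1)
                exit
            }
        }
    }'
}
```

Fed the `%Cpu(s)` line above, it prints 16.7 (14.0 us + 2.4 sy + 0.3 si). With six cores on the TX2 (six entries in the tegrastats `cpu [...]` list), 16.7% of the total is roughly one core's worth of CPU time.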

So one 1280x720 60 fps transcode from MPEG-2 to H.264 (MPEG-2 decoded in hardware, H.264 encoded in hardware) takes about 16% of the CPU (14.0% us + 2.4% sy in top). Does this look correct? The TX2 datasheet claims 14x 1080p30. I was running at 60 FPS, but I would still expect something like 5 to 6% CPU utilization per stream if 14x 1080p30 is achievable.

Thx,
K-
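The comparison between the 720p60 test stream and the 14x 1080p30 datasheet figure can be made concrete with pixel and frame rates. This is a small sketch in POSIX shell arithmetic (integer only, so the pixel ratio is scaled by 100); the variable names are illustrative, and it assumes "14x 1080p30" means 14 concurrent 1920x1080 streams at 30 fps.

```shell
#!/bin/sh
# Back-of-envelope workload comparison (illustrative only).
# Assumption: "14x 1080p30" = 14 concurrent 1920x1080 streams at 30 fps.

# pixels per second for a WIDTH x HEIGHT stream at FPS frames/s
pixel_rate() {
    echo $(( $1 * $2 * $3 ))
}

test_stream=$(pixel_rate 1280 720 60)              # the 720p60 stream in this test
datasheet=$(( 14 * $(pixel_rate 1920 1080 30) ))

# POSIX sh has no floating point, so keep the ratio scaled by 100
pixel_ratio_x100=$(( $datasheet * 100 / $test_stream ))
frame_ratio=$(( (14 * 30) / 60 ))                  # 420 frames/s vs 60 frames/s

echo "test stream:      $test_stream pixels/s"
echo "14x 1080p30:      $datasheet pixels/s"
echo "pixel ratio x100: $pixel_ratio_x100"
echo "frame ratio:      $frame_ratio"
```

It reports a pixel ratio of 1575 (about 15.75x) and a frame ratio of 7. Which ratio matters depends on where the CPU time goes; with decode and encode offloaded to hardware, per-frame buffer handling is a plausible driver of CPU load, but that is an assumption, not a measurement.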

Please also try to profile with sudo ./tegrastats

We have verified 6x 1080p60 H.265 encoding:
https://devtalk.nvidia.com/default/topic/1009082/jetson-tx1/multiple-h-265-video-encoding-/post/5150527/#5150527

14x 1080p30 should be just within the hardware limit.
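To check that limit on a given board, one approach is to start several copies of the transcode pipeline at once and watch tegrastats while they run. The script below is a hypothetical sketch, not an NVIDIA tool: it reuses the pipeline from h264.sh, gives copy i its own UDP port 5000+i (an assumption, so the streams do not collide), and only prints the commands when DRY_RUN=1.

```shell
#!/bin/sh
# Hypothetical scalability helper: start N copies of the h264.sh transcode
# pipeline in parallel. Assumptions (not from the thread): copy i streams to
# UDP port 5000+i, and DRY_RUN=1 only prints each command.

CLIENT_IP=${CLIENT_IP:-239.255.0.1}
SRC=${SRC:-/home/nvidia/HD_MP2_06011_TS_ASYN_V1_001.mpeg}
N=${1:-2}    # number of concurrent transcodes, default 2

run_one() {
    port=$1
    # build the pipeline as positional parameters so the quoted caps survive
    set -- gst-launch-1.0 filesrc location="$SRC" ! tsdemux name=demux demux. \
        ! mpegvideoparse ! omxmpeg2videodec ! nvvidconv \
        ! 'video/x-raw(memory:NVMM), format=(string)I420' \
        ! omxh264enc control-rate=2 bitrate=5000000 \
        ! 'video/x-h264, stream-format=(string)byte-stream' \
        ! h264parse ! rtph264pay mtu=1400 \
        ! udpsink host="$CLIENT_IP" port="$port" sync=false async=false
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        # run in the background; log output is discarded
        "$@" > /dev/null 2>&1 &
    fi
}

i=0
while [ "$i" -lt "$N" ]; do
    run_one $((5000 + i))
    i=$((i + 1))
done
wait    # return once every copy has exited (EOS or error)
```

Raising N step by step while tegrastats runs shows how CPU, EMC, and VDE load grow with the number of streams.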

decode.sh:
CLIENT_IP=239.255.0.1
gst-launch-1.0 udpsrc multicast_group=$CLIENT_IP port=5000 ! application/x-rtp,encoding-name=H264,payload=96 ! rtph264depay ! h264parse ! queue ! omxh264dec ! nvoverlaysink sync=false async=false -e

encode.sh:
CLIENT_IP=239.255.0.1
gst-launch-1.0 filesrc location=/home/nvidia/HD_MP2_06011_TS_ASYN_V1_001.mpeg ! tsdemux name=demux demux. ! mpegvideoparse ! omxmpeg2videodec ! nvvidconv ! 'video/x-raw(memory:NVMM), format=(string)I420' ! omxh264enc control-rate=2 bitrate=5000000 ! 'video/x-h264, stream-format=(string)byte-stream' ! h264parse ! rtph264pay mtu=1400 ! udpsink host=$CLIENT_IP port=5000 sync=false async=false

I am decoding an MPEG file, re-encoding the decompressed stream, sending it over the network, and then decoding it again (all on the same TX2).

Here is the utilization reported by tegrastats:

RAM 2480/7854MB (lfb 959x4MB) cpu [0%@2025,0%@2048,0%@2048,0%@2030,0%@2033,0%@2034] EMC 0%@1866 APE 150 VDE 576 GR3D 0%@1300
RAM 2480/7854MB (lfb 959x4MB) cpu [0%@2028,0%@2051,0%@2048,0%@2032,0%@2033,0%@2035] EMC 0%@1866 APE 150 VDE 576 GR3D 0%@1300
RAM 2480/7854MB (lfb 959x4MB) cpu [0%@2024,0%@2049,0%@2048,0%@2033,0%@2034,0%@2034] EMC 0%@1866 APE 150 VDE 576 GR3D 0%@1300
RAM 2480/7854MB (lfb 959x4MB) cpu [0%@2028,0%@2049,0%@2047,0%@2034,0%@2033,0%@2034] EMC 0%@1866 APE 150 VDE 576 GR3D 0%@1300
RAM 2485/7854MB (lfb 959x4MB) cpu [16%@2034,6%@2050,7%@2047,11%@2036,10%@2035,12%@2033] EMC 13%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2485/7854MB (lfb 959x4MB) cpu [23%@2033,7%@2048,0%@2046,20%@2036,18%@2034,16%@2033] EMC 21%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2485/7854MB (lfb 959x4MB) cpu [14%@2034,5%@2049,0%@2046,9%@2035,9%@2035,4%@2033] EMC 18%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [9%@2034,3%@2047,0%@2046,9%@2036,8%@2035,5%@2035] EMC 14%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [4%@2032,3%@2050,0%@2048,7%@2036,5%@2039,5%@2033] EMC 12%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [6%@2027,2%@2052,0%@2046,4%@2031,4%@2032,5%@2033] EMC 10%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [4%@2027,2%@2051,0%@2047,7%@2033,6%@2032,3%@2034] EMC 10%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [4%@2027,2%@2046,0%@2046,5%@2032,3%@2033,5%@2034] EMC 9%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [7%@2028,4%@2047,0%@2047,5%@2030,5%@2033,7%@2033] EMC 9%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [5%@2028,2%@2047,0%@2046,5%@2035,5%@2036,6%@2034] EMC 9%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [5%@2028,3%@2047,0%@2046,4%@2029,5%@2034,7%@2034] EMC 9%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [6%@2026,3%@2049,0%@2046,5%@2032,3%@2021,7%@2034] EMC 9%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [5%@2030,3%@2051,0%@2048,3%@2033,5%@2035,6%@2034] EMC 9%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [4%@2035,2%@2046,0%@2048,6%@2036,7%@2037,6%@2033] EMC 9%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [7%@2027,3%@2051,0%@2046,6%@2037,3%@2032,8%@2033] EMC 9%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [5%@2024,2%@2051,0%@2046,3%@2031,6%@2034,7%@2035] EMC 9%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [6%@2023,3%@2047,0%@2045,4%@2033,6%@2033,5%@2033] EMC 9%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [7%@2025,3%@2050,0%@2046,4%@2034,5%@2033,8%@2035] EMC 9%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [5%@2035,3%@2049,0%@2047,5%@2037,4%@2036,6%@2037] EMC 9%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2486/7854MB (lfb 959x4MB) cpu [7%@2023,3%@2047,0%@2045,7%@2031,5%@2031,6%@2031] EMC 9%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2479/7854MB (lfb 959x4MB) cpu [1%@2035,0%@2049,0%@2046,0%@2038,1%@2037,0%@2038] EMC 4%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2479/7854MB (lfb 959x4MB) cpu [0%@2027,0%@2048,0%@2048,0%@2030,0%@2032,0%@2031] EMC 2%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2479/7854MB (lfb 959x4MB) cpu [0%@2034,0%@2048,0%@2047,0%@2037,0%@2037,0%@2037] EMC 0%@1866 APE 150 VDE 1203 GR3D 0%@1300
RAM 2479/7854MB (lfb 959x4MB) cpu [0%@2036,0%@2047,0%@2048,0%@2036,0%@2038,0%@2038] EMC 0%@1866 APE 150 VDE 1203 GR3D 0%@1300
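tegrastats reports each core separately as `load%@MHz`, so a per-line average is handy for comparing with the single top figure. This is a minimal sketch using POSIX sh and awk; `avg_cpu` is a hypothetical name, and it assumes the `cpu [..., ...]` layout printed above. Entries without `%@` (for example a core shown as `off`, if a board disables one) are skipped.

```shell
#!/bin/sh
# Hypothetical helper: average the per-core "load%@MHz" entries on each
# tegrastats line read from stdin. Assumes the "cpu [a%@f,b%@f,...]" layout.

avg_cpu() {
    awk '{
        s = index($0, "cpu [")
        if (s == 0) next
        rest = substr($0, s + 5)          # text after "cpu ["
        e = index(rest, "]")
        if (e == 0) next
        n = split(substr(rest, 1, e - 1), cores, ",")
        sum = 0; cnt = 0
        for (i = 1; i <= n; i++) {
            if (cores[i] ~ /%@/) {
                sum += cores[i] + 0       # "16%@2034" + 0 -> 16
                cnt++
            }
        }
        if (cnt > 0) printf "%.1f\n", sum / cnt
    }'
}
```

Run on the samples above, the busiest line averages 14.0% per core, and once the pipelines settle the average sits at roughly 3 to 5% per core for the whole encode, stream, decode, and display loop.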

Hi kunice, the result looks fine.