WebRTC with Hardware Accelerated Video Encoding

Hi Guys,

I’m a TX1 newbie trying to learn how, when, and with what tools hardware-accelerated video encoding can be achieved. I’m running an application that uses WebRTC via Chromium. I have Chromium configured with “Use Hardware Acceleration When Enabled” set to TRUE. “chrome://gpu” shows that “Video Encode” is hardware accelerated, as are several additional rendering functions. The WebRTC demos (from https://webrtc.github.io/samples/) run fine.

Here’s my issue: I run “top” to watch the system load for chromium tasks. When rendering is out of the equation (i.e. I tab out of the rendering screen so visuals are hidden), the task CPU load doesn’t change with hardware acceleration enabled or disabled. The camera is still capturing, which is visible via the chrome://webrtc-internals graphs. So, a question is, how is hardware accelerated video encoding being accomplished? Because the system load doesn’t seem to change with/without hardware acceleration, I’m thinking the video encode isn’t being hardware accelerated using the video encoder feature of the TX1.

I’ve been reading about GStreamer and omx support. I can see distinct load differences using GStreamer and a camera at the command line with software versus hardware accelerated encoding, so the hardware seems to be working.

Can anyone point me in the right direction to try and get hardware accelerated video encoding under WebRTC working as efficiently as possible, and in a way I can prove to my dev team? Again, I’m new to this platform and thank you in advance for your patience with helping me get up to speed. I’m trying to learn all the best ways to exploit hardware acceleration features. (BTW, our CUDA stuff is working great!) WebRTC is our challenge at the moment.

Thanks a lot,

Ty

Hi Ty,
Please check the MSENC frequency via tegrastats:
sudo ~/tegrastats

The MSENC frequency should vary while HW encoding is in use.

Hi DaneLLL,

That’s a handy utility, thanks. I confirmed that the MSENC frequency does not change (it stays at 192) using WebRTC on Chromium. BTW, I’ve only been able to test it with USB cameras. I am unable to select the onboard CSI camera as a video source within Chromium. I should be able to select it because the CSI camera seems to have V4L2 support, including a /dev/video0 device. Not sure what’s up here.

Question: Does the Chromium source available under https://developer.nvidia.com/embedded/downloads have any special code or plugins specifically for NVidia that aren’t present in the standard Chromium source? Or, is it there for convenience only?

I am currently trying to cross-compile OpenWebRTC for the TX1. If this is successful, I should be able to use hardware accelerated video encode under Chromium since OpenWebRTC is built on top of GStreamer.

Thanks,

Ty

It is the standard source, with the proper LICENSE file.

By including omxh264enc or omxh265enc in the GStreamer pipeline, you can use hardware-accelerated video encoding.
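To compare the two encoders side by side, something like the following Python sketch may help. It only builds the argument list for the videotestsrc pipeline from NVIDIA's GStreamer user guide that is quoted in this thread, with the encoder element swapped in. The function name `build_encode_pipeline` is made up for illustration, and `x264enc` is assumed to be installed as the software H.264 encoder; nothing here is an NVIDIA-provided API.

```python
import shlex
from typing import List

def build_encode_pipeline(encoder: str = "omxh264enc",
                          output: str = "test.mp4") -> List[str]:
    """Build a gst-launch-1.0 argument list for an H.264 test encode.

    `encoder` must be an H.264 encoder element, because the caps filter
    and h264parse below are H.264-specific. "omxh264enc" is the hardware
    element named in this thread; "x264enc" is assumed here as the
    software alternative.
    """
    return [
        "gst-launch-1.0", "-e",
        "videotestsrc", "!",
        "video/x-raw, format=(string)I420, width=(int)640, height=(int)480", "!",
        encoder, "!",
        "video/x-h264, stream-format=(string)byte-stream", "!",
        "h264parse", "!", "qtmux", "!",
        "filesink", f"location={output}",
    ]

# Print shell-quoted commands that can be pasted into a terminal.
for enc in ("omxh264enc", "x264enc"):
    print(shlex.join(build_encode_pipeline(enc, f"test_{enc}.mp4")))
```

Running each printed command while watching tegrastats or top in another terminal should make the difference between the hardware and software paths visible.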

I just set up a new Jetson TX2 flashed with release 27.1. The tegrastats command no longer shows the MSENC frequency described above. Can someone tell me what utility I can use that will show when the hardware video encoder is working?

Thanks,

Ty

Hi Ty,
We can reproduce the issue. Please try the attached tegrastats.

(03/30/2017 UPDATE) Attached tegrastats.

tegrastats.zip (14 KB)

Thanks, the updated utility shows an MSENC value. However, I’m a little confused by the output.

I run this command (from page 6 of the Jetson TX2 Accelerated GStreamer User Guide, Release 24 and 27):

gst-launch-1.0 videotestsrc ! 'video/x-raw, format=(string)I420, width=(int)640, height=(int)480' ! omxh264enc ! 'video/x-h264, stream-format=(string)byte-stream' ! h264parse ! qtmux ! filesink location=test.mp4 -e

When I run it on the TX1 running r24.2.1, I get the following tegrastats output:

RAM 725/3995MB (lfb 633x4MB) cpu [1%,0%,0%,0%]@102 EMC 8%@68 AVP 33%@12 NVDEC 192 MSENC 192 GR3D 0%@76 EDP limit 1734
RAM 725/3995MB (lfb 633x4MB) cpu [3%,0%,0%,1%]@102 EMC 8%@68 AVP 33%@12 NVDEC 192 MSENC 192 GR3D 0%@76 EDP limit 1734
RAM 725/3995MB (lfb 633x4MB) cpu [1%,0%,0%,0%]@102 EMC 8%@68 AVP 33%@12 NVDEC 192 MSENC 192 GR3D 0%@76 EDP limit 1734
RAM 729/3995MB (lfb 633x4MB) cpu [1%,7%,1%,4%]@1555 EMC 0%@1331 AVP 4%@115 NVDEC 192 MSENC 192 GR3D 0%@76 EDP limit 1734
RAM 741/3995MB (lfb 629x4MB) cpu [43%,87%,5%,1%]@1734 EMC 7%@1600 AVP 7%@12 NVDEC 716 MSENC 716 GR3D 0%@76 EDP limit 1734
RAM 741/3995MB (lfb 628x4MB) cpu [31%,100%,5%,0%]@1734 EMC 9%@1600 AVP 2%@12 NVDEC 716 MSENC 716 GR3D 0%@76 EDP limit 1734
RAM 741/3995MB (lfb 626x4MB) cpu [2%,100%,34%,1%]@1734 EMC 9%@1600 AVP 2%@12 NVDEC 716 MSENC 716 GR3D 0%@76 EDP limit 1734
RAM 742/3995MB (lfb 624x4MB) cpu [23%,100%,17%,1%]@1734 EMC 10%@1600 AVP 2%@12 NVDEC 716 MSENC 716 GR3D 0%@76 EDP limit 1734
RAM 742/3995MB (lfb 623x4MB) cpu [35%,100%,3%,0%]@1734 EMC 10%@1600 AVP 2%@12 NVDEC 716 MSENC 716 GR3D 0%@76 EDP limit 1734
RAM 742/3995MB (lfb 622x4MB) cpu [12%,100%,0%,18%]@1734 EMC 10%@1600 AVP 2%@12 NVDEC 716 MSENC 716 GR3D 0%@76 EDP limit 1734
RAM 742/3995MB (lfb 620x4MB) cpu [0%,100%,0%,32%]@1734 EMC 10%@1600 AVP 2%@12 NVDEC 716 MSENC 716 GR3D 0%@76 EDP limit 1734
RAM 742/3995MB (lfb 619x4MB) cpu [6%,100%,0%,26%]@1734 EMC 10%@1600 AVP 2%@12 NVDEC 716 MSENC 716 GR3D 0%@76 EDP limit 1734
RAM 743/3995MB (lfb 618x4MB) cpu [8%,100%,0%,25%]@1734 EMC 10%@1600 AVP 2%@12 NVDEC 716 MSENC 716 GR3D 0%@76 EDP limit 1734
RAM 734/3995MB (lfb 617x4MB) cpu [25%,35%,0%,9%]@102 EMC 36%@204 AVP 1%@115 NVDEC 192 MSENC 192 GR3D 0%@76 EDP limit 1734
RAM 734/3995MB (lfb 617x4MB) cpu [12%,0%,0%,0%]@102 EMC 35%@68 AVP 23%@12 NVDEC 192 MSENC 192 GR3D 0%@76 EDP limit 1734
RAM 734/3995MB (lfb 617x4MB) cpu [5%,0%,0%,1%]@102 EMC 20%@68 AVP 29%@12 NVDEC 192 MSENC 192 GR3D 0%@76 EDP limit 1734

The MSENC frequency value changes from 192 to 716 while the command is running, and returns to 192 when the command is terminated.
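For anyone who wants to check this without eyeballing the log, here is a minimal Python sketch that pulls the MSENC value out of each tegrastats line and reports only the changes. It assumes the "MSENC <number>" field format shown in the output above; the function names are my own and not part of any NVIDIA tool.

```python
import re
from typing import Iterable, List, Optional

# Matches the "MSENC <MHz>" token as it appears in the tegrastats lines above.
_MSENC = re.compile(r"\bMSENC (\d+)\b")

def msenc_mhz(line: str) -> Optional[int]:
    """Return the MSENC clock printed on one tegrastats line, or None if absent."""
    m = _MSENC.search(line)
    return int(m.group(1)) if m else None

def msenc_transitions(lines: Iterable[str]) -> List[int]:
    """Collapse consecutive duplicates so that only clock changes remain."""
    out: List[int] = []
    for line in lines:
        mhz = msenc_mhz(line)
        if mhz is not None and (not out or out[-1] != mhz):
            out.append(mhz)
    return out

# Three lines copied from the TX1 output above: idle, encoding, idle again.
sample = [
    "RAM 725/3995MB (lfb 633x4MB) cpu [1%,0%,0%,0%]@102 EMC 8%@68 AVP 33%@12 NVDEC 192 MSENC 192 GR3D 0%@76 EDP limit 1734",
    "RAM 741/3995MB (lfb 629x4MB) cpu [43%,87%,5%,1%]@1734 EMC 7%@1600 AVP 7%@12 NVDEC 716 MSENC 716 GR3D 0%@76 EDP limit 1734",
    "RAM 734/3995MB (lfb 617x4MB) cpu [25%,35%,0%,9%]@102 EMC 36%@204 AVP 1%@115 NVDEC 192 MSENC 192 GR3D 0%@76 EDP limit 1734",
]
print(msenc_transitions(sample))  # [192, 716, 192]
```

A result with more than one value means the encoder clock moved during the capture, which is the behavior DaneLLL described for the TX1.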

But when I run the same command on the TX2 running r27.1, I get the following output:

RAM 945/7854MB (lfb 1570x4MB) cpu [6%@345,off,off,3%@347,3%@348,2%@347] EMC 5%@665 APE 150 NVDEC 1203 MSENC 1164 GR3D 0%@140
RAM 945/7854MB (lfb 1570x4MB) cpu [5%@345,off,off,2%@347,3%@347,4%@348] EMC 5%@665 APE 150 NVDEC 1203 MSENC 1164 GR3D 0%@140
RAM 945/7854MB (lfb 1570x4MB) cpu [7%@345,off,off,4%@348,3%@347,6%@352] EMC 5%@665 APE 150 NVDEC 1203 MSENC 1164 GR3D 0%@140
RAM 953/7854MB (lfb 1570x4MB) cpu [8%@1838,off,off,36%@1843,19%@1843,10%@1846] EMC 4%@1600 APE 150 NVDEC 1203 MSENC 1164 GR3D 0%@140
RAM 953/7854MB (lfb 1570x4MB) cpu [28%@1840,off,off,100%@1842,16%@1843,18%@1844] EMC 9%@1600 APE 150 NVDEC 1203 MSENC 1164 GR3D 0%@140
RAM 955/7854MB (lfb 1570x4MB) cpu [42%@1843,off,off,99%@1845,21%@1820,31%@1843] EMC 11%@1600 APE 150 NVDEC 1203 MSENC 1164 GR3D 0%@140
RAM 955/7854MB (lfb 1570x4MB) cpu [24%@1837,off,off,100%@1839,10%@1846,38%@1849] EMC 12%@1600 APE 150 NVDEC 1203 MSENC 1164 GR3D 0%@140
RAM 955/7854MB (lfb 1570x4MB) cpu [37%@1840,off,off,100%@1843,8%@1843,21%@1843] EMC 13%@1600 APE 150 NVDEC 1203 MSENC 1164 GR3D 0%@140
RAM 956/7854MB (lfb 1570x4MB) cpu [34%@1880,off,off,100%@1881,11%@1884,21%@1883] EMC 13%@1600 APE 150 NVDEC 1203 MSENC 1164 GR3D 0%@140
RAM 956/7854MB (lfb 1570x4MB) cpu [39%@1842,off,off,100%@1843,10%@1847,16%@1853] EMC 13%@1600 APE 150 NVDEC 1203 MSENC 1164 GR3D 0%@140
RAM 946/7854MB (lfb 1570x4MB) cpu [17%@346,off,off,35%@348,7%@347,5%@347] EMC 21%@665 APE 150 NVDEC 1203 MSENC 1164 GR3D 0%@140
RAM 946/7854MB (lfb 1570x4MB) cpu [0%@345,off,off,1%@347,0%@347,2%@347] EMC 12%@665 APE 150 NVDEC 1203 MSENC 1164 GR3D 0%@140
RAM 946/7854MB (lfb 1570x4MB) cpu [12%@652,off,off,7%@655,6%@655,10%@655] EMC 7%@800 APE 150 NVDEC 1203 MSENC 1164 GR3D 67%@229
RAM 946/7854MB (lfb 1570x4MB) cpu [4%@345,off,off,4%@348,4%@348,5%@348] EMC 7%@665 APE 150 NVDEC 1203 MSENC 1164 GR3D 0%@140

The output shows MSENC staying at 1164. If I’m reading it correctly, the CPU loads show that something is running, just as on the TX1, but the MSENC frequency never changes. The idle values also differ a lot between the two boards: 192 on the TX1 versus 1164 on the TX2.

Can you please confirm that the MSENC value displayed is the correct value?

Also, the 27.1 docs don’t reflect the new output from the new tegrastats. Can you tell me what the cpu output is supposed to reflect? I’m guessing the “off,off” strings reflect the Denver cores, mixed in with the output for the A57 cores.
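While waiting for the docs, this is how I would read the new cpu field programmatically. It is a rough Python sketch based only on the r27.1 lines pasted above: it assumes each entry is either "off" or "<load>%@<MHz>", and it makes no claim about which physical cores the "off" entries are. The name `parse_cpu_field` is just for illustration.

```python
import re
from typing import List, Optional, Tuple

# (load percent, clock in MHz), or None when the core is reported as "off".
CoreSample = Optional[Tuple[int, int]]

def parse_cpu_field(line: str) -> List[CoreSample]:
    """Parse the 'cpu [...]' field of a TX2 r27.1-style tegrastats line.

    Assumes the format seen in this thread, e.g.
    'cpu [6%@345,off,off,3%@347,3%@348,2%@347]'. The older TX1 format
    (loads only, one shared clock after the bracket) is not handled.
    """
    m = re.search(r"cpu \[([^\]]*)\]", line)
    if m is None:
        raise ValueError("no cpu [...] field found")
    cores: List[CoreSample] = []
    for entry in m.group(1).split(","):
        if entry == "off":
            cores.append(None)
        else:
            load, mhz = entry.split("@")
            cores.append((int(load.rstrip("%")), int(mhz)))
    return cores

line = ("RAM 945/7854MB (lfb 1570x4MB) cpu [6%@345,off,off,3%@347,3%@348,2%@347] "
        "EMC 5%@665 APE 150 NVDEC 1203 MSENC 1164 GR3D 0%@140")
for index, core in enumerate(parse_cpu_field(line)):
    print(index, "off" if core is None else f"{core[0]}% at {core[1]} MHz")
```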

Thanks a lot,

Ty

Hi Ty,
We will check and clarify this.

Hi,

I just want to let you know that RidgeRun has implemented specialized WebRTC GStreamer elements that work well with the OMX hardware-accelerated plugins on the Tegra X1 and X2. Please check out our wiki page about these elements, which is still under development: http://developer.ridgerun.com/wiki/index.php?title=GstWebRTC.

With these, you can create your own GStreamer application at one endpoint that interacts easily with your web application.

If you are interested or have further questions, please let us know.

Thanks.

Thanks, CarlosR92, I will PM you about this.

My goal is to utilize the Tegra hardware video encoder under WebRTC. Just like with the software encoder, I need the hardware encoding process to be dynamically manipulated in realtime under the direction of WebRTC. I’d like to do this as an extension to Google’s WebRTC source. This way, Tegra hardware acceleration should seamlessly work under Chromium, Firefox, and natively.

I have been exploring the Tegra Multimedia API to control the hardware encoder, as well as the OMX IL support, as provided by NVidia for GStreamer. Originally, I thought GStreamer might be a good way to go since OpenWebRTC was built on top of GStreamer and I could just use the GStreamer 1.0 libs that NVidia already provided. Unfortunately, that didn’t work out, so now I’m looking at building an extension to WebRTC directly. If I can achieve dynamic encoder control, that would be a preferred approach.

Thanks,

Ty

Hi Ty, please try the tegrastats attached in #6. It does not print clocks that are disabled.

Thanks DaneLLL. It works as you describe. I ran the same scenario as described in #7. Here’s my new output:

RAM 1005/7854MB (lfb 1513x4MB) cpu [0%@1726,off,off,0%@1729,0%@1731,0%@1731] EMC 5%@1600 APE 150 GR3D 30%@140
RAM 1005/7854MB (lfb 1513x4MB) cpu [37%@965,off,off,27%@959,25%@960,40%@960] EMC 5%@1600 APE 150 GR3D 34%@140
RAM 1005/7854MB (lfb 1513x4MB) cpu [35%@959,off,off,31%@960,33%@959,39%@959] EMC 5%@1600 APE 150 GR3D 32%@140
RAM 1005/7854MB (lfb 1513x4MB) cpu [40%@806,off,off,30%@809,35%@813,27%@808] EMC 5%@1600 APE 150 GR3D 28%@140
RAM 1005/7854MB (lfb 1513x4MB) cpu [38%@805,off,off,31%@806,31%@805,37%@806] EMC 5%@1600 APE 150 GR3D 29%@140
RAM 1005/7854MB (lfb 1513x4MB) cpu [22%@959,off,off,35%@959,40%@960,39%@959] EMC 5%@1600 APE 150 GR3D 33%@140
RAM 1005/7854MB (lfb 1513x4MB) cpu [27%@805,off,off,31%@805,33%@959,42%@961] EMC 5%@1600 APE 150 GR3D 29%@140
RAM 1005/7854MB (lfb 1513x4MB) cpu [35%@806,off,off,29%@812,29%@808,42%@813] EMC 5%@1600 APE 150 GR3D 23%@140
RAM 1015/7854MB (lfb 1513x4MB) cpu [54%@1846,off,off,58%@1844,86%@1847,44%@1849] EMC 10%@1600 APE 150 MSENC 1164 GR3D 28%@140
RAM 1015/7854MB (lfb 1513x4MB) cpu [55%@1881,off,off,58%@1881,100%@1886,53%@1883] EMC 14%@1600 APE 150 MSENC 1164 GR3D 29%@140
RAM 1015/7854MB (lfb 1513x4MB) cpu [71%@1845,off,off,62%@1851,55%@1850,78%@1854] EMC 16%@1600 APE 150 MSENC 1164 GR3D 34%@140
RAM 1015/7854MB (lfb 1513x4MB) cpu [100%@1842,off,off,60%@1847,56%@1836,51%@1848] EMC 17%@1600 APE 150 MSENC 1164 GR3D 40%@140
RAM 1015/7854MB (lfb 1513x4MB) cpu [90%@1848,off,off,61%@1848,60%@1842,52%@1845] EMC 17%@1600 APE 150 MSENC 1164 GR3D 43%@140
RAM 1015/7854MB (lfb 1513x4MB) cpu [54%@1890,off,off,61%@1889,65%@1886,85%@1881] EMC 17%@1600 APE 150 MSENC 1164 GR3D 36%@140
RAM 1015/7854MB (lfb 1513x4MB) cpu [68%@1845,off,off,62%@1843,52%@1843,89%@1843] EMC 17%@1600 APE 150 MSENC 1164 GR3D 36%@140
RAM 1015/7854MB (lfb 1513x4MB) cpu [63%@1839,off,off,63%@1841,74%@1841,66%@1843] EMC 17%@1600 APE 150 MSENC 1164 GR3D 38%@140
RAM 1016/7854MB (lfb 1512x4MB) cpu [78%@1843,off,off,64%@1843,72%@1845,51%@1844] EMC 17%@1600 APE 150 MSENC 1164 GR3D 32%@140
RAM 1016/7854MB (lfb 1512x4MB) cpu [85%@1844,off,off,60%@1847,67%@1848,54%@1842] EMC 17%@1600 APE 150 MSENC 1164 GR3D 36%@140
RAM 1016/7854MB (lfb 1511x4MB) cpu [76%@1845,off,off,59%@1807,77%@1811,54%@1812] EMC 17%@1600 APE 150 MSENC 1164 GR3D 39%@140
RAM 1017/7854MB (lfb 1509x4MB) cpu [77%@1881,off,off,69%@1892,53%@1881,64%@1883] EMC 17%@1600 APE 150 MSENC 1164 GR3D 39%@140
RAM 1017/7854MB (lfb 1508x4MB) cpu [65%@1881,off,off,53%@1881,51%@1880,100%@1883] EMC 17%@1600 APE 150 MSENC 1164 GR3D 33%@140
RAM 1017/7854MB (lfb 1506x4MB) cpu [74%@1881,off,off,72%@1883,53%@1888,63%@1881] EMC 17%@1600 APE 150 MSENC 1164 GR3D 40%@140
RAM 1017/7854MB (lfb 1505x4MB) cpu [65%@1847,off,off,77%@1846,50%@1848,65%@1845] EMC 17%@1600 APE 150 MSENC 1164 GR3D 38%@140
RAM 1009/7854MB (lfb 1504x4MB) cpu [44%@961,off,off,74%@961,47%@961,48%@960] EMC 14%@1600 APE 150 MSENC 1164 GR3D 41%@140
RAM 1009/7854MB (lfb 1504x4MB) cpu [24%@959,off,off,47%@959,36%@959,33%@959] EMC 9%@1600 APE 150 GR3D 40%@140
RAM 1009/7854MB (lfb 1504x4MB) cpu [26%@806,off,off,40%@805,33%@806,38%@806] EMC 7%@1600 APE 150 GR3D 40%@140
RAM 1009/7854MB (lfb 1504x4MB) cpu [17%@961,off,off,35%@960,46%@960,38%@960] EMC 6%@1600 APE 150 GR3D 30%@140
RAM 1009/7854MB (lfb 1504x4MB) cpu [24%@960,off,off,40%@967,35%@962,41%@966] EMC 5%@1600 APE 150 GR3D 42%@140
RAM 1009/7854MB (lfb 1504x4MB) cpu [18%@959,off,off,40%@959,49%@967,30%@965] EMC 5%@1600 APE 150 GR3D 46%@140
RAM 1009/7854MB (lfb 1504x4MB) cpu [23%@959,off,off,46%@960,31%@962,40%@959] EMC 5%@1600 APE 150 GR3D 29%@140
RAM 1008/7854MB (lfb 1504x4MB) cpu [23%@806,off,off,38%@806,36%@811,41%@806] EMC 5%@1600 APE 150 GR3D 43%@140
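Since this tegrastats build leaves out clocks that are disabled, the presence of the MSENC field on a line can be used as a simple "encoder is on" flag. Below is a small Python sketch of that idea, assuming the field layout shown in the output above; `encoder_active_spans` is a name I made up, and the sample lines are shortened copies of lines from the log.

```python
from typing import Iterable, List, Optional, Tuple

def encoder_active_spans(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) sample indices where MSENC was printed."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last = -1
    for i, line in enumerate(lines):
        last = i
        # Compare whole whitespace-separated tokens, not substrings.
        active = "MSENC" in line.split()
        if active and start is None:
            start = i
        elif not active and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        # The log ended while the encoder was still active.
        spans.append((start, last))
    return spans

log = [
    "RAM 1005/7854MB (lfb 1513x4MB) EMC 5%@1600 APE 150 GR3D 23%@140",
    "RAM 1015/7854MB (lfb 1513x4MB) EMC 10%@1600 APE 150 MSENC 1164 GR3D 28%@140",
    "RAM 1009/7854MB (lfb 1504x4MB) EMC 14%@1600 APE 150 MSENC 1164 GR3D 41%@140",
    "RAM 1009/7854MB (lfb 1504x4MB) EMC 9%@1600 APE 150 GR3D 40%@140",
]
print(encoder_active_spans(log))  # [(1, 2)]
```

Each span covers one continuous period during which the encoder clock was enabled, so an empty list over a WebRTC session would suggest the hardware encoder was never used.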

"MSENC 1164" is included in the output only when the hardware accelerator is working.

Thanks for the fix,

Ty