WebRTC - low performance with Nvidia encoder

Hello, my team and I are trying to port a C++ application from an Intel UP2 to an Nvidia Jetson Nano.
Our app uses WebRTC to send a video stream to another device. To do so, the video stream has to be encoded in H264 format.
We use the Nvidia Encoder Factory class in the WebRTC library to compress the raw video stream from a camera into H264.

    Basically we have two issues:

    1) The framerate is too low. I'm aiming for 30 fps at 1920x1080 resolution.
        On the Intel board I can reach this, but on the Nano I get less than 10 fps.
        The WebRTC package provided by Nvidia includes a test application, `video_loopback`, and its framerate is too low as well.
        Trying different USB cameras sometimes gave a slightly better framerate, but still not enough.

    2) The H264 encoder class takes several parameters: `Profile`, `Level`, `RTP Packetization mode`.
       Combining them gives many possible configurations, but only one seems to be supported by the Nvidia encoder class in WebRTC (packetization-mode = 0, profile-level-id = 42e01f).
       However, many Android devices only support `packetization-mode=1`, which means I cannot establish a call between the Nano and an Android device, and that is one of the key features of our application.
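For readers unfamiliar with the value quoted above: `profile-level-id` is the SDP parameter defined in RFC 6184, and it packs three bytes in hex (profile_idc, the constraint-set flags, and level_idc). The following is only a minimal sketch of how to decode it; `ProfileLevelId`, `ParseProfileLevelId` and `HasConstraintSet1` are illustrative names and are not part of the WebRTC or Nvidia API.

```cpp
#include <cstdint>
#include <string>

// Fields packed into the 6-hex-digit "profile-level-id" SDP parameter.
// Illustrative type, not taken from the WebRTC sources.
struct ProfileLevelId {
    std::uint8_t profile_idc;  // first byte, e.g. 0x42 = 66 (Baseline)
    std::uint8_t profile_iop;  // second byte, constraint_set flags
    std::uint8_t level_idc;    // third byte, level times 10, e.g. 0x1f = 31
};

// Parses a string such as "42e01f". Returns false on malformed input.
bool ParseProfileLevelId(const std::string& text, ProfileLevelId* out) {
    if (text.size() != 6 || out == nullptr) return false;
    std::uint32_t value = 0;
    for (char c : text) {
        std::uint32_t digit = 0;
        if (c >= '0' && c <= '9') {
            digit = static_cast<std::uint32_t>(c - '0');
        } else if (c >= 'a' && c <= 'f') {
            digit = static_cast<std::uint32_t>(c - 'a' + 10);
        } else if (c >= 'A' && c <= 'F') {
            digit = static_cast<std::uint32_t>(c - 'A' + 10);
        } else {
            return false;  // not a hex digit
        }
        value = (value << 4) | digit;
    }
    out->profile_idc = static_cast<std::uint8_t>((value >> 16) & 0xff);
    out->profile_iop = static_cast<std::uint8_t>((value >> 8) & 0xff);
    out->level_idc = static_cast<std::uint8_t>(value & 0xff);
    return true;
}

// constraint_set1_flag is the second most significant bit of the flags byte.
// Together with profile_idc 66 it indicates Constrained Baseline.
bool HasConstraintSet1(const ProfileLevelId& id) {
    return (id.profile_iop & 0x40) != 0;
}
```

Decoding `42e01f` this way gives profile_idc 66, constraint_set1 set, and level_idc 31 (level 3.1). If I read the H.264 level table correctly, level 3.1 tops out around 1280x720 at 30 fps, so a 1920x1080 stream would normally be signalled with a higher level.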

    I believe these two issues could be fixed by tweaking the encoder code in the WebRTC library, but I could not find the sources anywhere.
    I'd like to have access to these sources, or at least to talk with the person who developed the encoder, so that I don't have to write the whole encoding logic myself.

    Here are also some technical questions that could help my team and me:
    - Which type of camera should we use to get the best performance with the Nvidia encoder (interface, model, capability)?
    - When capturing the video, does the encoder need a specific video format as input (NV12, YUY2, MPEG, I420, …)?
    - With Gstreamer I got very good results in terms of framerate, so I wonder why the WebRTC class is slower. Has anyone managed to transmit a 30 fps stream with it?
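On the camera interface question, a quick back-of-the-envelope check can help. The sketch below computes the raw data rate of an uncompressed stream and compares it with the nominal USB 2.0 high-speed rate of 480 Mbit/s. It assumes 16 bits per pixel for YUY2 and ignores all protocol overhead, so treat the result as an upper bound only; the function and constant names are made up for this example.

```cpp
#include <cstdint>

// Bytes per second needed by an uncompressed video stream.
constexpr std::uint64_t RawBytesPerSecond(std::uint32_t width,
                                          std::uint32_t height,
                                          std::uint32_t bits_per_pixel,
                                          std::uint32_t fps) {
    return static_cast<std::uint64_t>(width) * height * bits_per_pixel / 8 * fps;
}

// Nominal USB 2.0 high-speed signalling rate (480 Mbit/s) in bytes per second.
// Usable payload throughput is lower in practice.
constexpr std::uint64_t kUsb2NominalBytesPerSecond = 480000000ULL / 8;

// True if the raw stream fits under the nominal USB 2.0 rate.
constexpr bool FitsUsb2(std::uint32_t width, std::uint32_t height,
                        std::uint32_t bits_per_pixel, std::uint32_t fps) {
    return RawBytesPerSecond(width, height, bits_per_pixel, fps) <=
           kUsb2NominalBytesPerSecond;
}

// YUY2 is 16 bits per pixel.
static_assert(RawBytesPerSecond(1920, 1080, 16, 30) == 124416000ULL,
              "1080p30 YUY2 needs about 124 MB/s");
static_assert(!FitsUsb2(1920, 1080, 16, 30),
              "1080p30 YUY2 exceeds the nominal USB 2.0 rate");
static_assert(FitsUsb2(1280, 720, 16, 30),
              "720p30 YUY2 is about 55 MB/s, just under the nominal rate");
```

So a USB 2.0 camera cannot deliver uncompressed 1080p at 30 fps, and even 720p at 30 fps sits close to the nominal limit. In that situation the camera would have to send a compressed format, or be attached over a faster interface.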

Hi,
We have RTC package in
https://developer.nvidia.com/embedded/dlc/r32-3-1_Release_v1.0/t186ref_release_aarch64/WebRTC_R32.3.1_aarch64.tbz2

Not sure if you use this one. If not, please download it and follow the README to give it a try.

Also we support two software frameworks with hardware acceleration: gstreamer and tegra_multimedia_api. Please also take a look at the documents:
https://developer.nvidia.com/embedded/dlc/l4t-accelerated-gstreamer-guide-32-2
https://docs.nvidia.com/jetson/archives/l4t-multimedia-archived/l4t-multimedia-3231/index.html

Hi,

Yes, we use this exact WebRTC package, but it only contains the header files and the compiled library, not the sources of the Nvidia Encoder class that I’m looking for. My goal is to modify and improve it.

About Gstreamer, I managed to get good results with it independently of WebRTC, but that’s not what I need.
The multimedia API would indeed be useful if I rewrote the entire class from scratch, but access to the sources of what already exists would save more time.

Hi,
Please execute `sudo jetson_clocks` and check if there is any improvement. If the result still does not meet your requirement, please share the steps so that we can reproduce your observation and investigate further.

It is an enhancement to the WebRTC package and may take some time to develop. If you can move your use case to gstreamer or tegra_multimedia_api, that would be great.

Unfortunately, running jetson_clocks did not improve the framerate.
I’ll use the multimedia API for my use case and try to reach better results. Thanks for the suggestions.

To reproduce, just run the ‘video_loopback’ application contained in the Nvidia WebRTC package with a USB camera on a Jetson Nano. You should observe that it does not even reach 20 fps.

Hi,
We will try video_loopback with E-Con CU135 on r32.4.2 and share the result.

For information, please share which USB camera you use and the resolution.

Thanks for checking, the model we use is this one:
http://www.webcamerausb.com/elp-high-speed-120fps-pcb-usb20-webcam-board-2-mega-pixels-1080p-ov2710-cmos-camera-module-with-21mm-lens-elpusbfhd01ml21-p-78.html

Hi,

Please share the resolution you run in this test. 1920x1080 or 1280x720? Thanks.

1280x720

Hi Tangfrere,

How do you check the fps when running video_loopback?
We tried to run video_loopback with the command from the README, but can’t see any fps output.

$ ./video_loopback --codec H264 --width 1280 --height 720 --capture_device_index 0

Yes, there is nothing indicating the fps, but visually I can clearly see that it’s not 30 fps. You can compare with a camera on a desktop machine.
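To replace the visual estimate with a number, one option is to count frames in your own capture or render callback. The class below is a hypothetical helper, not something shipped in WebRTC or the Nvidia package. It assumes you can call `OnFrame()` once per frame with a monotonic timestamp in milliseconds.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical helper: keeps the timestamps of the frames seen during the
// last `window_ms` milliseconds and reports the average rate over them.
class FpsMeter {
public:
    explicit FpsMeter(std::int64_t window_ms = 1000) : window_ms_(window_ms) {}

    // Call once per frame with a monotonic timestamp in milliseconds.
    void OnFrame(std::int64_t now_ms) {
        stamps_.push_back(now_ms);
        // Drop timestamps that fell out of the sliding window.
        while (!stamps_.empty() && now_ms - stamps_.front() > window_ms_) {
            stamps_.pop_front();
        }
    }

    // Frames per second over the retained window, 0 until two frames exist.
    double Fps() const {
        if (stamps_.size() < 2) return 0.0;
        const double span_ms =
            static_cast<double>(stamps_.back() - stamps_.front());
        if (span_ms <= 0.0) return 0.0;
        // N timestamps delimit N - 1 frame intervals.
        return static_cast<double>(stamps_.size() - 1) * 1000.0 / span_ms;
    }

private:
    std::int64_t window_ms_;
    std::deque<std::int64_t> stamps_;
};
```

In a real application the timestamp could come from `std::chrono::steady_clock`, and `Fps()` could be logged once per second.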

Hi,
The low performance is due to a known limitation. Please check the development guide.

We will look into improving it by using NvBufferTransform() to leverage the hardware VIC engine.

I see, so the input video format is the bottleneck.

Alright, thanks.
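To make the cost concrete: assuming the limitation is a pixel format conversion done on the CPU before the encoder (my reading of the thread, the development guide is not quoted here), this is roughly what converting one YUY2 frame to I420 involves. It is a plain C++ sketch, not Nvidia's implementation, and it does not use NvBufferTransform().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts one packed YUY2 frame (bytes Y0 U Y1 V per pixel pair) to planar
// I420 (Y plane, then U plane, then V plane). Width and height must be even.
// Chroma is averaged over each pair of rows.
std::vector<std::uint8_t> Yuy2ToI420(const std::vector<std::uint8_t>& yuy2,
                                     std::size_t width, std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    assert(yuy2.size() == width * height * 2);

    std::vector<std::uint8_t> i420(width * height * 3 / 2);
    std::uint8_t* y_plane = i420.data();
    std::uint8_t* u_plane = y_plane + width * height;
    std::uint8_t* v_plane = u_plane + (width / 2) * (height / 2);

    for (std::size_t row = 0; row < height; row += 2) {
        const std::uint8_t* top = yuy2.data() + row * width * 2;
        const std::uint8_t* bottom = top + width * 2;
        for (std::size_t col = 0; col < width; col += 2) {
            const std::size_t src = col * 2;  // 4 bytes per pixel pair
            // Luma is copied unchanged for the 2x2 block.
            y_plane[row * width + col] = top[src];
            y_plane[row * width + col + 1] = top[src + 2];
            y_plane[(row + 1) * width + col] = bottom[src];
            y_plane[(row + 1) * width + col + 1] = bottom[src + 2];
            // One U and one V sample per 2x2 block, averaged with rounding.
            const std::size_t c = (row / 2) * (width / 2) + col / 2;
            u_plane[c] = static_cast<std::uint8_t>(
                (top[src + 1] + bottom[src + 1] + 1) / 2);
            v_plane[c] = static_cast<std::uint8_t>(
                (top[src + 3] + bottom[src + 3] + 1) / 2);
        }
    }
    return i420;
}
```

At 1280x720 and 30 fps this loop reads about 55 MB of input per second on a CPU core, which is the kind of work a hardware engine such as the VIC is meant to take over.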

Hi Tangfrere,

Here are our test results with the E-Con CU135 USB camera for your reference:

$ ./video_loopback --width 1280 --height 720 --capture_device_index 10 --codec H264 --fps 30 --duration 10
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
H264: Profile = 66, Level = 0
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
H264: Profile = 66, Level = 0
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
H264: Profile = 66, Level = 0
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
H264: Profile = 66, Level = 0
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
H264: Profile = 66, Level = 0
Farewell, sweet Concorde!
RESULT psnr: video= {24.594748,2.2867353} dB
RESULT ssim: video= {0.92835946,0.016521056} score
RESULT sender_time: video= {6.6736111,3.2624332} ms
RESULT receiver_time: video= {16.104167,4.9563634} ms
RESULT network_time: video= {4.9826389,0.66904055} ms
RESULT total_delay_incl_network: video= {27.760417,5.0841764} ms
RESULT time_between_rendered_frames: video= {34.045296,6.7303492} ms
RESULT encode_frame_rate: video= {27.181818,8.4618737} fps
RESULT encode_time: video= {1,0} ms
RESULT media_bitrate: video= {679319.09,251986.61} bps
RESULT fec_bitrate: video= {0,0} bps
RESULT send_bandwidth: video= {1374213.3,88051.633} bps
RESULT time_between_freezes: video= {9771,0} ms
RESULT pixels_per_frame: video= {523636.36,272894.44} px
RESULT min_psnr: video= 16.250608 dB
RESULT decode_time: video= {2.1818182,1.1922615} ms
RESULT dropped_frames: video= 12 frames
RESULT cpu_usage: video= 21.066234 %
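For anyone who wants to turn these `RESULT` lines into numbers, here is a small sketch that parses one line and derives a frame rate from the mean time between rendered frames. It assumes the pair in braces is {mean, standard deviation}, which the output itself does not state, and `ResultStat`, `ParseResultLine` and `FpsFromIntervalMs` are names invented for this example.

```cpp
#include <cstdio>
#include <string>

// One parsed "RESULT <name>: video= {<mean>,<stddev>} <unit>" line.
// Assumption: the two values are mean and standard deviation.
struct ResultStat {
    std::string name;
    double mean;
    double stddev;
};

// Returns false for lines that do not carry a {mean,stddev} pair,
// for example "RESULT min_psnr: video= 16.250608 dB".
bool ParseResultLine(const std::string& line, ResultStat* out) {
    char name[64] = {0};
    double mean = 0.0;
    double stddev = 0.0;
    const int matched = std::sscanf(line.c_str(),
                                    "RESULT %63[^:]: video= {%lf,%lf}",
                                    name, &mean, &stddev);
    if (matched != 3 || out == nullptr) return false;
    out->name = name;
    out->mean = mean;
    out->stddev = stddev;
    return true;
}

// Converts a mean frame interval in milliseconds to frames per second.
double FpsFromIntervalMs(double interval_ms) {
    return interval_ms > 0.0 ? 1000.0 / interval_ms : 0.0;
}
```

Applied to the `time_between_rendered_frames` line above (mean 34.045296 ms), this gives about 29.4 rendered frames per second, while the same run reports a mean `encode_frame_rate` of about 27.2 fps and 12 dropped frames.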