Ethernet speed issue for 4K USB capture and streaming

Hi,

I’m having a problem with 4K30 USB capture and streaming. I’m trying to build a 4K30 capture -> encode -> stream pipeline. I see that USB 3.0 capture affects Ethernet throughput, i.e., while 4K30 is being captured the Ethernet bandwidth drops drastically, so I’m not able to stream the video.

These are my observations:

When I run iperf on its own, this is the output:

iperf -c 192.168.39.9 -u -b 1000M -i 1
------------------------------------------------------------
Client connecting to 192.168.39.9, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 192.168.49.12 port 42333 connected with 192.168.39.9 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  50.4 MBytes   423 Mbits/sec
[  3]  1.0- 2.0 sec  50.5 MBytes   424 Mbits/sec
[  3]  2.0- 3.0 sec  51.1 MBytes   429 Mbits/sec
:
:
[  3]  0.0-10.0 sec   505 MBytes   424 Mbits/sec
[  3] Sent 360266 datagrams
[  3] Server Report:
[  3]  0.0-10.2 sec   114 MBytes  93.8 Mbits/sec  13.993 ms 278688/360265 (77%)

When I start a v4l2 capture as follows and then run iperf at the same time, the throughput drops:

v4l2-ctl --device /dev/video0 --set-fmt-video=width=3840,height=2160,pixelformat=NV12
v4l2-ctl --device /dev/video0 --stream-mmap  --stream-poll
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 31.37 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 30.50 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 30.36 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 30.25 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 30.20 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 30.18 fps
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 30.15 fps

Tegrastats -
RAM 1747/3995MB (lfb 303x4MB) cpu [89%,1%,1%,0%]@1734 EMC 14%@1600 AVP 1%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1747/3995MB (lfb 303x4MB) cpu [85%,0%,0%,0%]@1734 EMC 14%@1600 AVP 1%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1747/3995MB (lfb 303x4MB) cpu [86%,0%,1%,0%]@1734 EMC 14%@1600 AVP 1%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734

Now I run iperf while the capture is running:
[  3] local 192.168.49.12 port 56749 connected with 192.168.39.9 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  1.27 MBytes  10.6 Mbits/sec
[  3]  1.0- 2.0 sec  1.12 MBytes  9.43 Mbits/sec
[  3]  2.0- 3.0 sec  1.66 MBytes  13.9 Mbits/sec
:
:
[  3]  9.0-10.0 sec  1.12 MBytes  9.38 Mbits/sec
[  3]  0.0-10.0 sec  14.1 MBytes  11.8 Mbits/sec
[  3] Sent 10056 datagrams
[  3] Server Report:
[  3]  0.0-10.0 sec  14.1 MBytes  11.8 Mbits/sec   0.736 ms 

And tegrastats output is 

RAM 1747/3995MB (lfb 303x4MB) cpu [100%,0%,1%,1%]@1734 EMC 14%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1747/3995MB (lfb 303x4MB) cpu [100%,0%,0%,4%]@1734 EMC 14%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734
RAM 1747/3995MB (lfb 303x4MB) cpu [100%,0%,2%,8%]@1734 EMC 14%@1600 AVP 0%@80 NVDEC 268 MSENC 268 GR3D 0%@998 EDP limit 1734

The Ethernet throughput dropped from ~100 Mbps to ~10 Mbps.
I have also run the jetson max clocks script, but it brought no improvement.
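
For comparing runs like the two above, here is a minimal Python sketch (not part of the original posts) that pulls the receiver-side figures out of an iperf UDP "Server Report" line. The regular expression is an assumption based only on the report lines pasted in this thread; other iperf versions may format the line differently. The sender-side "Bandwidth" column only shows what the client pushed into the socket, so the server report is the number that reflects what actually arrived.

```python
import re

# Matches the data line that follows "Server Report:" in the logs above, e.g.
# [  3]  0.0-10.2 sec   114 MBytes  93.8 Mbits/sec  13.993 ms 278688/360265 (77%)
# The lost/total field is optional because one report in this thread lacks it.
REPORT_RE = re.compile(
    r"\[\s*\d+\]\s+[\d.]+-\s*[\d.]+\s+sec\s+"
    r"[\d.]+\s+\w+\s+"
    r"(?P<mbps>[\d.]+)\s+Mbits/sec\s+"
    r"(?P<jitter>[\d.]+)\s+ms"
    r"(?:\s+(?P<lost>\d+)/\s*(?P<total>\d+))?"
)


def parse_server_report(line):
    """Return received Mbit/s, jitter and loss ratio, or None if no match."""
    m = REPORT_RE.search(line)
    if m is None:
        return None
    lost = m.group("lost")
    return {
        "mbps": float(m.group("mbps")),
        "jitter_ms": float(m.group("jitter")),
        "loss": int(lost) / int(m.group("total")) if lost else None,
    }


# The two server reports quoted above: without capture and with capture.
idle = parse_server_report(
    "[  3]  0.0-10.2 sec   114 MBytes  93.8 Mbits/sec  13.993 ms 278688/360265 (77%)"
)
busy = parse_server_report(
    "[  3]  0.0-10.0 sec  14.1 MBytes  11.8 Mbits/sec   0.736 ms "
)
print(f"received throughput dropped by a factor of {idle['mbps'] / busy['mbps']:.1f}")
```

Interval lines without a jitter field return None, so the same function can be run over a whole log and the non-matching lines filtered out.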

I found out that the Ethernet controller on the TX1 is an RTL8153, which is also attached via USB 3.0.

Could this be causing the issue?
Is there a way to achieve 4K30 USB capture and streaming on TX1?

Hi zeitgeist,
Please try
https://devtalk.nvidia.com/default/topic/979635/jetson-tx1/ethernet-speed-increases-when-micro-usb-2-0-connector-is-connected/post/5041424/#5041424

Hi DaneLLL,

With this setting I’m seeing a slight improvement. Now the bandwidth is around 20 Mbps when capture is running.

iperf -c 192.168.39.9 -u -b 1000M -i 1
------------------------------------------------------------
Client connecting to 192.168.39.9, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 192.168.49.11 port 39531 connected with 192.168.39.9 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  2.42 MBytes  20.3 Mbits/sec
[  3]  1.0- 2.0 sec  2.41 MBytes  20.2 Mbits/sec
[  3]  2.0- 3.0 sec  2.28 MBytes  19.1 Mbits/sec
[  3]  3.0- 4.0 sec  2.62 MBytes  22.0 Mbits/sec
[  3]  4.0- 5.0 sec  3.04 MBytes  25.5 Mbits/sec
[  3]  5.0- 6.0 sec  3.97 MBytes  33.3 Mbits/sec
[  3]  6.0- 7.0 sec  3.52 MBytes  29.5 Mbits/sec
[  3]  7.0- 8.0 sec  3.87 MBytes  32.5 Mbits/sec
[  3]  8.0- 9.0 sec  3.04 MBytes  25.5 Mbits/sec
[  3]  9.0-10.0 sec  3.42 MBytes  28.7 Mbits/sec
[  3]  0.0-10.0 sec  30.6 MBytes  25.6 Mbits/sec
[  3] Sent 21807 datagrams
[  3] Server Report:
[  3]  0.0-10.0 sec  30.6 MBytes  25.6 Mbits/sec   2.076 ms    0/21806 (0%)

I want to stream at a higher rate than this…

Hi zeitgeist, we will simulate the case and try to reproduce the issue.

Hi zeitgeist,
Please try the patch attached.
0001-drivers-usb-xhci-raise-sclk-to-80-MHz.patch.txt (3.93 KB)

Hi DaneLLL

I don’t see any improvement in behavior even after applying the patches.

This could be because the file “drivers/usb/host/xhci-tegra.c” was not compiled and “drivers/usb/host/xhci-tegra.o” was not generated.
It looks like “drivers/usb/host/Makefile” is missing “xhci-tegra.o”.

How do we ensure the kernel is built with the necessary changes?

/** FYI

After applying the patches, I executed the following commands to build the kernel and kernel modules

make clean
make mrproper
make tegra21_hdmi2csi_defconfig
make Image
make dtbs
make modules

**/

Hi zeitgeist, could you try tegra21_defconfig?

Hi DaneLLL,

I tried the change, but the file is still not being compiled.

These are the only files that get compiled:
Line 5083: CC drivers/usb/host/xhci.o
Line 5084: CC drivers/usb/host/xhci-mem.o
Line 5085: CC drivers/usb/host/xhci-ring.o
Line 5086: CC drivers/usb/host/xhci-hub.o
Line 5087: CC drivers/usb/host/xhci-dbg.o
Line 5088: CC drivers/usb/host/xhci-pci.o
Line 5089: CC drivers/usb/host/xhci-tegra-t210-padctl.o
Line 5090: LD drivers/usb/host/xhci-hcd.o

/**

make clean
make mrproper
make tegra21_defconfig
make Image
make dtbs
make modules

**/

Hi zeitgeist,
It is built into xhci.o:

./drivers/usb/host/xhci.c:#include "xhci-tegra.c"

So you don’t observe any improvement in iperf with the patch? Is the camera detected in superspeed?
Could you share the result of ‘tegrastats’?
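
One way to check the negotiated link speed is to read the standard Linux sysfs attributes `speed` and `product` under /sys/bus/usb/devices. The sketch below is a minimal Python illustration and not something taken from this thread; the function name and output format are made up for the example. A SuperSpeed device reports 5000 (Mbit/s) and a high-speed USB 2.0 device reports 480.

```python
from pathlib import Path


def usb_device_speeds(sysfs_root="/sys/bus/usb/devices"):
    """Return {device_dir_name: (product, speed_mbps)} for USB devices.

    Interface entries such as '1-1:1.0' have no 'speed' file and are
    skipped, as are devices whose speed is not a number.
    """
    result = {}
    for dev in sorted(Path(sysfs_root).iterdir()):
        speed_file = dev / "speed"
        if not speed_file.is_file():
            continue
        try:
            speed = float(speed_file.read_text().strip())
        except ValueError:
            continue
        product_file = dev / "product"
        product = product_file.read_text().strip() if product_file.is_file() else "?"
        result[dev.name] = (product, speed)
    return result


if __name__ == "__main__":
    root = Path("/sys/bus/usb/devices")
    if root.is_dir():
        for name, (product, speed) in usb_device_speeds(root).items():
            print(f"{name:10s} {speed:8.1f} Mbit/s  {product}")
```

On a board where the Ethernet controller is itself a USB device, it would be expected to appear in this list next to the camera, which shows whether the two share a root hub.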

Hi DaneLLL,

I don’t observe any improvements with the patch.
The camera is connected as USB 3.0 SuperSpeed (we are able to do 4K30 capture and display via gstreamer, which requires ~3 Gbps of throughput and is only possible over USB 3.0).
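
The ~3 Gbps figure can be checked with simple arithmetic. The sketch below assumes the NV12 format from the v4l2-ctl command earlier in the thread, which stores 12 bits per pixel (8-bit luma plus 4:2:0 subsampled chroma). The bus figures are nominal signalling rates, not measured throughput.

```python
def raw_video_bitrate(width, height, bits_per_pixel, fps):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


# 3840x2160 NV12 at 30 frames per second.
nv12_4k30 = raw_video_bitrate(3840, 2160, 12, 30)

USB2_HIGH_SPEED = 480e6        # nominal USB 2.0 signalling rate, bit/s
USB3_SUPERSPEED = 5e9          # nominal USB 3.0 signalling rate, bit/s
USB3_AFTER_8B10B = USB3_SUPERSPEED * 8 / 10  # ceiling after line coding

print(f"{nv12_4k30 / 1e9:.2f} Gbit/s")  # prints 2.99 Gbit/s
print("fits in USB 2.0:", nv12_4k30 <= USB2_HIGH_SPEED)
print("share of USB 3.0 after 8b/10b:", f"{nv12_4k30 / USB3_AFTER_8B10B:.0%}")
```

The raw stream alone takes roughly three quarters of what a single SuperSpeed link can carry after line coding, before any protocol overhead, so anything else on the same bus has little room left.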

These are the tests I ran after applying the patches and the steps above:

  1. Streaming without capture
iperf -c 172.20.34.23 -b 1000M -i 1 -t 600
WARNING: option -b implies udp testing
------------------------------------------------------------
Client connecting to 172.20.34.23, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 192.168.39.5 port 59996 connected with 172.20.34.23 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  65.2 MBytes   547 Mbits/sec
[  3]  1.0- 2.0 sec  65.9 MBytes   552 Mbits/sec
[  3]  2.0- 3.0 sec  65.4 MBytes   549 Mbits/sec
[  3]  3.0- 4.0 sec  65.6 MBytes   550 Mbits/sec
[  3]  4.0- 5.0 sec  65.4 MBytes   548 Mbits/sec
[  3]  5.0- 6.0 sec  65.4 MBytes   549 Mbits/sec
[  3]  6.0- 7.0 sec  65.7 MBytes   551 Mbits/sec
[  3]  7.0- 8.0 sec  65.8 MBytes   552 Mbits/sec
[  3]  8.0- 9.0 sec  65.8 MBytes   552 Mbits/sec
[  3]  9.0-10.0 sec  65.8 MBytes   552 Mbits/sec

Tegrastats:

RAM 867/3995MB (lfb 630x4MB) cpu [70%,100%,2%,7%]@1734 EMC 3%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 867/3995MB (lfb 630x4MB) cpu [74%,99%,11%,4%]@1734 EMC 3%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 867/3995MB (lfb 630x4MB) cpu [79%,100%,7%,3%]@1734 EMC 3%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 867/3995MB (lfb 630x4MB) cpu [75%,100%,2%,9%]@1734 EMC 3%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 867/3995MB (lfb 630x4MB) cpu [65%,99%,3%,9%]@1734 EMC 3%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 1%@998 EDP limit 1734
RAM 867/3995MB (lfb 630x4MB) cpu [72%,100%,4%,6%]@1734 EMC 3%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 1%@998 EDP limit 1734
RAM 867/3995MB (lfb 630x4MB) cpu [77%,100%,2%,9%]@1734 EMC 3%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 1%@998 EDP limit 1734
RAM 867/3995MB (lfb 630x4MB) cpu [79%,99%,9%,9%]@1734 EMC 3%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 1%@998 EDP limit 1734
RAM 867/3995MB (lfb 630x4MB) cpu [80%,100%,1%,10%]@1734 EMC 3%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 1%@998 EDP limit 1734
RAM 867/3995MB (lfb 630x4MB) cpu [71%,100%,3%,8%]@1734 EMC 3%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 2%@998 EDP limit 1734
  2. Streaming with capture
iperf -c 172.20.34.23 -b 1000M -i 1 -t 600
WARNING: option -b implies udp testing
------------------------------------------------------------
Client connecting to 172.20.34.23, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 192.168.39.5 port 34488 connected with 172.20.34.23 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  6.95 MBytes  58.3 Mbits/sec
[  3]  1.0- 2.0 sec  2.50 MBytes  20.9 Mbits/sec
[  3]  2.0- 3.0 sec  4.36 MBytes  36.6 Mbits/sec
[  3]  3.0- 4.0 sec  2.34 MBytes  19.7 Mbits/sec
[  3]  4.0- 5.0 sec  2.92 MBytes  24.5 Mbits/sec
[  3]  5.0- 6.0 sec  4.55 MBytes  38.2 Mbits/sec
[  3]  6.0- 7.0 sec  2.31 MBytes  19.4 Mbits/sec
[  3]  7.0- 8.0 sec  5.66 MBytes  47.5 Mbits/sec
[  3]  8.0- 9.0 sec  3.78 MBytes  31.7 Mbits/sec


Tegrastats:

RAM 915/3995MB (lfb 626x4MB) cpu [0%,0%,0%,0%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [99%,7%,0%,4%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [100%,0%,5%,0%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [100%,0%,0%,4%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [100%,0%,0%,1%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [100%,0%,0%,0%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [100%,0%,0%,1%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [100%,0%,0%,0%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [100%,0%,0%,2%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [100%,0%,0%,2%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [100%,3%,1%,5%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [100%,2%,2%,1%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [100%,3%,3%,2%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [100%,0%,0%,0%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [100%,0%,0%,3%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [100%,7%,0%,3%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [100%,4%,0%,8%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
RAM 915/3995MB (lfb 626x4MB) cpu [100%,3%,0%,0%]@1734 EMC 8%@1600 AVP 0%@408 NVDEC 192 MSENC 192 GR3D 0%@998 EDP limit 1734
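
To compare the two tegrastats dumps above more easily, a small Python sketch (an illustration, not from the original post) can extract the per-core load from each line. It assumes the `cpu [a%,b%,c%,d%]@MHz` layout seen in this thread; other releases print this field differently.

```python
import re

# Matches e.g. "cpu [100%,0%,0%,4%]@1734" as printed in the logs above.
CPU_RE = re.compile(r"cpu \[([^\]]*)\]@(\d+)")


def parse_cpu_loads(line):
    """Return ([per-core load in percent], clock in MHz) or None if no match."""
    m = CPU_RE.search(line)
    if m is None:
        return None
    loads = [int(field.strip().rstrip("%")) for field in m.group(1).split(",")]
    return loads, int(m.group(2))


without_capture = parse_cpu_loads(
    "RAM 867/3995MB (lfb 630x4MB) cpu [70%,100%,2%,7%]@1734 EMC 3%@1600 AVP 0%@408"
)
with_capture = parse_cpu_loads(
    "RAM 915/3995MB (lfb 626x4MB) cpu [100%,0%,0%,4%]@1734 EMC 8%@1600 AVP 0%@408"
)
print("without capture:", without_capture)
print("with capture:   ", with_capture)
```

In these samples two cores are busy during the iperf-only run, while during capture a single core sits at 100% and the rest are nearly idle. The logs show that pattern but do not by themselves establish the cause.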

Hi zeitgeist, we will have this fixed in the next release.

I am always grateful for your help,
and I am looking forward to the next release.

I am now trying to build a 4K video streaming server with the Jetson TX2, but I noticed that a sendto() call, which normally takes a few microseconds, sometimes takes several hundred microseconds.
As a result, although I get a throughput of about 80 Mbps when there is no problem, it suddenly drops to about 15 to 20 Mbps, and a large number of RTP packets pile up in the transmit queue.
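
A measurement of this kind can be sketched as follows. This is a minimal Python illustration, not the code used for the numbers above: the function names, the 1400-byte payload and the 100-microsecond threshold are assumptions chosen for the example, and Python adds its own per-call overhead, so absolute values will differ from a C implementation.

```python
import socket
import time


def time_sendto(sock, payload, addr, count):
    """Send `count` datagrams and return each sendto() duration in microseconds."""
    durations_us = []
    for _ in range(count):
        start = time.perf_counter_ns()
        sock.sendto(payload, addr)
        durations_us.append((time.perf_counter_ns() - start) / 1000.0)
    return durations_us


def summarize(durations_us, slow_threshold_us=100.0):
    """Return (median, maximum, number of calls slower than the threshold)."""
    ordered = sorted(durations_us)
    median = ordered[len(ordered) // 2]  # upper median for even counts
    slow = sum(1 for d in ordered if d > slow_threshold_us)
    return median, ordered[-1], slow


if __name__ == "__main__":
    # Loopback receiver so the example runs anywhere; on the target the
    # address would be the streaming client instead.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    samples = time_sendto(sender, b"\x00" * 1400, receiver.getsockname(), 1000)
    median, worst, slow = summarize(samples)
    print(f"median {median:.1f} us, max {worst:.1f} us, {slow} calls over 100 us")
    sender.close()
    receiver.close()
```

Logging the slow-call count alongside the capture state makes it easier to see whether the long sendto() calls line up with USB capture activity.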

Fix patches for some of these bugs have already been posted here in the Developer Forum.
I am glad that each patch, applied on its own, resolves the corresponding problem, but unfortunately I do not know how to combine the patches.

Jetson TX1 and Jetson TX2 share some MMAPI-related problems, and they are holding up my project:
(1) Ethernet throughput when capturing 4K images via USB
(2) 4K 30P Encoding capability: HW_PRESET_MODE
(3) Output of illegal timestamp of encoder / decoder
etc

Is there a list of defects scheduled to be fixed in the next release?
Also, is it possible to announce the next scheduled release date?

Hi mynaemi,
The next release is planned for early July.

(1)(2) are fixed.

For (3), please share which topic it is.

Hi DaneLLL,

I am grateful for your prompt response.

Is the Ethernet throughput issue (1) related to the topic below?
https://devtalk.nvidia.com/default/topic/996739/

The timestamp problem (3) is discussed in the following topic on decoders.
https://devtalk.nvidia.com/default/topic/1008111/jetson-tx1/-mmapi-some-questions-about-videodecoder-timestamp-handling/

What I am facing is the timestamp output of the encoder.
Even if the capture timestamp is passed to the encoder, the timestamp output together with the H.264 ES NAL is always the same value. As a workaround, I feed the YUV images to the encoder, save the timestamps in a separate queue, and substitute them when the NALs are multiplexed into RTP packets.
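
The side-queue workaround can be sketched like this. It is a minimal Python illustration with hypothetical names, not the actual implementation, and it assumes the encoder emits exactly one output buffer per input frame, in input order (no B-frame reordering and no dropped frames). If that assumption does not hold, the timestamps drift out of step with the NALs.

```python
from collections import deque


class TimestampQueue:
    """FIFO that carries capture timestamps around the encoder."""

    def __init__(self):
        self._pending = deque()

    def on_frame_queued(self, capture_ts_us):
        """Call when a YUV frame is handed to the encoder."""
        self._pending.append(capture_ts_us)

    def on_nal_dequeued(self):
        """Call when encoded output arrives; returns the matching timestamp.

        Raises IndexError if more outputs than inputs were seen.
        """
        return self._pending.popleft()


timestamps = TimestampQueue()
for ts in (0, 33333, 66667):        # capture times in microseconds
    timestamps.on_frame_queued(ts)

# Later, when each NAL is packed into RTP, use the saved timestamp
# instead of the one reported by the encoder.
rtp_timestamps = [timestamps.on_nal_dequeued() for _ in range(3)]
print(rtp_timestamps)
```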

I also have some issues besides the above.

(4) O_NONBLOCKING mode
At present, encoders, converters, and decoders cannot be started in non-blocking mode. Therefore, when qBuffer() is called to queue data, the thread is kept waiting until the processing is completed, and performance is degraded.
https://devtalk.nvidia.com/default/topic/987024/jetson-tx1/question-about-v4l2-api-for-encode-of-tx1/post/5132125/#5132125
https://devtalk.nvidia.com/default/topic/1006857/jetson-tx1/-mmapi-under-what-conditions-lsquo-conv-gt-capture_plane-qbuffer-rsquo-will-block/

(5) Too Large NAL
Sometimes a P-picture NAL that is very large for the configured bit rate is output when the rate is changed frequently via the NvVideoEncoder::setBitrate() API.
There seems to be some restriction on this API (for example, a minimum interval between calls).

There may be unknown defects that I have not encountered yet, but if there is a list of defects that will be fixed in the next release, I believe that I can check my code for any impact.

Best Regards,

Hi mynaemi,

(3) The next release will not include any changes to timestamp handling.
Please refer to
https://devtalk.nvidia.com/default/topic/1002773/jetson-tx2/about-the-timestamp-of-video-encoder/post/5124401/#5124401

(4) The issues in the two posts are fixed. But we don’t check O_NONBLOCKING; it always runs in non-blocking mode.

(5) Please start a new post and give steps to reproduce it. Even if it is confirmed as an issue, though, a fix cannot make the next release.

Hi DaneLLL,

“The next release is planned to be in early July.”

It is now late July.
Has the plan changed at all?

We are at the last mile.

Here is the test result on r28.1

ubuntu@tegra-ubuntu:~$ gst-launch-1.0 v4l2src device=/dev/video1 num-buffers=600 ! 'video/x-raw,width=1920,height=1080,format=I420' ! nvoverlaysink &
[1] 2068
ubuntu@tegra-ubuntu:~$ Setting pipeline to PAUSED ...
Pipeline is live and does not need PREROLL ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock

ubuntu@tegra-ubuntu:~$ iperf -c 10.19.106.10 -u -b 1000M -i 1
------------------------------------------------------------
Client connecting to 10.19.106.10, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size:  224 KByte (default)
------------------------------------------------------------
[  3] local 10.19.106.179 port 59774 connected with 10.19.106.10 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  86.8 MBytes   728 Mbits/sec
[  3]  1.0- 2.0 sec  90.5 MBytes   759 Mbits/sec
[  3]  2.0- 3.0 sec  89.7 MBytes   752 Mbits/sec
[  3]  3.0- 4.0 sec  89.5 MBytes   750 Mbits/sec
[  3]  4.0- 5.0 sec  87.1 MBytes   730 Mbits/sec
[  3]  5.0- 6.0 sec  87.2 MBytes   731 Mbits/sec
[  3]  6.0- 7.0 sec  87.4 MBytes   734 Mbits/sec
[  3]  7.0- 8.0 sec  87.6 MBytes   735 Mbits/sec
[  3]  8.0- 9.0 sec  86.6 MBytes   726 Mbits/sec
[  3]  9.0-10.0 sec  87.6 MBytes   735 Mbits/sec
[  3]  0.0-10.0 sec   880 MBytes   738 Mbits/sec
[  3] Sent 627712 datagrams
[  3] Server Report:
[  3]  0.0-10.4 sec   119 MBytes  95.7 Mbits/sec   0.226 ms 543002/627711 (87%)
[  3]  0.0-10.4 sec  1 datagrams received out-of-order
ubuntu@tegra-ubuntu:~$ Got EOS from element "pipeline0".
Execution ended after 0:00:22.784606034
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Setting pipeline to NULL ...
Freeing pipeline ...

[1]+  Done                    gst-launch-1.0 v4l2src device=/dev/video1 num-buffers=600 ! 'video/x-raw,width=1920,height=1080,format=I420' ! nvoverlaysink
ubuntu@tegra-ubuntu:~$

Since we don’t have a 4K USB camera, we ran it with a Logitech c930e, which can do 1080p30.

Hi DaneLLL,

I appreciate your support.

I have now installed R28.1 on both the TX1 and the TX2,
and I’ll test it with the SystemProfiler.