Hi,
I have been measuring the Ethernet speed of the Jetson TX1 development board and noticed some strange things.
In short, when I normally measure the speed, the result is about 650-800 Mbits/sec and not stable. But if I connect a mouse to the micro USB 2.0 connector, the Ethernet speed increases to 940 Mbits/sec, and becomes stable.
I am using iperf3 for measurement and tested both 32bit L4T24.1 and 64bit L4T24.2.1; both give the same result.
If a mouse is NOT connected to micro USB 2.0:
iperf3 -c 192.168.1.86 -R
Connecting to host 192.168.1.86, port 5201
Reverse mode, remote host 192.168.1.86 is sending
[ 4] local 192.168.1.70 port 51603 connected to 192.168.1.86 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 63.3 MBytes 531 Mbits/sec
[ 4] 1.00-2.00 sec 84.7 MBytes 711 Mbits/sec
[ 4] 2.00-3.00 sec 89.4 MBytes 750 Mbits/sec
[ 4] 3.00-4.00 sec 87.1 MBytes 730 Mbits/sec
[ 4] 4.00-5.00 sec 78.4 MBytes 657 Mbits/sec
[ 4] 5.00-6.00 sec 78.3 MBytes 658 Mbits/sec
[ 4] 6.00-7.00 sec 79.4 MBytes 666 Mbits/sec
[ 4] 7.00-8.00 sec 85.2 MBytes 715 Mbits/sec
[ 4] 8.00-9.00 sec 77.0 MBytes 645 Mbits/sec
[ 4] 9.00-10.00 sec 76.7 MBytes 643 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 800 MBytes 671 Mbits/sec 0 sender
[ 4] 0.00-10.00 sec 800 MBytes 671 Mbits/sec receiver
iperf Done.
If a mouse is connected to micro USB 2.0:
iperf3 -c 192.168.1.86 -R
Connecting to host 192.168.1.86, port 5201
Reverse mode, remote host 192.168.1.86 is sending
[ 4] local 192.168.1.70 port 51605 connected to 192.168.1.86 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 102 MBytes 852 Mbits/sec
[ 4] 1.00-2.00 sec 112 MBytes 944 Mbits/sec
[ 4] 2.00-3.00 sec 113 MBytes 945 Mbits/sec
[ 4] 3.00-4.00 sec 113 MBytes 944 Mbits/sec
[ 4] 4.00-5.00 sec 113 MBytes 943 Mbits/sec
[ 4] 5.00-6.00 sec 113 MBytes 945 Mbits/sec
[ 4] 6.00-7.00 sec 113 MBytes 945 Mbits/sec
[ 4] 7.00-8.00 sec 113 MBytes 946 Mbits/sec
[ 4] 8.00-9.00 sec 113 MBytes 946 Mbits/sec
[ 4] 9.00-10.00 sec 113 MBytes 945 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.09 GBytes 936 Mbits/sec 0 sender
[ 4] 0.00-10.00 sec 1.09 GBytes 936 Mbits/sec receiver
iperf Done.
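For anyone who wants to put a number on "not stable", here is a minimal sketch that pulls the per-second rates out of pasted iperf3 text like the two runs above and reports the mean and spread. The function names are made up for illustration, and the regular expression assumes the client output format shown in this post (rates reported in Mbits/sec).

```python
import re
import statistics

# Matches per-interval lines such as
# "[ 4] 1.00-2.00 sec 84.7 MBytes 711 Mbits/sec"
INTERVAL_RE = re.compile(
    r"\]\s+(\d+\.\d+)-(\d+\.\d+)\s+sec\s+[\d.]+\s+\w?Bytes\s+([\d.]+)\s+Mbits/sec"
)


def interval_rates(iperf_text):
    """Return the Mbits/sec value of every roughly 1-second interval line."""
    rates = []
    for line in iperf_text.splitlines():
        m = INTERVAL_RE.search(line)
        if not m:
            continue
        start, end = float(m.group(1)), float(m.group(2))
        # Skip the 0.00-10.00 sender/receiver summary lines.
        if end - start > 1.5:
            continue
        rates.append(float(m.group(3)))
    return rates


def summarize(rates):
    """Mean, population standard deviation, min and max of the rates."""
    return {
        "mean": statistics.mean(rates),
        "stdev": statistics.pstdev(rates),
        "min": min(rates),
        "max": max(rates),
    }
```

Feeding it the output of each run (for example saved to a text file) gives one summary per condition, which makes it easier to compare runs than eyeballing the interval lines.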
I know that the Ethernet controller and the micro USB 2.0 port use the same USB controller of the TX1, but I don't know why this problem is happening.
I would be very grateful if someone could help me get full Ethernet speed without connecting a device to the micro USB 2.0 port.
Thank you in advance!
Mine went from 772Mbit/s to 906Mbit/s as well. What gives? It seems keyboards or mice have similar effects. I’m on R23.2 with MTU at 1500.
I have not tested, but I’m curious if there is any difference in the boost when just a HUB is added, versus just a keyboard or mouse, versus a keyboard or mouse through a HUB. I wonder because there are several drivers involved. Hardware drivers all compete for CPU0, and I’d suspect any difference is in driver behavior.
With a USB 2.0 hub plugged into the OTG it drops to ~600Mbit/s. /proc/interrupts shows an overwhelmingly large number of interrupts on CPU0 compared with any other core.
All hardware IRQs must go through CPU0 on this architecture. Software IRQs can go anywhere. When interrupts occur fast enough that one driver can’t finish and allow the next driver to immediately run, you begin to see interrupt starvation. This is why servicing IRQs only on CPU0 starves sooner than a desktop equivalent, which distributes hardware IRQs across multiple cores, and why minimal time in hardware IRQ is important…shifting as much work as possible into a software IRQ (when possible) is a good thing. How one hardware driver behaves while holding CPU0 can affect other hardware drivers once starvation begins.
What you describe about increasing core 0 interrupts simply implies that hardware driver servicing is stressing CPU0, and other software divides among cores. The big question is how the rate of IRQ hits on CPU0 changes with and without the mouse/keyboard (I’m assuming that if mouse/keyboard changes ethernet it is because of competition for hardware IRQ time, but CPU0 will be expected to have more interrupts than other cores).
What can you find out about IRQ activity on CPU0 with versus without a USB2 device improving ethernet throughput? Does the USB mouse or keyboard still help if it goes through the HUB? The above sounds like you tested with a solo HUB, but not with a mouse/keyboard through a HUB versus mouse/keyboard direct without a HUB. I’d like to know if an intervening HUB changes things, as it changes the enumeration process. This would help distinguish between USB driver and HID driver influence. Right now I’m not where I can test anything myself.
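In case it helps with collecting those numbers, here is a rough sketch (not tested on a TX1) of how the CPU0 rate per IRQ line could be measured by diffing two snapshots of /proc/interrupts. The function names are hypothetical; the only thing taken from the system is the standard /proc/interrupts layout, with a CPU header row followed by one row per IRQ.

```python
import time


def parse_interrupts(text):
    """Map each IRQ label to (per-CPU counts, description)."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        irq, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        desc = " ".join(fields[len(counts):])
        table[irq.strip()] = (counts, desc)
    return table


def cpu0_rates(before, after, seconds):
    """Interrupts/sec on CPU0 per IRQ between two snapshots, busiest first."""
    rates = []
    for irq, (counts, desc) in after.items():
        if not counts or irq not in before or not before[irq][0]:
            continue
        delta = counts[0] - before[irq][0][0]
        rates.append((delta / seconds, irq, desc))
    return sorted(rates, reverse=True)


def measure(seconds=5.0, path="/proc/interrupts"):
    """Take two snapshots `seconds` apart and return the CPU0 rates."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(seconds)
    with open(path) as f:
        second = parse_interrupts(f.read())
    return cpu0_rates(first, second, seconds)
```

Calling `measure()` once with the mouse attached and once without, while iperf3 is running, would give directly comparable rates for each line.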
Unfortunately, I can't test the board until Monday.
Btw, I tested this phenomenon with several devices, too, and some devices boost the Ethernet speed and some don't. The funny thing is, if I connect only a USB A type cable to the port (without any device), it gives a boost for one second, but if I connect a USB B type cable (flashing cable), then it does nothing at all. The only difference is the ID pin of the cable for OTG. The ID pin is connected to GND on the A type and floating on the B type. I don't know if this takes us closer to the solution, but thought it was worth mentioning.
I will test a USB HUB on Monday, and will post the result.
OTG pin sense can cause the kernel to switch drivers, or at least to disable the type-A host function if the cable is type-B device and device mode does not have a driver.
I have tested connecting a USB HUB to the micro USB 2.0 port.
When I connect a mouse via a USB HUB I get the Ethernet boost (940Mbit/s), but when I connect only the HUB without a mouse, I get a boost for less than a second right after connection and then the speed drops back to the slow speed experienced before (650-800Mbits/sec).
As for the interrupts, most of the interrupts occur on the “GIC tegra-xhci:usb9” line. I get about 9400 interrupts/sec on this line when the speed is slow, and get 10800 interrupts/sec when it is fast. But it is no surprise, because if the speed is faster, obviously more interrupts occur on this line, because of the Ethernet controller.
The second most frequent interrupt is the “GIC tegra210_timer0”, which occurs 75-100 times per second. The other interrupts occur fewer than 5 times per second. I suppose these are not relevant values.
Seeing these values, I think the slow Ethernet speed (when nothing is connected to the micro USB 2.0) can't be explained by an overload of interrupts on the CPU0 core. In fact, when a mouse is connected and the Ethernet speed is fast, more interrupts occur.
By the way, I tested TK1, too, and the Ethernet speed is very stable (940Mbit/s), and not related to the USB micro port. However, in the case of TK1, Ethernet controller is not connected to the USB peripheral of the Tegra.
I would be very glad if NVIDIA would come up with a good idea to solve this speed problem…
USB3 does require a lot of interrupts if IRQ calls do not aggregate when possible (USB3 HUBs are in some way kind of a silly concept if you intend to plug in multiple USB3 devices…typically one device can consume most of the bandwidth…if transfer size per interrupt is constant, then you need more IRQ operations for faster transfer). Something which may validate this test as being IRQ starvation is that when an older USB1.1 device (which includes a USB mouse and keyboard) is used, then the system falls back to slower speeds…and slower IRQ rates. There is a strong chance that ethernet is undergoing some form of IRQ starvation when USB3 hits at max speed as competition to CPU0. If this is the case I might suspect that any USB1.1 device would improve ethernet due to slower IRQ; a USB2 device might help as well, though the effect would not be nearly as dramatic.
What I find curious is that the USB3 driver would be setting off so many interrupts if nothing USB3 is actually connected. I wonder if, when nothing is connected (other than the root HUB which is integral to the board), the IRQ rate from USB3 is still high? I would think that this would not be necessary when waiting for devices to enumerate, but would possibly be overlooked (perhaps it is part of dynamic performance tuning which could change). I do not know how the ethernet is wired within the TX1 module itself, but in the end it all has to meet at CPU0.
The evidence you found where a HUB without a mouse would briefly boost ethernet is also consistent with IRQ starvation. During enumeration USB3 drivers have no devices to talk to. I’d really like to see a real time graph of IRQ request to driver servicing times for both ethernet and USB as the speeds on USB are altered (looking for IRQ collisions and latency).
One possible aid would be if more buffer is supplied to the ethernet. There are some kernel command line options for this, although I don’t recall offhand what those options are. Sometimes those options are used just because of jumbo frames, but there is no reason the same buffering could not be used when the problem is from competition outside of the ethernet. Here’s an interesting article on ethernet tuning, but I have not tested any of this:
[url]https://www.cyberciti.biz/faq/linux-tcp-tuning/[/url]
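As a sanity check before touching any buffer settings, the bandwidth-delay product tells you how many bytes have to be in flight to keep the link full. Below is a small sketch with hypothetical helper names; the 1 ms round-trip time in the usage note is only an assumed LAN figure, and the sample `net.ipv4.tcp_rmem` string is an example value, not one read from a TX1.

```python
def bdp_bytes(mbit_per_sec, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    return int(mbit_per_sec * 1e6 / 8 * rtt_ms / 1e3)


def parse_tcp_mem(text):
    """Split a 'min default max' sysctl value such as net.ipv4.tcp_rmem."""
    lo, default, hi = (int(x) for x in text.split())
    return {"min": lo, "default": default, "max": hi}


def buffer_covers_link(tcp_mem_text, mbit_per_sec, rtt_ms):
    """True if the maximum buffer size is at least the bandwidth-delay product."""
    return parse_tcp_mem(tcp_mem_text)["max"] >= bdp_bytes(mbit_per_sec, rtt_ms)
```

For example, `bdp_bytes(940, 1.0)` is 117500 bytes, so if the maximum in `/proc/sys/net/ipv4/tcp_rmem` is already well above that, larger TCP buffers alone may not change much on a local link.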
I haven’t connected any USB3 device when I did any of the tests above.
The high interrupt frequency on “GIC tegra-xhci:usb9” (10000 interrupts/sec) was measured while I was running the iperf speed test. I think it is high because the interrupts of the Ethernet controller are also routed to the “GIC tegra-xhci:usb9” line. I suppose xhci does not necessarily mean USB3; xHCI can handle USB2 and USB1, too. And as I mentioned before, the Ethernet controller is connected to the USB 2.0 peripheral of the Jetson TX1:
[url]https://developer.nvidia.com/embedded/dlc/jetson-tx1-oem-product-design-guide[/url]
In other words, if I stop the iperf speed test, this interrupt frequency drops to around 50-100 interrupts/sec.
Thanks for the tuning tips, too! I have tested similar things before, but unfortunately it didn't help:
[url]http://elinux.org/Jetson/Network_Adapters[/url]
The other thing I wanted to mention is that if I run the iperf speed test for the first time after rebooting the system, I get the high 940 Mbit/sec result, even if I don't connect anything to the OTG USB port. But if I run the test a second time, the speed always drops to the slower 650-800Mbits/sec.
I really appreciate any help! :)
I see in that document that USB1_DP/USB1_DM are tied to ethernet; also listed as connected are PEX_TX6_P/PEX_TX6_M (and the RX counterpart). I can't tell from that document whether that means there is a choice between USB1 and PEX6, or whether both USB1 and PEX6 are used in the module.
Regardless, I can only see the performance drop because of competition between USB and ethernet drivers, which in turn would mean IRQs fighting for CPU0. However, this may explain why USB would be triggering its IRQ even with no USB device attached. It’s just speculation, but perhaps the ethernet device’s USB-related wiring is fooling the system into believing a USB IRQ needs to trigger…but a real USB1.1 device would back down the IRQ rate. IRQ starvation of the ethernet would not care if competing USB IRQ is for a needless IRQ on USB with no USB device attached. I wonder if there is needless IRQ operation going on for USB when no USB device is attached…if so, then this is something that could be fixed by avoiding unneeded interrupts when there isn’t really a USB device attached.
I have removed every unnecessary device (PCI card, USB devices, Ethernet cable). Under these conditions I get about 30 interrupts/sec on the tegra-xhci:usb1 line. (If I only connect the Ethernet cable, it increases to 70 interrupts/sec, and if I run the speed test it increases to 10000 interrupts/sec.) I don't know if these values are normal or not.
Anyway, if I list the USB devices, I get the following output:
ubuntu@jetson-3dm1:~$ lsusb -t
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=tegra-xhci/4p, 5000M
|__ Port 1: Dev 2, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=tegra-xhci/5p, 480M
ubuntu@jetson-3dm1:~$ lsusb
Bus 002 Device 002: ID 0955:09ff NVidia Corp.
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
This means the Ethernet controller (r8152) shows up as a USB device.
Btw, I understand you suspect false USB interrupts and competition for CPU0 between Ethernet and USB, but I don't know how to confirm or fix that…
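If someone wants to check this on their own board, here is a small sketch that extracts bus number, driver and reported speed from `lsusb -t` text. The helper name is made up, and the regular expressions only assume the line format shown in the output above.

```python
import re

BUS_RE = re.compile(r"Bus\s+(\d+)\.Port")
# Driver=<name>[/<N>p], <speed>M   (root hubs carry the "/4p" port count)
DEV_RE = re.compile(r"Driver=([^,/]*)(?:/\d+p)?,\s*(\d+(?:\.\d+)?)M")


def usb_devices(lsusb_t_text):
    """Return (bus, driver, speed in Mbit/s) for every line of `lsusb -t`."""
    devices = []
    bus = None
    for line in lsusb_t_text.splitlines():
        b = BUS_RE.search(line)
        if b:
            bus = int(b.group(1))  # child lines inherit the last bus seen
        d = DEV_RE.search(line)
        if d:
            devices.append((bus, d.group(1), float(d.group(2))))
    return devices
```

Run on the output pasted above, it lists r8152 under Bus 02 with the 5000M speed that `lsusb -t` reports for it, next to the two tegra-xhci root hubs.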
This is a difficult problem since it isn’t a bug in the usual sense, and it isn’t just the one device driver…it’s about interaction between devices with only one core to handle the interaction.
If you test further with everything removed, but ethernet running its test (10k interrupts/sec), can you measure how that rate changes if a USB1.1 device is connected, e.g., mouse or keyboard plugged directly into the connector? I’m thinking that if IRQ rate goes up on ethernet at the moment the USB1.1 device is plugged in, then perhaps drivers related to the USB1.1 insertion effect could be optimized to reduce load when no device is inserted by whatever mechanism the USB1.1 insertion affects ethernet.
Btw, are you using any of the performance modes? See:
[url]http://elinux.org/Jetson/TX1_Controlling_Performance[/url]
Does MTU affect interrupt pressure on CPU0, and on ethernet devices in general? By changing it to 7750, I got a smaller increase (only about 30MBit/sec).
I don’t know the details of the ethernet hardware or driver, but I would guess that if the hardware is able to buffer more prior to issuing an interrupt (or to collect more data while waiting), then this would reduce pressure to service an IRQ. Higher MTU would probably help up to the point where CPU0 has to get involved because of a full buffer on the device. If DMA gets involved in a way that does not require CPU0 I’m sure performance would go up drastically.
In terms of ethernet stats shown via “ifconfig”, it would be interesting to see if increasing IRQ pressure causes drops or overruns, e.g., running the ethernet performance test continuously, followed by adding in faster USB devices which are intended to purposely compete for IRQ time (some sort of USB2 or USB3 device which wouldn’t have significant bottlenecks…I think external hard drives might have bottlenecks, something like external USB hidef audio devices would be a really good test because they operate in isochronous mode…streaming devices in general would be a good test since they more or less must consume IRQ time slices on a regular basis without pause).
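The same counters that “ifconfig” prints are also exposed as plain files under /sys/class/net/<interface>/statistics/, which makes a before/after comparison easy to script. A minimal sketch follows; the helper names are hypothetical, and the interface name `eth0` in the usage note is an assumption that should be replaced with whatever name the board actually uses.

```python
import os

# Standard per-interface counter files in the sysfs statistics directory.
COUNTERS = (
    "rx_dropped",
    "rx_errors",
    "rx_over_errors",
    "rx_fifo_errors",
    "rx_missed_errors",
)


def read_counters(stats_dir):
    """Read the receive-side drop/error counters that exist in stats_dir."""
    values = {}
    for name in COUNTERS:
        try:
            with open(os.path.join(stats_dir, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # counter missing or unreadable on this system
    return values


def counter_deltas(before, after):
    """Difference of every counter present in both snapshots."""
    return {k: after[k] - before[k] for k in after if k in before}
```

Taking `read_counters("/sys/class/net/eth0/statistics")` before and after a test run, with and without the competing USB device, and passing both to `counter_deltas` would show whether drops or overruns grow under IRQ pressure.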
These are the results of a test I did using a USB mouse:
Conditions / Connected Devices    Interrupt Rate on "tegra-xhci:usb1"
---------------------------------------------------------------------
nothing is connected              32 int/sec
mouse                             32 int/sec
Ethernet cable                    70 int/sec
Ethernet cable + mouse            70 int/sec
Ethernet Speed Test               8700 int/sec (570 Mbit/sec)
Ethernet Speed Test + mouse       11800 int/sec (940 Mbit/sec)
When I connected the mouse during the speed test, interrupt rate went from 8700 to 11800, but I don’t know precisely how quickly.
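A quick back-of-the-envelope check on the two speed-test rows: dividing throughput by interrupt rate gives the average amount of data handled per interrupt. This sketch assumes that every interrupt on the “tegra-xhci:usb1” line during the test belongs to the Ethernet traffic, which the idle rates of 32-70 int/sec suggest is close but not exact.

```python
def bytes_per_interrupt(mbit_per_sec, irq_per_sec):
    """Average payload bytes moved per interrupt at a given throughput."""
    return mbit_per_sec * 1e6 / 8 / irq_per_sec


# Values taken from the speed-test rows of the table.
slow = bytes_per_interrupt(570, 8700)    # speed test without mouse
fast = bytes_per_interrupt(940, 11800)   # speed test with mouse

# Relative growth of throughput versus interrupt rate.
throughput_ratio = 940 / 570
irq_ratio = 11800 / 8700
```

Under that assumption the slow case moves roughly 8.2 kB per interrupt and the fast case roughly 10 kB, so throughput grows by about 65% while the interrupt rate grows by only about 36%.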
Btw, when I moved the mouse fast, interrupt on “tegra-xhci:usb1” dropped a little, and interrupts on “tegra-otg, tegra-udc, ehci_hcd:usb3”, “host_syncpt” and “gk20a_stall” were generated. Also the Ethernet speed dropped a little while moving the mouse (850 Mbit/sec).
As far as I know, it doesn’t really matter whether I connect a USB1.1 or USB2.0 device to that port, it makes the boost either way.
I tried the performance tuning, too, but it didn’t make any difference for me.
Finally, I have updated the Realtek driver to the latest version, but of course it didn’t make any difference in behavior.
At this point I’m not sure what would be needed to improve performance. Some profiling within the tegra-xhci might be needed. The evidence is fairly solid (and normal) that two USB devices are competing for driver time on a single core…the question becomes one of rearranging how the interrupts are triggered so that ethernet and non-ethernet USB devices play more nicely together (and especially to improve ethernet when no other USB devices are around).
It probably isn’t the best workaround, but you could possibly use a PCIe ethernet NIC. Another possibility is that a higher priority on any user space app dependent on ethernet could help (see “man nice” and “man renice”…a nice level of -1 or -2 could perhaps influence underlying ethernet scheduling…see what happens when the performance test program runs at nice -1 or nice -2).
Thanks for the idea, but unfortunately changing the process priority had no effect at all on Ethernet performance.
I can accept that if a USB device is connected, Ethernet speed drops because of interrupts competing for CPU time, but I think it is a problem if USB takes so many resources away when no USB device is connected at all.
I am planning to take a look at the source code, but I don't know where to look. What do you think, should I look at the tegra-xhci source file maybe? (drivers/usb/host/xhci-tegra.c)
That’s where I’d start. It could be a hotplug layer looking for devices as well. Basically you’d be looking for what triggers the xusb driver when no devices are attached at the USB port, versus what happens when something like a keyboard is attached. The ethernet is going to trigger xusb, but a big clue is to see what happens when a keyboard is connected…and see if it displaces something (I agree, having lots of extra interrupts when nothing is attached should not happen).
Hi,
Because sclk rises when a micro USB device is plugged in, it affects the Ethernet speed as well.
If you set it to the maximum clock, the speed should be more stable whether or not a USB device is connected.