10GbE PCIe Card Behaves like 1GbE Card

Hi:

I have installed a 10Gb Ethernet card in the PCIe slot on my Xavier. I noticed that the diagnostic programs for the device I am talking to showed a lot of packets being dropped when running at speeds above 1Gb/sec, and netstat confirms the dropped packets on the interface. The system seems to have identified the card as 10Gb, but something still seems to be bottlenecked at 1Gb. We tried a second card out of a deskside machine that we know works correctly, and the symptoms on the Xavier remained the same. We are running JetPack 4.1, downloaded last week. Any insights would be greatly appreciated.

Thanks!

  • JLW

What does your full output show for:

ifconfig
route

For whichever device the 10G NIC is, e.g., maybe it is “eth1”:

ethtool eth1

Is your cable length long? How close is it to the switch?

If you run this first, do you get an improvement?

sudo nvpmodel -m 0
sudo ~ubuntu/jetson_clocks.sh

Hi linuxdev:

The following results for the various commands were obtained when the machine was booted this AM. There are two 10Gb interfaces (eth1 & eth2). The cables are 3’ and have been successfully used on another system. The connection is point-to-point to the device we’re trying to talk to (there is no switch). Changing the power model didn’t seem to help much; we went from losing about 2/3 of the packets to a little over half.

Thanks very much for helping me!

  • JLW

ifconfig output:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 128.112.3.3 netmask 255.255.0.0 broadcast 128.112.255.255
inet6 fe80::f97f:4b79:ec64:cea7 prefixlen 64 scopeid 0x20
ether 00:04:4b:cb:9b:a5 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 59 bytes 6200 (6.2 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 40

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 192.168.30.1 netmask 255.255.255.0 broadcast 192.168.30.255
inet6 fe80::1e3:fc5f:89f3:c358 prefixlen 64 scopeid 0x20
ether 90:e2:ba:f2:1c:18 txqueuelen 1000 (Ethernet)
RX packets 13 bytes 780 (780.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 59 bytes 6266 (6.2 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 192.168.40.1 netmask 255.255.255.0 broadcast 192.168.40.255
inet6 fe80::79dd:b1f:cdcb:4036 prefixlen 64 scopeid 0x20
ether 90:e2:ba:f2:1c:19 txqueuelen 1000 (Ethernet)
RX packets 13 bytes 780 (780.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 56 bytes 6032 (6.0 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

l4tbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 192.168.55.1 netmask 255.255.255.0 broadcast 192.168.55.255
inet6 fe80::1 prefixlen 128 scopeid 0x20
inet6 fe80::5ce0:5eff:fe90:c37 prefixlen 64 scopeid 0x20
ether 52:ea:3b:35:a5:d6 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 6 bytes 534 (534.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10
loop txqueuelen 1 (Local Loopback)
RX packets 627 bytes 39815 (39.8 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 627 bytes 39815 (39.8 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

rndis0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether a2:ea:33:5c:f6:29 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

usb0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether 52:ea:3b:35:a5:d6 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

route output:

Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
128.112.0.0 0.0.0.0 255.255.0.0 U 102 0 0 eth0
link-local 0.0.0.0 255.255.0.0 U 1000 0 0 l4tbr0
192.168.30.0 0.0.0.0 255.255.255.0 U 100 0 0 eth1
192.168.40.0 0.0.0.0 255.255.255.0 U 101 0 0 eth2
192.168.55.0 0.0.0.0 255.255.255.0 U 0 0 0 l4tbr0

ethtool eth1 output:

Settings for eth1:
Supported ports: [ FIBRE ]
Supported link modes: 10000baseT/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: No
Advertised link modes: 10000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: No
Speed: 10000Mb/s
Duplex: Full
Port: Direct Attach Copper
PHYAD: 0
Transceiver: external
Auto-negotiation: off
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes

Just for testing purposes, could you try to go through a switch? I realize this may not be practical, but if it is, then it would help.

About point to point: Some Ethernet NICs auto-detect and correct for the need for a crossover cable, but if you use a crossover cable, then no such auto-detection is needed. Are these crossover cables, or is auto-detect required? Using regular cables through a switch would rule out crossover being an issue and is a useful debug step.

You have no default route. This is ok so long as addresses are within the existing subnet and traffic is not going to the outside world. Your current settings are probably fine for this, though if some traffic does go outside of the net it could have unexpected side effects. I don’t know what network traffic you have, so I can’t say for sure…likely it isn’t a problem, but I am mentioning it in case you are doing something complicated. Similar for the bridge entries…I doubt there is any issue, but if some part of the traffic goes through a bridge, then this too might have an unexpected effect.

The interfaces eth0, eth1, and eth2 appear to be valid in the sense that they do not overlap. The bridge entries appear to also be ok and should not be an issue.

On the other hand, you could eliminate the “192.168.55.0” bridge to simplify. This bridge is actually part of the USB gadget mode sample code. If you look at “/opt/nvidia/l4t-usb-device-mode/” you will see how a USB port is used to emulate bulk storage and an ethernet card. This could be disabled without harm and probably should be disabled on most systems. This command will show you the two files which activate this on boot:

ls -l `find /etc/systemd -type l` | grep opt

This shows that these two symbolic links could be deleted (or recreated at a later date if you want automatic activation of the gadget USB demo) to remove the demo:

sudo rm /etc/systemd/system/multi-user.target.wants/nv-l4t-usb-device-mode.service
sudo rm /etc/systemd/system/nv-l4t-usb-device-mode.service

There are of course more “correct” ways to remove this using systemctl, but I just delete the symbolic links. Then, after a reboot, the bridge via the USB device mode will stop.

For the eth1 output of “ethtool”, this device shows as running strictly at 10Gb/s without auto negotiation:

...
Advertised auto-negotiation: No
...
Speed: 10000Mb/s
Duplex: Full
...
Auto-negotiation: off

This is ok if everything is able to use those settings.

I am guessing that you are looking at this:

... txqueuelen 1000 (Ethernet)

Keep in mind that txqueuelen is a buffer size and not a speed. So if you are thinking this implies gigabit, this would be incorrect as it has nothing to do with link speed. On the other hand, adjusting a queue length can cause changes in latency due to rules about when enough data has bulked up for efficient transfer, versus sending on a timeout. See “man ifconfig”, and search for “txqueuelen” if interested in that topic.
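As a hedged aside, if you do want to experiment with the queue length itself, iproute2’s “ip” (the modern replacement for ifconfig) can show and change it. “eth1” is just this thread’s example device name and 10000 is an illustrative value, not a recommendation:

```shell
# Inspect the current transmit queue length; it is also visible in sysfs.
ip link show eth1 | head -n 1
cat /sys/class/net/eth1/tx_queue_len
# Hypothetically raise it (illustrative value; affects latency, not link speed).
sudo ip link set dev eth1 txqueuelen 10000
```

Note this only changes how much TX data can queue before the driver; it will not turn a 1Gb bottleneck into 10Gb.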

This particular example shows very little traffic, which makes it difficult to say whether other parts of the system are working correctly. For that tiny amount of data I see no errors, overruns, drops, or collisions. Perhaps a longer run passing more traffic would show something, but so far all looks good.

After removing the USB gadget mode, setting for performance as mentioned above, and running a bit more time with more traffic, post the output again for ifconfig of eth1 and eth2. More traffic would be a better indicator. If possible try the same test with a network switch instead of direct connect.
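If a switch is hard to arrange, a plain iperf UDP test between the two ends would also help separate raw link behavior from the SDR benchmark. A sketch, assuming iperf3 is installed on both machines and the far end sits at a hypothetical 192.168.30.2:

```shell
# Far end: run an iperf3 server.
iperf3 -s
# Xavier: send UDP at a fixed rate over the 10G link and note the reported
# loss. 8900-byte datagrams fit the MTU-9000 link (8900 + 8 UDP + 20 IP = 8928).
iperf3 -c 192.168.30.2 -u -b 5G -l 8900 -t 30
```

If iperf shows clean 5Gb/s but the SDR benchmark drops packets, the problem is likely on the consuming side rather than the NIC.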

Hi linuxdev:

I’ll have to look around to find a switch but we’ll see what we can do. The connection is dedicated to the SDR we’re talking to & we’ve run the same card & cables on a different machine so I think the auto-detect is working OK. It also seems to run fine at the reduced data rate. The ifconfig output was collected right after boot this morning so it doesn’t show any of the drop statistics. I’ll post a new collection after removing the USB gadget mode. We’re off next week but I’ll get something as soon as we get back.

Thanks again!

  • JLW

Hi linuxdev:

I tried the techniques you mentioned & still no joy. Here is this morning’s ifconfig output after doing a little I/O:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 128.112.3.3 netmask 255.255.0.0 broadcast 128.112.255.255
inet6 fe80::f97f:4b79:ec64:cea7 prefixlen 64 scopeid 0x20
ether 00:04:4b:cb:9b:a5 txqueuelen 1000 (Ethernet)
RX packets 955 bytes 81191 (81.1 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 835 bytes 79910 (79.9 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 40

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 192.168.30.1 netmask 255.255.255.0 broadcast 192.168.30.255
inet6 fe80::1e3:fc5f:89f3:c358 prefixlen 64 scopeid 0x20
ether 90:e2:ba:f2:1c:18 txqueuelen 1000 (Ethernet)
RX packets 3189675 bytes 25139723250 (25.1 GB)
RX errors 0 dropped 139847 overruns 0 frame 0
TX packets 63443 bytes 5196484 (5.1 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 192.168.40.1 netmask 255.255.255.0 broadcast 192.168.40.255
inet6 fe80::79dd:b1f:cdcb:4036 prefixlen 64 scopeid 0x20
ether 90:e2:ba:f2:1c:19 txqueuelen 1000 (Ethernet)
RX packets 3286221 bytes 25114496876 (25.1 GB)
RX errors 0 dropped 143452 overruns 0 frame 0
TX packets 163941 bytes 11025129 (11.0 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10
loop txqueuelen 1 (Local Loopback)
RX packets 10208 bytes 627832 (627.8 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 10208 bytes 627832 (627.8 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Thanks!

  • JLW
  • eth0: Operating without error.
  • eth1: RX dropping lots of packets. Could be an ethernet problem or an end-consumer problem. TX is ok. Probably not a hardware issue, and with no overruns, frame errors, or collisions I doubt it is a conflict or an adverse reaction to another network device interfering.
  • eth2: Same as eth1.

What application is receiving or consuming data from eth1 and eth2? Can you say more about what the software is doing? Sending data over those ports seems ok, but at some point reception overwhelms software, and I’d like to narrow this down.

Also, I don’t know which hardware this is and which driver it uses. Can you elaborate on the hardware and driver?

The software we’re running is a benchmark I/O routine provided by the radio manufacturer to test moving data to/from a software-defined radio. We have also run some software we wrote, and it has similar symptoms. The card we are using is an Intel X520-DA2. The ethtool output for the two 10GbE interfaces shows firmware version 0x61c10001 and driver version ixgbe 4.6.4.

It sounds like the benchmarking software is purposely pushing data as fast as it can in order to find what makes its way through. I don’t know how the receive side is built, but is this using UDP and not TCP (TCP has a different software stack)? Have you ever profiled the receive side of the software? At this point we know the data throughput is failing, but we don’t know if it is from the network end (e.g., hardware and driver) versus something consuming the data (the benchmark software…for example it could be a software design issue, deadlock, and so on). With hardware JTAG debugging you could profile the whole chain of events, but I’m not sure how to do this under the circumstances (and I certainly don’t know of a suitable JTAG debugger…if one exists you can bet it is very expensive).
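One thing worth checking for UDP specifically: a common cause of RX “dropped” counts like these is the kernel’s default socket receive buffer being too small for 10G bursts, so packets are discarded before the benchmark ever sees them. A hedged sketch (the sizes are illustrative, not tuned values):

```shell
# Inspect the current cap on socket receive buffers (bytes).
cat /proc/sys/net/core/rmem_max
# Temporarily raise the cap and default (illustrative ~64 MB values) so a
# receiving UDP socket can absorb larger bursts; lasts only until reboot.
sudo sysctl -w net.core.rmem_max=67108864
sudo sysctl -w net.core.rmem_default=67108864
```

The receiving application still has to ask for a large buffer via SO_RCVBUF (or use the raised default) for this to matter.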

After a failure what do you see from:

egrep -i "spurious" /proc/interrupts
dmesg | egrep -i "spurious"

Perhaps someone from NVIDIA could recreate this via a PCIe 10G NIC, but unless someone has your specific hardware it may not be possible. What I’ll suggest is that you give every detail you can regarding the actual hardware and driver setup such that someone else with 10G networking can come as close as possible to duplicating your setup.

You are correct about the benchmarking software. I think it is actually trying to stress the radio’s ability to generate consistent data, but breaks the interface in this case. It runs well on other machines we have in our lab, but since the architecture is different I can’t rule out potential problems. The data rate is tunable and it works fine up to the 1 Gb/s point, but performance rolls off dramatically after that. Both the benchmark and the software that we have written exhibit similar symptoms. Both use the UDP/IP protocol; we were trying to keep things as simple as possible since we are moving so much data. Our hardware is the Intel X520-DA2 NIC with a point-to-point connection to an Ettus X310 SDR. We are using two NETGEAR ProSAFE 1 Meter Direct Attach SFP+ cables. The driver version is ixgbe 4.6.4, which came with the system. I’ll collect the information this morning & post when I get over to the lab.

I really appreciate all the help!

Thanks again,

  • JLW

P.S.: The two commands produced no output (no mention of “spurious” in interrupts file).

Someone from NVIDIA may have to profile this, but the way I see it the interface is acting in a valid manner for UDP arriving faster than it can be consumed. The trick is that it is not yet possible to differentiate between the cause being the software consuming the data, versus a bottleneck in the system causing the issue prior to end software ever getting a chance to consume the packets.

One suggestion I am going to make is to find the IRQ responsible for your 10G NIC, and move it to another non-Denver core. Normally “ifconfig” would also give “device interrupt”, but I don’t see it for your 10G NIC interfaces. If one is referring to a user space process and wanting to migrate it to a specific core (affinity), then the PID would be used; in the case of a device driver, affinity is set via the IRQ. Do you have any way to find the IRQ of your 10G NIC? If so, then we can test moving it to another core in order to give it a better chance of always getting serviced without other drivers competing.
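As a starting point, a hedged sketch for finding a PCIe NIC’s interrupts (the grep pattern assumes the /proc/interrupts descriptions mention the interface name or the ixgbe driver; a multi-queue 10G NIC will usually expose several MSI-X vectors per port):

```shell
# List interrupt lines whose description mentions the interface or driver.
# The left-most column is the IRQ number; the per-CPU columns show where
# each interrupt has been serviced so far.
grep -iE 'eth1|eth2|ixgbe' /proc/interrupts
```

Adjust the pattern to whatever /proc/interrupts actually shows on your system.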

Sorry for the late reply. I don’t have a 10G NIC on hand at the moment.

We will investigate this issue soon.

Hi linuxdev:

I took a look but couldn’t figure out the IRQs for the 10GbE interfaces. I’m not much of a system/driver programmer (the last driver I wrote was for a DEC PDP-11) so I’m likely missing something. It is an interesting idea though. Is there a way to speed up the servicing of the device interrupts? It seems like that would explain the loss of packets and I would be willing to try some experiments if you know of any techniques.

Thanks!

  • JLW

Sometimes “ifconfig” will show an interrupt.

If you look at “/proc/interrupts”, and can find something in the right-most column, you might be able to identify it.

On R31.1 I see this from “ifconfig eth0”:

device interrupt 40

If I look at IRQ 40 in “/proc/interrupts”, I see this description:

ether_qos.common_irq

On the other hand, I also see these two descriptions:

42: ... 2490000.ether_qos.rx0
 43: ... 2490000.ether_qos.tx0

So I am only guessing, but probably the one listed by “ifconfig” is specific to that hardware, and the tx/rx ones might be related to the data source/sink. The 2490000 would be the address of the controller hardware, and if I run this:

sudo find /sys -name '*2490000*'

…I see everything associated with that controller. A couple of those confirm that it all leads back to ether_qos:

/sys/kernel/iommu_groups/4/devices/2490000.ether_qos
/sys/bus/platform/devices/2490000.ether_qos

If I run “sudo find /sys -name ‘eth0’” I get further confirmation:

/sys/devices/2490000.ether_qos/net/eth0
/sys/class/net/eth0

Looks like if you know the device from “ifconfig” (I’ll assume eth1 for an example) you can start via:

sudo -s
cd /sys/class/net
ls -l eth1
# The name of the file pointed to should have some sort of identifier for the driver,
# e.g., in the case of my eth0 I see the controller address concatenated with "ether_qos".
# So I look for "ether_qos" in "/proc/interrupts":
egrep ether_qos /proc/interrupts
 40:       8696          0          0          0          0          0          0          0     GICv2 226 Level     ether_qos.common_irq
 42:       3737          0          0          0          0          0          0          0     GICv2 222 Level     2490000.ether_qos.rx0
 43:       2114          0          0          0          0          0          0          0     GICv2 218 Level     2490000.ether_qos.tx0

I do not know how the driver divides work; typically (at least in good drivers) there is a separation of work involving physical I/O from work which is purely software. The one you are interested in (at least for now) is the physical I/O…the “common” one in this case, if I am correct, and I am not entirely certain since I don’t know the internals of that driver and the hardware.

If you were to flood ping the interface you’d notice the IRQ count going up fast. I don’t know if the rx/tx would go up as fast or not when nothing is processing other than an ICMP reply. Can you identify which IRQs are associated? Is there a “common” IRQ? Does a flood ping from an outside system cause one of those IRQs to go up faster than the others? If you use netcat (“nc”) to do a massive cat of data (e.g., netcat of “/dev/mmcblk0”) to a remote system via TCP, where the remote system only dumps to “/dev/null”, do the IRQs go up differently than from flood ping? What I’m thinking is that a TCP stack might traverse drivers differently than a non-TCP protocol. You could even look at how the IRQs change over time during any of your above benchmarks.
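One concrete way to watch which IRQ moves during each kind of traffic is to snapshot the NIC’s rows in /proc/interrupts before and after a burst and compare (the file names and grep pattern here are just illustrative):

```shell
# Snapshot interrupt counts, generate traffic (e.g. a flood ping or a
# netcat transfer from the far end), then see which counters moved.
grep -iE 'eth1|ixgbe' /proc/interrupts > /tmp/irq_before
sleep 10    # run the traffic test during this window
grep -iE 'eth1|ixgbe' /proc/interrupts > /tmp/irq_after
diff /tmp/irq_before /tmp/irq_after
```

The rows whose counts jump the most during receive traffic are the ones worth experimenting with.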

If you are able to find IRQs, then perhaps we can see what happens when you set affinity to a new core. In the case of PCIe it may not be simple since you have the PCI controller and perhaps DMA also involved (this is where hardware profiling would be valuable…you wouldn’t have to guess at any of it).
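For reference, once an IRQ number is known, affinity is set by writing a hexadecimal CPU bitmask to /proc/irq/&lt;N&gt;/smp_affinity. A sketch with a hypothetical IRQ 130 pinned to core 4 (substitute the real IRQ found in /proc/interrupts):

```shell
# Build a CPU mask with only bit 4 set (core 4 -> hex "10") and apply it
# to a hypothetical IRQ 130.
MASK=$(printf '%x' $((1 << 4)))
echo "$MASK" | sudo tee /proc/irq/130/smp_affinity
cat /proc/irq/130/smp_affinity    # verify the new mask took effect
```

Note the kernel may reject masks for IRQs that cannot be migrated, and irqbalance (if running) can move them back.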

Hi jonathan.l.whitaker,

We used the two 10Gb PCIe Ethernet cards below to check performance, and the results look good:

  • StarTech 10G PCIe card
  • Intel X540 10G PCIe card

Please verify your 10Gb network setup on Xavier and check again.

Thanks!

Just to close out this issue: the problem was resolved by updating the OS to Ubuntu 18.

In case anyone lands here looking for advice on 10G network cards, I was able to get the Asus XG-C100C working on Xavier (https://www.asus.com/us/Networking/XG-C100C/). It uses the “atlantic” driver from Aquantia, which is included in the L4T 31.1 provided kernel. However, the card did not work until I compiled a newer version of the driver:

nvidia@jetson-0423018054806:~$ modinfo atlantic | head -n6
filename:       /lib/modules/4.9.108-tegra/kernel/drivers/net/ethernet/aquantia/atlantic/atlantic.ko
description:    aQuantia Corporation(R) Network Driver
author:         aQuantia
version:        2.0.15.0
license:        GPL v2
srcversion:     AF2F54B673507801B8B253C

With “nvpmodel -m 0” and “~/jetson_clocks.sh” I was able to get 4 Gbps according to iperf.
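For anyone repeating this, building a newer out-of-tree module is roughly the standard recipe below; this is a generic sketch, not the exact steps used here, the source directory name is hypothetical, and kernel headers for the running 4.9.108-tegra kernel must be installed:

```shell
# Build the newer atlantic driver against the running kernel's build tree.
cd aquantia-atlantic-driver-src      # hypothetical source directory
make -C /lib/modules/$(uname -r)/build M=$(pwd) modules
# Swap the module in (unload the stock one first if it is loaded).
sudo rmmod atlantic 2>/dev/null || true
sudo insmod ./atlantic.ko
```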

I also tried the StarTech card ST10000SPEX (https://www.startech.com/Networking-IO/Adapter-Cards/10gb-pcie-nic~ST10000SPEX), but for some reason the link would never come up. This card uses the Tehuti chip (with the tn40xx driver); maybe @carolyuu used one with an Intel chip? I wish they’d been more specific.