x86 CPU to Xavier AGX (in endpoint mode) over PCIe: how to enable the Ethernet over PCIe driver

Hi,

I have successfully connected our two Xavier AGX dev kits with a PCIe x16 cable and tested the “Ethernet over PCIe drivers” by following the steps provided in https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/xavier_PCIe_endpoint_mode.html

I use the JetPack 4.3 release and applied the patch to get the 5 Gbps bandwidth.

Now I want to replace one Xavier with an x86 CPU. The x86 CPU is the root complex and the Xavier is already flashed in endpoint mode.

How can I make the “Ethernet over PCIe drivers” work on my x86 CPU, as they currently do when I connect two Xaviers?

Without any other configuration, when I boot my x86 CPU, “lspci” does not list the Xavier.

Please I need help.
Thanks.

Please boot the AGX (EP) first and run the endpoint setup commands on it, so that it is ready before the x86 boots.
Also compile its device driver (tegra_vnet.c) on the x86. With this, you should be able to replicate the setup on x86 as well.

Thanks for your response
I compiled the “tegra_vnet.c” driver and got it working on my x86. The Xavier AGX is correctly identified by the x86. The “lspci” output is below:

b3:00.0 Network controller: NVIDIA Corporation Device 2296
    Subsystem: NVIDIA Corporation Device 0000
    Flags: bus master, fast devsel, latency 0, IRQ 96, NUMA node 0
    Memory at fb800000 (32-bit, non-prefetchable) [size=4M]
    Memory at fffff00000 (64-bit, prefetchable) [size=128K]
    Memory at fbc00000 (64-bit, non-prefetchable) [size=1M]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI: Enable- Count=1/16 Maskable+ 64bit-
    Capabilities: [70] Express Endpoint, MSI 00
    Capabilities: [b0] MSI-X: Enable+ Count=8 Masked-
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [148] #19
    Capabilities: [168] #26
    Capabilities: [190] #27
    Capabilities: [1b8] Latency Tolerance Reporting
    Capabilities: [1c0] L1 PM Substates
    Capabilities: [1d0] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
    Capabilities: [2d0] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
    Capabilities: [308] #25
    Capabilities: [314] Precision Time Measurement
    Capabilities: [320] Vendor Specific Information: ID=0003 Rev=1 Len=054 <?>
    Kernel driver in use: tvnet

However, after all the configuration, I ran iperf3 on both and got an “Unable to connect. No route to host” error message on the client (Xavier AGX) side. I can’t ping the x86 (192.168.2.2) from the AGX (192.168.2.1). I tried the bind option of iperf, but the connectivity issue remains.

I noticed that on the x86, the logical name of the Ethernet interface is “eth0” instead of “eth1” (as I get on the Xavier).

Thanks

On each end of the connection (on each host), do you now see something related to those addresses via “ifconfig” and “route” commands? I’ve never used the PCIe endpoint like this, but if ping or other network commands are used, then it makes sense that each end would also have to do normal network setup.

Yes, I see those addresses via “ifconfig” on each host. Below is the output of “ifconfig” and “route” on both hosts.

On RootComplex (x86):

  • ifconfig

eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.0.14 netmask 255.255.255.0 broadcast 192.168.0.255
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 64512
    inet 192.168.0.31 netmask 255.255.255.0 broadcast 192.168.0.255

  • route

Kernel IP routing table
Destination  Gateway     Genmask       Flags Metric Ref Use Iface
0.0.0.0      192.168.0.1 0.0.0.0       UG    100    0   0   eno1
169.254.0.0  0.0.0.0     255.255.0.0   U     1000   0   0   eno1
192.168.0.0  0.0.0.0     255.255.255.0 U     0      0   0   eth0
192.168.0.0  0.0.0.0     255.255.255.0 U     100    0   0   eno1

On EndPoint (AGX):

  • ifconfig

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 192.168.0.15 netmask 255.255.255.0 broadcast 192.168.0.255
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 64512
    inet 192.168.0.30 netmask 255.255.255.0 broadcast 192.168.0.255

  • route

Kernel IP routing table
Destination  Gateway     Genmask       Flags Metric Ref Use Iface
0.0.0.0      192.168.0.1 0.0.0.0       UG    100    0   0   eth0
169.254.0.0  0.0.0.0     255.255.0.0   U     1000   0   0   eth0
192.168.0.0  0.0.0.0     255.255.255.0 U     0      0   0   eth1
192.168.0.0  0.0.0.0     255.255.255.0 U     100    0   0   eth0

On each host, I can successfully ping its local interfaces (for example, 192.168.0.14 and 192.168.0.31 on the x86). I also have no problem accessing the internet from each host. However, I can’t ping the AGX from the x86 or vice versa. The firewall is disabled on the x86.

On the x86 host you have a routing table which will fail:

Destination  Gateway     Genmask       Flags Metric Ref Use Iface
0.0.0.0      192.168.0.1 0.0.0.0       UG    100    0   0   eno1
169.254.0.0  0.0.0.0     255.255.0.0   U     1000   0   0   eno1
192.168.0.0  0.0.0.0     255.255.255.0 U     0      0   0   eth0
192.168.0.0  0.0.0.0     255.255.255.0 U     100    0   0   eno1

Notice that both eth0 and eno1 cover the entire 192.168.0.0/24 subnet. The only difference is the metric, so eno1 will never get used for that subnet unless eth0 fails. The “metric” describes priorities, and a higher metric implies a higher cost. eno1 will never see a packet for that subnet (other than setting up DHCP and/or route) since it has a higher metric.
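To make the metric behaviour concrete, here is a small sketch that mimics the selection rule (longest matching netmask first, then lowest metric) on the x86 routing table posted above. It is only an illustration: the helper names ip2int and pick_iface are made up for this example, and the real kernel lookup has more rules than this.

```shell
#!/bin/sh
# Sketch only: mimics route selection (longest matching netmask first,
# then lowest metric) on "route -n" style text read from stdin.
# ip2int and pick_iface are hypothetical helper names.

# Convert a dotted-quad IPv4 address to an integer.
ip2int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# pick_iface DEST: print the interface that would carry traffic to DEST.
pick_iface() (
  dest=$(ip2int "$1")
  best_mask=-1 best_metric=0 best_if=
  while read -r net gw mask flags metric ref use ifc; do
    case $net in
      [0-9]*.*.*.*) ;;      # looks like an address: keep the row
      *) continue ;;        # header or blank line: skip it
    esac
    n=$(ip2int "$net"); m=$(ip2int "$mask")
    [ $(( $dest & $m )) -eq "$n" ] || continue    # route does not match
    if [ "$m" -gt "$best_mask" ] ||
       { [ "$m" -eq "$best_mask" ] && [ "$metric" -lt "$best_metric" ]; }; then
      best_mask=$m best_metric=$metric best_if=$ifc
    fi
  done
  echo "$best_if"
)

# 192.168.0.15 is the AGX's ordinary Ethernet address on the LAN.
pick_iface 192.168.0.15 <<'EOF'
0.0.0.0      192.168.0.1 0.0.0.0       UG    100    0   0   eno1
169.254.0.0  0.0.0.0     255.255.0.0   U     1000   0   0   eno1
192.168.0.0  0.0.0.0     255.255.255.0 U     0      0   0   eth0
192.168.0.0  0.0.0.0     255.255.255.0 U     100    0   0   eno1
EOF
# prints eth0: everything for 192.168.0.x is sent to the metric-0 interface
```

On a live system, “ip route get 192.168.0.15” should report the kernel's actual choice without any parsing.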

What I will suggest is that on the host you use something like “nm-connection-editor” (if you don’t have this, then “sudo apt-get install network-manager-gnome”) and set the PCIe link (eno1) to static IPv4 address 192.168.0.14, and netmask 255.255.255.254. Then manually set the metric to 0. You might find this reference URL helpful, especially information about metric:
https://docs.ubuntu.com/core/en/stacks/network/network-manager/docs/routing-tables

Another app which might help is “ifmetric” ("sudo apt-get install ifmetric"), although I tend to edit files (NetworkManager can make file editing seem to fail at times, so first do everything you can in “nm-connection-editor” before using other tools). The “route” command can use the “metric” argument to set a metric (see “man route”).

Example with the command line tool (part of network-manager):

nmcli connection modify eno1 ipv4.route-metric 0

(you might need to ifdown and ifup the interface, not sure)

Then run “route” again to see if the metric is 0. Having the eno1 cover only a single address, and making its metric equal to eth0, implies the more specific address will get the traffic (and it is more specific if the netmask is a “/31”, or “255.255.255.254”).

There is a similar issue with the Jetson. You have two interfaces with conflicting routes over the same address range. Only the route with the metric of “0” will get traffic. Setting the PCIe network to have a single address in the route (meaning netmask is “255.255.255.254” instead of “255.255.255.0”), and setting metric to 0, should make that route available for that address.

Normally what someone would do in a situation like this is to simply put the second NIC (which your PCIe is emulating) on a different subnet…then there would be no conflict in any way. For example, if the PCIe had address “192.168.1.x” (note “.1.” instead of “.0.”…a different “/24” subnet), then metric and route changes would never matter…this would be the only device on that subnet and there would be no confusion as to which interface to route to.
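A rough sketch of that separate-subnet setup, using plain “ip” commands. The interface names (eth0 on the x86, eth1 on the AGX) come from the ifconfig output earlier in the thread; the host addresses .1 and .2 are just assumptions for the example, and this assumes NetworkManager is not managing the PCIe interface (otherwise it may overwrite the settings). By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: move the PCIe virtual Ethernet link onto its own /24 so it no
# longer overlaps the 192.168.0.0/24 LAN.
# Assumptions (not from the thread): the .1/.2 host addresses.
# Dry run by default: commands are only printed. Run with RUN= (empty)
# to actually execute them.
RUN=${RUN-echo}

# usage: setup_pcie_link IFACE ADDR/PREFIX
setup_pcie_link() {
  $RUN sudo ip addr flush dev "$1"       # drop the old 192.168.0.x address
  $RUN sudo ip addr add "$2" dev "$1"    # assign the new address
  $RUN sudo ip link set dev "$1" up
}

case ${1:-} in
  host) setup_pcie_link eth0 192.168.1.2/24 ;;   # x86 root complex
  ep)   setup_pcie_link eth1 192.168.1.1/24 ;;   # Xavier AGX endpoint
esac
```

Saved as, say, pcie_net.sh (a hypothetical name), “sh pcie_net.sh host” on the x86 and “sh pcie_net.sh ep” on the AGX print what would be run. After applying it on both ends, “ping 192.168.1.1” from the x86 should only have one possible route.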

Pretty much any online web search you perform for Ubuntu networking, or docs on subnets, would be the same for this. It won’t matter if the NIC is really PCIe or real or virtual USB…at this point it is about network setup, and not about the PCIe side.

There is a known issue with the PCIe virtual network driver when working with an x86 host. Please wait for a further update on the fix.

Thanks,
Om

Thanks for your kind notification.
May I know the plan for the fix? Will it be in JetPack 4.4 GA or JetPack 4.5/5?

tvnet.zip (9.7 KB)

On the x86 host side, please use the attached EP driver.

Please note that the PCIe virtual Ethernet interface will have an MTU of 64512. The interface name will be ethX.

Follow below steps:

  1. Boot the AGX (EP) first and start the function driver:
    cd /sys/kernel/config/pci_ep/
    mkdir functions/pci_epf_tvnet/func1
    ln -s functions/pci_epf_tvnet/func1 controllers/141a0000.pcie_ep/
    echo 1 > controllers/141a0000.pcie_ep/start

  2. Boot the x86 host and verify that the PCIe virtual Ethernet EP has been detected:
    lspci

    xx Network controller: NVIDIA Corporation Device 2296

  3. Compile and load the attached driver on the x86 host.

  4. Set the IP address on both the EP and the host.
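For step 3, a possible sketch of the usual out-of-tree module build on the x86. I don't know what tvnet.zip contains, so everything here is an assumption: that it unpacks to a directory with a kbuild Makefile (e.g. “obj-m += tegra_vnet.o”), that the resulting module is tegra_vnet.ko, and that the kernel headers for the running kernel are installed. The script only prints the commands unless RUN is set to empty.

```shell
#!/bin/sh
# Sketch: build and load an out-of-tree kernel module on the x86 host.
# Assumptions: SRC_DIR holds the unpacked driver plus a kbuild Makefile,
# and the module file name is passed in (tegra_vnet.ko is a guess).
# Dry run by default: commands are only printed. Run with RUN= (empty)
# to actually execute them.
RUN=${RUN-echo}
KDIR=${KDIR:-/lib/modules/$(uname -r)/build}   # headers of the running kernel

# usage: build_and_load SRC_DIR MODULE.ko
build_and_load() {
  $RUN make -C "$KDIR" M="$1" modules    # standard external-module build
  $RUN sudo insmod "$1/$2"               # load the freshly built module
}

# Show which kernel driver is bound to the endpoint
# (10de is NVIDIA's PCI vendor ID, 2296 the device ID from lspci above).
check_bound() {
  $RUN lspci -k -d 10de:2296
}
```

For example, “build_and_load "$HOME/tvnet" tegra_vnet.ko” followed by “check_bound”; once the module is loaded, lspci should show “Kernel driver in use: tvnet” as in the earlier output.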


Thanks

I tested the driver and it works.