We believe we have the same problem on a Jetson AGX Orin, kernel version 5.10.120.
If the VLAN interface is created while the main nvethernet interface is down, we don’t see incoming tagged packets. The network works, however, if we create the VLAN interface after bringing the main interface up.
In short, to reproduce the problem:
Connect the Jetson board to an external host.
On the external host, set up the network like this (you may need to change the interface name):
sudo ip link add dev eth0.2 link eth0 type vlan id 2
sudo ip link set dev eth0 up
sudo ip link set dev eth0.2 up
sudo ip addr add dev eth0.2 192.168.2.1/24
On the Jetson board, set up the network this way, right after boot:
sudo ip link add dev eth1.2 link eth1 type vlan id 2
sudo ip link set dev eth1 up
sudo ip link set dev eth1.2 up
sudo ip addr add dev eth1.2 192.168.2.10/24
Try pinging the external host from the Jetson board. It doesn’t work:
$ ping -c 4 192.168.2.1
PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
From 192.168.2.10 icmp_seq=1 Destination Host Unreachable
From 192.168.2.10 icmp_seq=2 Destination Host Unreachable
From 192.168.2.10 icmp_seq=3 Destination Host Unreachable
From 192.168.2.10 icmp_seq=4 Destination Host Unreachable
--- 192.168.2.1 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3064ms
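One way to check the "no incoming tagged packets" symptom more directly is to compare the kernel's RX counters for the parent and the VLAN interface before and after the ping. This is only a rough sketch: it reads the standard /sys/class/net/<iface>/statistics/rx_packets files, and SYSFS_NET is a hypothetical override we added so the helper can be pointed at a different directory. We have not verified how nvethernet counts frames in the broken state, so treat the numbers as a hint, not proof.

```shell
#!/bin/sh
# Sketch: print RX packet counters for a parent interface and its VLAN.
# SYSFS_NET is a hypothetical override (not a kernel or iproute2 setting);
# on a real system it defaults to /sys/class/net.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

rx_packets() {
    # $1 = interface name; prints the kernel's rx_packets counter
    cat "$SYSFS_NET/$1/statistics/rx_packets"
}

show_rx() {
    # $1 = parent interface, $2 = VLAN interface
    printf '%s rx_packets=%s\n' "$1" "$(rx_packets "$1")"
    printf '%s rx_packets=%s\n' "$2" "$(rx_packets "$2")"
}

# When run as a script with two arguments, print both counters once.
if [ "$#" -eq 2 ]; then
    show_rx "$1" "$2"
fi
```

Saved as, say, rxcheck.sh, it could be run as sh rxcheck.sh eth1 eth1.2 once before and once after the ping. A VLAN counter that stays flat while the external host is answering would be consistent with what we describe.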
If, during network setup on the Jetson board, we instead run sudo ip link set dev eth1 up first and then sudo ip link add dev eth1.2 link eth1 type vlan id 2, everything works as expected:
sudo ip link set dev eth1 up
sudo ip link add dev eth1.2 link eth1 type vlan id 2
sudo ip link set dev eth1.2 up
sudo ip addr add dev eth1.2 192.168.2.10/24
As mentioned in my post, the sequence above works.
Both sequences are valid, and the failing one (VLAN created while the main interface is down) must work too. If the network is being set up manually from an interactive shell, the main interface can be brought up first as a workaround.
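For the manual case, the workaround order could be wrapped in a small helper so the steps are not accidentally swapped. This is just a sketch: the function name vlan_up_parent_first and the RUN variable are our own inventions, and by default the script only prints the commands (RUN=echo) instead of executing them.

```shell
#!/bin/sh
# Sketch of the workaround order: parent up FIRST, then create the VLAN.
# RUN is a hypothetical switch: RUN=echo (default) prints the commands,
# RUN=sudo actually executes them.
RUN="${RUN:-echo}"

vlan_up_parent_first() {
    # $1 = parent interface, $2 = VLAN id, $3 = address with prefix length
    parent="$1"
    vid="$2"
    addr="$3"
    vlan="${parent}.${vid}"

    $RUN ip link set dev "$parent" up
    $RUN ip link add dev "$vlan" link "$parent" type vlan id "$vid"
    $RUN ip link set dev "$vlan" up
    $RUN ip addr add dev "$vlan" "$addr"
}

# When run as a script with three arguments, apply (or print) the sequence.
if [ "$#" -eq 3 ]; then
    vlan_up_parent_first "$1" "$2" "$3"
fi
```

For example, RUN=sudo sh vlan-up.sh eth1 2 192.168.2.10/24 (the file name is arbitrary) would run the same four commands as the working sequence. It does not help with systemd-networkd or netplan, which choose the order themselves.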
But in a production environment with hundreds of boards deployed, systemd-networkd or a similar automatic network configurator will be used (e.g. netplan, as in the original poster’s case), since manual setup is infeasible and custom shell scripts for network setup are error-prone. These configurators create the VLAN interface first, which leads to a de facto broken network and confusing behaviour.
Here is a more concrete example of a failing systemd-networkd configuration:
On the external host:
sudo ip link add dev eth0.2 link eth0 type vlan id 2
sudo ip link set dev eth0 up
sudo ip link set dev eth0.2 up
sudo ip addr add dev eth0.2 192.168.2.1/24
On the Jetson board:
sudo tee /etc/systemd/network/eth1.network <<HERE
[Match]
Name=eth1
[Link]
RequiredForOnline=carrier
[Network]
IPv6AcceptRA=false
LinkLocalAddressing=no
VLAN=vlan2
HERE
sudo tee /etc/systemd/network/vlan2.network <<HERE
[Match]
Name=vlan2
[Link]
RequiredForOnline=routable
[Network]
DNS=8.8.8.8
Address=192.168.2.2/24
[Route]
Gateway=192.168.2.1
HERE
sudo tee /etc/systemd/network/vlan2.netdev <<HERE
[NetDev]
Kind=vlan
Name=vlan2
[VLAN]
Id=2
HERE
sudo systemctl daemon-reload
sudo systemctl restart systemd-networkd
The external host is not reachable:
$ ping -c 4 192.168.2.1
PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
From 192.168.2.10 icmp_seq=1 Destination Host Unreachable
From 192.168.2.10 icmp_seq=2 Destination Host Unreachable
From 192.168.2.10 icmp_seq=3 Destination Host Unreachable
From 192.168.2.10 icmp_seq=4 Destination Host Unreachable
Automatic network configuration tools like systemd-networkd or netplan first create the VLAN interface (sudo ip link add dev eth1.2 link eth1 type vlan id 2) and then bring the main interface up (sudo ip link set dev eth1 up), but this order is not supported by nvethernet. systemd-networkd is widely used in many Linux distributions and works with other Linux network drivers.
Also, we can create eth1.2 while eth1 is down: it appears in the system, and no errors or warnings show up anywhere. Then we bring eth1 and eth1.2 up, but we don’t see incoming packets on eth1.2. It is in a broken state, so why would the driver allow us to create it in the first place?
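Therefore we believe this behaviour to be a bug: it is non-standard (this ordering works with other drivers) and confusing (the VLAN interface is created without any error, yet receives nothing).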