Cumulus VX BGP sessions unstable in VRF with VLAN subinterfaces

I’m having problems getting BGP sessions to stay up in Cumulus VX on VMware ESXi when using VRFs and VLAN subinterfaces. The BGP session comes up periodically for a few seconds and then dies again, and I see the following errors in the FRR log:

2022-03-02T09:36:18.834354+00:00 bgpd[1991]: bind to interface RED failed, errno=1
2022-03-02T09:36:22.799796+00:00 bgpd[1991]: bind to interface BLUE failed, errno=1
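
For reference, errno=1 on Linux is EPERM (“Operation not permitted”). Below is a minimal Python sketch that decodes the errno from lines like the two above. The regular expression is based only on the log format shown here, and the function name decode_bind_failure is hypothetical, not part of FRR:

```python
import errno
import os
import re

# Pattern for the bgpd message quoted above. The format is inferred from
# these log lines only; other FRR versions may word it differently.
BIND_FAILURE = re.compile(r"bind to interface (\S+) failed,\s*errno=(\d+)")


def decode_bind_failure(line):
    """Return (interface, errno name, description), or None if no match."""
    match = BIND_FAILURE.search(line)
    if match is None:
        return None
    code = int(match.group(2))
    name = errno.errorcode.get(code, "UNKNOWN")
    return match.group(1), name, os.strerror(code)


if __name__ == "__main__":
    sample = (
        "2022-03-02T09:36:18.834354+00:00 bgpd[1991]: "
        "bind to interface RED failed, errno=1"
    )
    print(decode_bind_failure(sample))
```

This only names the error code. It does not say why bgpd was refused permission to bind to the VRF device.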

Is this a supported configuration?

If I use BGP outside of a VRF context the BGP sessions are stable over VLAN subinterfaces.

Hi Tim,

Can you share the configuration you are using? To test whether ESXi is the culprit, you can also spin up a topology in AIR (air.nvidia.com).

Interestingly, the problem does not exist in a second Cumulus VX VM I’m running on the same ESXi host. That VM is also peered with the NSX-T T0, but it does not use VRFs in the Cumulus config, because I need EVPN and the documentation states that the EVPN address family is not supported in VRFs. That config also uses VLAN subinterfaces, and those BGP sessions are stable.

Here is the current config for the VRF VX VM:

- set:
    router:
      bgp:
        autonomous-system: 65200
        enable: on
        router-id: 192.168.0.221
    service:
      ntp:
        mgmt:
          server:
            192.168.0.1: {}
            192.168.0.65: {}
    system:
      hostname: lab-vrouter1
      timezone: Europe/London
    vrf:
      BLUE:
        router:
          bgp:
            address-family:
              ipv4-unicast:
                enable: on
                redistribute:
                  connected:
                    enable: on
            enable: on
            neighbor:
              172.16.90.10:
                address-family:
                  ipv4-unicast:
                    enable: on
                    nexthop-setting: self
                    soft-reconfiguration: on
                remote-as: 65000
                type: numbered
            router-id: 172.16.90.102
        table: auto
      RED:
        router:
          bgp:
            address-family:
              ipv4-unicast:
                enable: on
                redistribute:
                  connected:
                    enable: on
            autonomous-system: 65200
            enable: on
            neighbor:
              172.16.90.6:
                address-family:
                  ipv4-unicast:
                    enable: on
                    nexthop-setting: self
                    soft-reconfiguration: on
                bfd:
                  enable: on
                  min-rx-interval: 500
                  min-tx-interval: 500
                remote-as: 65000
                type: numbered
            peer-group:
              nsx:
                bfd:
                  detect-multiplier: 3
                  enable: on
                  min-rx-interval: 500
                  min-tx-interval: 500
                remote-as: 65000
            router-id: 172.16.90.101
        table: auto
    interface:
      eth0:
        ip:
          address:
            192.168.0.221/24: {}
          gateway:
            192.168.0.1: {}
        type: eth
      swp1,swp1.960-961:
        link:
          mtu: 1700
      swp1:
        type: swp
      swp1.960-961:
        base-interface: swp1
        type: sub
      swp1.960:
        ip:
          address:
            172.16.90.5/30: {}
          vrf: RED
        vlan: 960
      swp1.961:
        ip:
          address:
            172.16.90.9/30: {}
          vrf: BLUE
        vlan: 961
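
As a quick sanity check on the addressing in the config above, here is a small Python sketch that confirms each neighbor sits in the same /30 as the subinterface it should be reached through. The addresses are copied from the config; the helper name neighbor_is_on_link is just for illustration:

```python
import ipaddress

# Subinterface address and BGP neighbor per VRF, copied from the NVUE config.
PEERINGS = {
    "RED": ("172.16.90.5/30", "172.16.90.6"),
    "BLUE": ("172.16.90.9/30", "172.16.90.10"),
}


def neighbor_is_on_link(local_cidr, neighbor):
    """True if neighbor is a usable host address in the local subnet."""
    local = ipaddress.ip_interface(local_cidr)
    peer = ipaddress.ip_address(neighbor)
    network = local.network
    return (
        peer in network
        and peer != local.ip
        and peer not in (network.network_address, network.broadcast_address)
    )


if __name__ == "__main__":
    for vrf, (local_cidr, neighbor) in PEERINGS.items():
        ok = neighbor_is_on_link(local_cidr, neighbor)
        print(f"{vrf}: {neighbor} on link of {local_cidr}: {ok}")
```

Both peerings pass this check, so the addressing itself is consistent with the subinterface assignments.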

Indeed, the EVPN SAFI always runs in the default VRF; that is how EVPN works.

This configuration looks quite straightforward, with a BGP session on a subinterface, but I also see:

vlan: 960

On that subinterface. You shouldn’t need that, because it already has the .1q tag. Can you also show the contents of /etc/network/interfaces?

I’ve just spun up a simple topology in AIR to test this and it seems to work there, but I do see the same bind error in the logs on the instance running the non-default VRF:

2022-03-02T10:13:22.821389+00:00 cumulus bgpd[7242]: bind to interface RED failed, errno=1

The ‘vlan: 960’ config was automatically added when I created the interface using the ‘nv set interface swp1.960…’ command.

Here are the contents of the file:

# Auto-generated by NVUE!
# Any local modifications will prevent NVUE from re-generating this file.
# md5sum: 261a9617b91db9bafbb0da3179676505
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*.intf

auto lo
iface lo inet loopback

auto mgmt
iface mgmt
    address 127.0.0.1/8
    address ::1/128
    vrf-table auto

auto BLUE
iface BLUE
    vrf-table auto

auto RED
iface RED
    vrf-table auto

auto eth0
iface eth0
    address 192.168.0.221/24
    gateway 192.168.0.1
    ip-forward off
    ip6-forward off
    vrf mgmt

auto swp1
iface swp1
    mtu 1700

auto swp1.960
iface swp1.960
    address 172.16.90.5/30
    mtu 1700
    vrf RED

auto swp1.961
iface swp1.961
    address 172.16.90.9/30
    mtu 1700
    vrf BLUE

Ah yes, the NVUE config still has new things for me as well. It looks as it should in /etc/network/interfaces.

Could you also share the /etc/frr/frr.conf? Just to double check how that is being generated.

Sure, here you go:

# Auto-generated by NVUE!
# Any local modifications will prevent NVUE from re-generating this file.
# md5sum: 4729dfbd4e48f61abc0b2eccc8ec6734
!---- Cumulus Defaults ----
frr defaults datacenter
log syslog informational
!---- Rendered frr.conf ----
vrf BLUE
exit-vrf
vrf RED
exit-vrf
vrf default
exit-vrf
vrf mgmt
exit-vrf
router bgp 65200 vrf BLUE
bgp router-id 172.16.90.102
timers bgp 3 9
bgp deterministic-med
! Neighbors
neighbor 172.16.90.10 remote-as 65000
neighbor 172.16.90.10 timers 3 9
neighbor 172.16.90.10 timers connect 10
neighbor 172.16.90.10 advertisement-interval 0
no neighbor 172.16.90.10 capability extended-nexthop
! Address families
address-family ipv4 unicast
redistribute connected
maximum-paths ibgp 64
maximum-paths 64
distance bgp 20 200 200
neighbor 172.16.90.10 activate
neighbor 172.16.90.10 next-hop-self
neighbor 172.16.90.10 soft-reconfiguration inbound
exit-address-family
! end of router bgp 65200 vrf BLUE
router bgp 65200 vrf RED
bgp router-id 172.16.90.101
timers bgp 3 9
bgp deterministic-med
! Neighbors
neighbor nsx peer-group
neighbor nsx remote-as 65000
neighbor nsx timers 3 9
neighbor nsx timers connect 10
neighbor nsx advertisement-interval 0
no neighbor nsx capability extended-nexthop
neighbor nsx bfd 3 500 500
neighbor 172.16.90.6 remote-as 65000
neighbor 172.16.90.6 timers 3 9
neighbor 172.16.90.6 timers connect 10
neighbor 172.16.90.6 advertisement-interval 0
no neighbor 172.16.90.6 capability extended-nexthop
neighbor 172.16.90.6 bfd 3 500 500
! Address families
address-family ipv4 unicast
redistribute connected
maximum-paths ibgp 64
maximum-paths 64
distance bgp 20 200 200
neighbor 172.16.90.6 activate
neighbor 172.16.90.6 next-hop-self
neighbor 172.16.90.6 soft-reconfiguration inbound
neighbor nsx activate
exit-address-family
! end of router bgp 65200 vrf RED
!---- CUE snippets ----
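
One thing that can be read off the rendered config is which failure detector would react first on the RED peering. The sketch below compares the BGP hold time with the BFD detection time, using the values from the frr.conf above. It assumes the NSX-T side uses the same BFD multiplier and intervals, which is not shown in this thread, and the function name is hypothetical:

```python
# Values copied from the rendered frr.conf above.
BGP_HOLD_S = 9          # "timers 3 9": keepalive 3 s, hold 9 s
BFD_MULTIPLIER = 3      # "bfd 3 500 500": detect multiplier
BFD_MIN_RX_MS = 500     # required minimum receive interval
BFD_MIN_TX_MS = 500     # desired minimum transmit interval


def bfd_detection_time_ms(peer_multiplier, local_min_rx_ms, peer_min_tx_ms):
    """Asynchronous-mode detection time: the peer's multiplier times the
    larger of our receive interval and the peer's transmit interval."""
    return peer_multiplier * max(local_min_rx_ms, peer_min_tx_ms)


if __name__ == "__main__":
    # ASSUMPTION: the peer is configured symmetrically (3 x 500 ms).
    bfd_ms = bfd_detection_time_ms(BFD_MULTIPLIER, BFD_MIN_RX_MS, BFD_MIN_TX_MS)
    print(f"RED: BFD detection {bfd_ms} ms, BGP hold {BGP_HOLD_S * 1000} ms")
```

Under that assumption, BFD on RED would declare the peer down after 1.5 seconds of lost packets, well before the 9 second hold timer expires. BLUE has no BFD configured, so only the hold timer applies there. This does not show that BFD is the cause of the flaps; it only shows which timer is the tighter one.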

I’m not sure this has anything to do with the use of non-default VRFs or subinterfaces. I’ve run some more tests using the default VRF with subinterfaces, and also the default VRF with SVIs, and I’m still seeing the same problems. It’s odd, as all the ping tests I have done show the network connectivity to be reliable.

I suspect you may be right about this being some issue with ESXi. Do you know if the Cumulus VX OVA has been tested on ESXi 7.0?

I am not seeing anything incorrect in the configuration that might cause this. Would you have a chance to create the same topology you have in ESXi with the AIR build tool (Create Your Topology)? VX on ESXi isn’t used that much, so it could be something specific there, but you already said you are seeing the same issue in AIR as well. If you give me access to the topology, I can have a closer look.

The VRF topology I created in AIR worked: the BGP session was stable, but the interface binding error still appeared in the log file. So it seems the binding error message is not relevant to this problem?

I can’t fully re-create the topology in AIR, as I’m using a VMware NSX-T T0 router in my setup too, and this issue is specific to one of the peerings to the T0. I’ll try peering two VX VMs directly on ESXi and see if that works. At least it would confirm or eliminate the NSX-T T0 as the culprit.

Thanks for reviewing the config. I’m new to Cumulus, and it’s reassuring to know that it doesn’t look like a config issue.

No problem.

Do keep in mind that using VX as a virtual router is an unsupported scenario. ;-)