Unable to set up eBGP peers

Hi community,

My BGP configuration is very simple, but I cannot bring up the eBGP peers; they remain in the Active state. The BGP configuration is as follows:

nv set vrf default router bgp autonomous-system 64514
nv set vrf default router bgp router-id 10.10.20.1
nv set vrf default router bgp neighbor 10.10.20.2 remote-as 64515
nv set vrf default router bgp neighbor 10.10.30.1 remote-as 64512
nv set vrf default router bgp neighbor 10.10.30.1 multihop-ttl 100
nv set vrf default router bgp neighbor 10.10.30.2 remote-as 64513
nv set vrf default router bgp neighbor 10.10.30.2 multihop-ttl 100

The peers have the mirror-image configuration.
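
For example, on the 10.10.30.1 peer it is roughly like this (same idea with the addresses swapped; the router ID there is my assumption):

nv set vrf default router bgp autonomous-system 64512
nv set vrf default router bgp router-id 10.10.30.1
nv set vrf default router bgp neighbor 10.10.20.1 remote-as 64514
nv set vrf default router bgp neighbor 10.10.20.1 multihop-ttl 100
nv config apply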

cumulus@SW-MLNX-01-AVZ1:mgmt:~$ net show bgp summary
show bgp ipv4 unicast summary

BGP router identifier 10.10.20.1, local AS number 64514 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 3, using 68 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
10.10.20.2      4      64515         0         0        0    0    0    never       Active        0
10.10.30.1      4      64512         0         0        0    0    0    never       Active        0
10.10.30.2      4      64513         0         0        0    0    0    never       Active        0

Total number of neighbors 3

But I can ping the peers successfully:

cumulus@SW-MLNX-01-AVZ1:mgmt:~$ ping 10.10.30.1
vrf-wrapper.sh: switching to vrf "default"; use '--no-vrf-switch' to disable
PING 10.10.30.1 (10.10.30.1) 56(84) bytes of data.
64 bytes from 10.10.30.1: icmp_seq=1 ttl=61 time=7.78 ms
64 bytes from 10.10.30.1: icmp_seq=2 ttl=61 time=5.03 ms
64 bytes from 10.10.30.1: icmp_seq=3 ttl=61 time=5.64 ms
64 bytes from 10.10.30.1: icmp_seq=4 ttl=61 time=4.38 ms
64 bytes from 10.10.30.1: icmp_seq=5 ttl=61 time=7.10 ms
^C
--- 10.10.30.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 11ms
rtt min/avg/max/mdev = 4.380/5.985/7.783/1.276 ms
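
Since ping works, I suppose the next step is to check whether the switch is even attempting the BGP TCP connection (port 179); something like this should show it (swp1 is just an example interface):

sudo ss -tna | grep ':179'
sudo tcpdump -ni swp1 'tcp port 179'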

I cannot see anything interesting in frr.log, only this:

2023-02-06T11:48:01.162971+00:00 SW-MLNX-01-AVZ1 watchfrr[1913]: zebra state -> up : connect succeeded
2023-02-06T11:48:01.165817+00:00 SW-MLNX-01-AVZ1 watchfrr[1913]: bgpd state -> up : connect succeeded
2023-02-06T11:48:01.167261+00:00 SW-MLNX-01-AVZ1 watchfrr[1913]: all daemons up, doing startup-complete notify
2023-02-06T11:48:01.931755+00:00 SW-MLNX-01-AVZ1 zebra[1955]: Configuration Read in Took: 00:00:00
2023-02-06T11:48:02.789740+00:00 SW-MLNX-01-AVZ1 bgpd[1998]: Configuration Read in Took: 00:00:01
2023-02-06T11:48:02.839266+00:00 SW-MLNX-01-AVZ1 watchfrr[1913]: Daemon: zebra: is in Up state but expected it to be in DAEMON_DOWN state
2023-02-06T11:48:02.842824+00:00 SW-MLNX-01-AVZ1 watchfrr[1913]: Daemon: bgpd: is in Up state but expected it to be in DAEMON_DOWN state
2023-02-06T11:48:02.843310+00:00 SW-MLNX-01-AVZ1 watchfrr[1913]: Daemon: staticd: is in Up state but expected it to be in DAEMON_DOWN state

I don't know whether these log messages are normal or whether they are related to the BGP sessions failing to establish.
Any idea?

Regards,
Julián

Do you have a diagram and a complete config that you can share?

Sure, here are the diagram and the configuration files. I am using 172.x.x.x addresses as the BGP router IDs and to establish the BGP sessions. The upper routers simulate an MPLS network: they have no BGP configuration, only static routes and the gateways that provide reachability between the BGP router IDs. After making a small change, the BGP sessions now establish between the directly connected switches, but not between the switches at different sites, even though they can ping each other.
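
If the upper routers are also Cumulus switches, their static routes look something like this sketch (the next hops are placeholders; the real ones depend on the diagram):

nv set vrf default router static 172.16.0.0/24 via <next-hop-towards-AVZ1>
nv set vrf default router static 172.17.0.0/24 via <next-hop-towards-AVZ2>
nv config apply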

cumulus@SW-MLNX-01-AVZ1:mgmt:~$ net show bgp summary
show bgp ipv4 unicast summary

BGP router identifier 172.16.0.1, local AS number 64514 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 3, using 68 KiB of memory

Neighbor                     V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
SW-MLNX-02-AVZ1(172.16.0.2)  4      64515       123       123        0    0    0 00:05:20            0        0
172.17.0.1                   4      64512         0         0        0    0    0    never       Active        0
172.17.0.2                   4      64513         0         0        0    0    0    never       Active        0

Total number of neighbors 3

If I run a tcpdump on a switch at site AVZ1, I see BGP messages being sent to the directly connected switch at site AVZ1, but none being sent to the switches at site AVZ2. However, when I add a static route to 172.17.x.x (site AVZ2) on a switch at site AVZ1, that switch starts sending BGP messages to the AVZ2 switches and the sessions establish, even though the switches already have a default route via the same gateway.
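
To see how the peer address resolves before and after the change, checking the routing table from FRR's point of view should help; for example (vtysh ships with FRR on Cumulus):

sudo vtysh -c 'show ip route 172.17.0.1'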

Adding "nv set vrf default router static 172.17.0.0/24 via 172.16.0.100" to a switch at site AVZ1:

cumulus@SW-MLNX-01-AVZ1:mgmt:~$ net show bgp summary
show bgp ipv4 unicast summary

BGP router identifier 172.16.0.1, local AS number 64514 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 3, using 68 KiB of memory

Neighbor                     V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
SW-MLNX-02-AVZ1(172.16.0.2)  4      64515       284       284        0    0    0 00:13:24            0        0
SW-MLNX-01-AVZ2(172.17.0.1)  4      64512         9         9        0    0    0 00:00:20            0        0
SW-MLNX-02-AVZ2(172.17.0.2)  4      64513         9         9        0    0    0 00:00:20            0        0

Total number of neighbors 3

What am I missing?

Regards,
Julián

config.rar (1.9 KB)

The config that you provided is just the list of nv set commands, which is hard to troubleshoot from. However, given the diagram, I assume that this is a running simulation in AIR.
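
As a side note, the rendered FRR configuration is usually easier to review than the NVUE command list; as far as I know you can dump it with:

sudo vtysh -c 'show running-config'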

Perhaps you can add me to the simulation (attilla (at) nvidia (dot) com), so I can look at the simulation directly.

Hi attilla,

Sorry, but I am new to Cumulus Linux. Do you mean the configuration you get when you run "nv config show"? I was simulating in AIR, but I switched to a GNS3 VM because I thought the BGP issue was an AIR limitation. I will check, but I think my AIR simulation has expired. Thanks for your interest.

Regards,
Julián