EVPN Centralized Routing with Arista EOS

TL&DR: SIP of Networking Was an Understatement 🤦‍♂️

A month ago, I described ARP issues in EVPN centralized routing design, and Naveen Kumar Devaraj was kind enough to add some Arista EOS implementation details. Today, let’s explore what EVPN routes Arista EOS generates in that scenario. We’ll use a very simple lab topology with a spine switch acting as a router. The leaf switches are layer-2 switches.

Packet forwarding in centralized routing design

Packet forwarding in centralized routing design

When the lab is started, all switches advertise IMET routes for the attached VLANs:

EVPN IMET routes advertised by Arista EOS switches
l1#show bgp evpn
BGP routing table information for VRF default
Router identifier 10.0.0.2, local AS number 65000
Route status codes: * - valid, > - active, S - Stale, E - ECMP head, e - ECMP
                    c - Contributing to ECMP, % - Pending best path selection
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  LocPref Weight  Path
 * >      RD: 10.0.0.1:1000 imet 10.0.0.1
                                 10.0.0.1              -       100     0       i
 * >      RD: 10.0.0.1:1001 imet 10.0.0.1
                                 10.0.0.1              -       100     0       i
 * >      RD: 10.0.0.2:1000 imet 10.0.0.2
                                 -                     -       -       0       i
 * >      RD: 10.0.0.3:1001 imet 10.0.0.3
                                 10.0.0.3              -       100     0       i

The leaf switches are advertising one IMET route each (they have a single MAC-VRF instance), the spine switch is advertising two IMET routes (it has two MAC-VRF instances within an IP-VRF instance).

Next, let’s ping the default gateway (the spine switch) from one of the hosts:

HB1 pinging the default gateway IP address on Spine
hb1:/# ip route
default via 192.168.121.1 dev eth0
10.0.0.0/24 via 172.16.1.1 dev eth1
10.1.0.0/16 via 172.16.1.1 dev eth1
10.2.0.0/24 via 172.16.1.1 dev eth1
172.16.0.0/16 via 172.16.1.1 dev eth1
172.16.1.0/24 dev eth1 scope link  src 172.16.1.4
192.168.121.0/24 dev eth0 scope link  src 192.168.121.104
hb1:/# ping 172.16.1.1 -c 3
PING 172.16.1.1 (172.16.1.1): 56 data bytes
64 bytes from 172.16.1.1: seq=0 ttl=64 time=14.586 ms
64 bytes from 172.16.1.1: seq=1 ttl=64 time=2.449 ms
64 bytes from 172.16.1.1: seq=2 ttl=64 time=2.351 ms

Before HB1 could send an IP packet to Spine, it had to send an ARP request. Spine could either use ARP gleaning or do an independent ARP resolution when replying to the ICMP Echo request. Regardless of those details, the ARP cache on Spine now contains an entry for HB1:

ARP cache for HB1 in Tenant VRF on Spine
spine#show arp vrf tenant
Legend:
 not learned: Associated MAC address is not present in the MAC address table
 -: Static (configuration or programmed by feature)
Address         Age (sec)  Hardware Addr   Interface
172.16.1.4        0:02:48  aac1.ab85.ab1f  Vlan1001, Vxlan1

The ARP entry belongs to the interface Vlan1001 and is reachable over VXLAN. No surprises there.

What about the MAC-IP routes? The leaf switch is advertising the MAC address of HB1; the spine switch is not advertising anything because the IP address of HB1 was learned over the VXLAN interface:

MAC-IP routes after the initial ARP requests
spine#show bgp evpn route-type mac-ip
BGP routing table information for VRF default
Router identifier 10.0.0.1, local AS number 65000
Route status codes: * - valid, > - active, S - Stale, E - ECMP head, e - ECMP
                    c - Contributing to ECMP, % - Pending best path selection
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  LocPref Weight  Path
 * >      RD: 10.0.0.3:1001 mac-ip aac1.ab85.ab1f
                                 10.0.0.3              -       100     0       i

Conclusion:

  • The central router (spine switch) DOES NOT advertise MAC+IP route for hosts attached to its VLANs.
  • In scenarios with multiple routers attached to MAC-VRF instances, each router MUST do its own ARP resolution.

Aside:

  • You could configure ARP snooping on L1/L2 to make them advertise MAC+IP instead of MAC routes for attached hosts. That would give the central router enough information so it wouldn’t have to do ARP resolution on its own.
  • Some EVPN implementations need ARP snooping to work (the central router will NOT send ARP requests over VXLAN).

What about packet flow? As I mentioned in the original blog post, if the central router does not advertise its MAC address, the leaf switches MUST flood all routed traffic1. That’s exactly what’s going on when you use minimal EVPN configuration on Arista EOS (remember: we saw no MAC/IP route for the default gateway’s MAC address):

VLAN and EVPN configuration on the Spine switch running Arista EOS
interface Vlan1000
   description VLAN red (1000) -> [hr1,l1]
   vrf tenant
   ip address 172.16.0.1/24
!
interface Vlan1001
   description VLAN blue (1001) -> [hb1,l2]
   vrf tenant
   ip address 172.16.1.1/24
!
router bgp 65000
   !
   vlan 1000
      rd 10.0.0.1:1000
      route-target import 65000:1000
      route-target export 65000:1000
      redistribute learned
   !
   vlan 1001
      rd 10.0.0.1:1001
      route-target import 65000:1001
      route-target export 65000:1001
      redistribute learned

Notes:

  • The above configuration was generated by netlab release 26.05. The central router does not need the redistribute learned command on the MAC-VRF instances, as it has no directly attached hosts.

If we want to stop the flooding of routed traffic, we MUST configure the central router to advertise at least its MAC address (advertising MAC+IP would be even better). On Arista EOS, use the ‌redistribute router-mac system command2 on the MAC-VRF instances to advertise just the default gateway’s MAC address or ‌redistribute router-mac system ip to advertise the MAC+IP EVPN type-2 route.

When adding that command to both MAC-VRF instances on the Spine switch, the Spine switch starts advertising its VLAN MAC addresses and the default gateway IP addresses, resulting in a well-behaved EVPN fabric.

EVPN MAC-IP routes after the Spine switch starts advertising its VLAN MAC/IP routes
spine#show bgp evpn route-type mac-ip
BGP routing table information for VRF default
Router identifier 10.0.0.1, local AS number 65000
Route status codes: * - valid, > - active, S - Stale, E - ECMP head, e - ECMP
                    c - Contributing to ECMP, % - Pending best path selection
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop

          Network                Next Hop              Metric  LocPref Weight  Path
 * >      RD: 10.0.0.1:1000 mac-ip 001c.733a.3232 172.16.0.1
                                 -                     -       -       0       i
 * >      RD: 10.0.0.1:1001 mac-ip 001c.733a.3232 172.16.1.1
                                 -                     -       -       0       i
 * >      RD: 10.0.0.3:1001 mac-ip aac1.ab85.ab1f
                                 10.0.0.3              -       100     0       i

Mission Accomplished ;)

Try It Out

The lab topology I used in this blog post is in the netlab-examples GitHub repository. If you want to try it out:

  • Set up your lab environment (you can use free GitHub Codespaces)
  • Change directory to EVPN/central-routing
  • Execute netlab up and explore

  1. Similar to the “famous” CCIE exam scenario where the MAC and ARP timers are mismatched. ↩︎

  2. Following the honored ancient practice of “throw spaghetti at the wall to see what sticks,” I tried out all available redistribute commands. This is the one that worked for me 🤷‍♂️ ↩︎

1 comments:

  1. Hi Ivan,

    Thanks for continuing to dive into the somewhat murky world of centralized IRB/SVI (I hesitate to use gateway carte blanche these days given all the GW functionality in RFC9014 and IPVPN-EVPN interworking)

    You are correct that the centralized IRB VTEP does not advertise on the MAC+IP in this configuration because, as you pointed out, it is learnt over VXLAN in the data-plane (realized by the timer being present in the arp entry; it's not persistent with a -).

    There are two options if you want to move this into the control plane:

    1) - As you mentioned enable arp snooping on the leaf. Now a MAC+IP route is advertised with the L2 VNI form the leaf. All remote TEPs now learn and install this ARP, including the centralized IRB TEP. Inter-subnet routing is still on the centralized TEP. Note the centralized TEP still does NOT re-advertise this MAC+IP with itself as the NH in this case.

    2) - More recently in the EVPN VXLAN Gateway development (yes GW :)), the central GW can now re-advertise ARP's learnt over the data-plane, setting itself as the next-hop for these type-2's. This negates the need for arp snooping on the leafs. This is useful in environments where arp snooping is not available on the L2 only VTEPs. In this configuration the centralized VTEP with IRB/SVI is also a GW for the L2 TEPs given it's changing the NH for the EVPN T2 MAC+IP routes. This capability is covered in detail in the following TOI: https://www.arista.com/en/support/toi/eos-4-36-0f/23960-centralized-routing-on-dci-gateway-for-l2-only-vteps

    As mentioned in other comments, this centralized deployment is the lesser chosen option, for all the reasons illustrated in these posts - i.e. there's a lot going on behind the scenes, and state is centralized, leading to larger blast radii. That being said, over time, adding all of these enhancements has made the deployment option more flexible and robust..

    Thanks Russell

Add comment
Sidebar