A student recently asked me a series of interesting questions about BGP route reflectors and how they use Cluster-ID. Since this is one of those topics that my students almost always seem to struggle with, let’s hope this article can clear some of the doubt.
When BGP was originally designed, it had no loop-prevention mechanism for use inside a single autonomous system. Instead, the internal BGP rules prohibit advertising a route learned from one internal (iBGP) peer to another internal peer. This is the primary reason why traditional BGP wisdom says we “must” have a full mesh of iBGP peers. A full mesh brings a scalability problem, since the number of peering sessions is N(N-1)/2, where N is the number of our internal BGP routers.
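To get a feel for how quickly the full mesh grows, here is a minimal Python sketch of the N(N-1)/2 formula. The second function is my own assumption for comparison, not something the formula above gives us: it counts sessions for a design in which every client peers with every route reflector and the reflectors are fully meshed among themselves. Both function names are made up for this illustration.

```python
def full_mesh_sessions(n: int) -> int:
    """iBGP sessions needed to fully mesh n routers: n(n-1)/2."""
    if n < 0:
        raise ValueError("router count must be non-negative")
    return n * (n - 1) // 2


def redundant_rr_sessions(n: int, reflectors: int = 2) -> int:
    """Sessions when every client peers with every reflector.

    Assumption for this sketch: the reflectors are fully meshed with
    each other and the clients have no sessions between themselves.
    """
    if not 0 <= reflectors <= n:
        raise ValueError("reflector count must be between 0 and n")
    clients = n - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)


if __name__ == "__main__":
    for n in (4, 10, 50, 100):
        print(f"{n:>3} routers: full mesh {full_mesh_sessions(n):>4}, "
              f"two reflectors {redundant_rr_sessions(n):>3}")
```

For a network as small as four routers the difference is negligible (6 sessions versus 5), but the full mesh grows quadratically while the reflector design grows linearly.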
BGP route reflectors relax the prohibition on advertising internally learned routes to other iBGP peers by dividing peers into two separate classes: route-reflector clients, and non-clients, which are regular iBGP peers with the restrictions intact. Routes learned from clients will be advertised to other clients and to non-clients. However, routes learned from non-clients will be advertised only to the clients. This situation opens up interesting loop and failure scenarios that may need special handling.
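The reflection rules above fit in a few lines of Python. This is only a sketch of the decision as described in this article; the names PeerType and reflects are mine, and it deliberately ignores eBGP peers and everything else a real implementation considers.

```python
from enum import Enum


class PeerType(Enum):
    CLIENT = "client"
    NON_CLIENT = "non-client"


def reflects(learned_from: PeerType, send_to: PeerType) -> bool:
    """Would a route reflector pass an iBGP-learned route along?

    Routes learned from a client go to clients and non-clients.
    Routes learned from a non-client go to clients only.
    """
    if learned_from is PeerType.CLIENT:
        return True
    return send_to is PeerType.CLIENT


if __name__ == "__main__":
    for src in PeerType:
        for dst in PeerType:
            verdict = "advertise" if reflects(src, dst) else "suppress"
            print(f"learned from {src.value:<10} -> {dst.value:<10}: {verdict}")
```

The only suppressed combination is non-client to non-client, which is exactly the classic iBGP rule left intact.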
Test Network
I don’t like empty talk. It either works or it doesn’t count! To test everything I write, I need a test network. For this scenario, I will use the network shown on the diagram below.
All routers are in BGP AS 65000 and there is EIGRP configured between them. Each router has a Loopback0 interface with the address 192.168.0.X, where X represents the router number. In addition to this Loopback interface, there is another, Loopback1, with the address 10.X.0.X/24. This additional Loopback interface is not advertised in EIGRP, but will be redistributed into BGP. Routers R4 and R5 are route-reflectors for routers R2 and R6. Initially, R2 will peer with R4, R4 will peer with R5, which will in turn peer with R6. The ultimate goal is to build redundant peering between route-reflectors and clients without having a full mesh. All routers peer between Loopback0 interfaces.
These are the relevant interface and routing protocol configurations.
R2:
interface Loopback0
 ip address 192.168.0.2 255.255.255.255
!
interface Loopback1
 ip address 10.2.0.2 255.255.255.0
!
interface GigabitEthernet0/0
 ip address 192.168.100.2 255.255.255.0
!
router eigrp 2456
 network 192.168.0.0 0.0.255.255
 no auto-summary
!
router bgp 65000
 redistribute connected route-map CONNECTED-to-BGP
 neighbor 192.168.0.4 remote-as 65000
 neighbor 192.168.0.4 update-source Loopback0
 neighbor 192.168.0.4 send-community
!
route-map CONNECTED-to-BGP permit 10
 match interface Loopback1
 set origin igp
!
R4:
interface Loopback0
 ip address 192.168.0.4 255.255.255.255
!
interface Loopback1
 ip address 10.4.0.4 255.255.255.0
!
interface FastEthernet0/0
 ip address 192.168.100.4 255.255.255.0
!
interface FastEthernet0/1
 ip address 192.168.45.4 255.255.255.0
!
router eigrp 2456
 network 192.168.0.0 0.0.255.255
 no auto-summary
!
router bgp 65000
 redistribute connected route-map CONNECTED-to-BGP
 neighbor 192.168.0.2 remote-as 65000
 neighbor 192.168.0.2 update-source Loopback0
 neighbor 192.168.0.2 route-reflector-client
 neighbor 192.168.0.2 send-community
 neighbor 192.168.0.5 remote-as 65000
 neighbor 192.168.0.5 update-source Loopback0
 neighbor 192.168.0.5 send-community
!
route-map CONNECTED-to-BGP permit 10
 match interface Loopback1
 set origin igp
!
R5:
interface Loopback0
 ip address 192.168.0.5 255.255.255.255
!
interface Loopback1
 ip address 10.5.0.5 255.255.255.0
!
interface FastEthernet0/0
 ip address 192.168.100.5 255.255.255.0
!
interface FastEthernet0/1
 ip address 192.168.45.5 255.255.255.0
!
router eigrp 2456
 network 192.168.0.0 0.0.255.255
 no auto-summary
!
router bgp 65000
 redistribute connected route-map CONNECTED-to-BGP
 neighbor 192.168.0.4 remote-as 65000
 neighbor 192.168.0.4 update-source Loopback0
 neighbor 192.168.0.4 send-community
 neighbor 192.168.0.6 remote-as 65000
 neighbor 192.168.0.6 update-source Loopback0
 neighbor 192.168.0.6 route-reflector-client
 neighbor 192.168.0.6 send-community
!
route-map CONNECTED-to-BGP permit 10
 match interface Loopback1
 set origin igp
!
R6:
interface Loopback0
 ip address 192.168.0.6 255.255.255.255
!
interface Loopback1
 ip address 10.6.0.6 255.255.255.0
!
interface FastEthernet0/0
 ip address 192.168.100.6 255.255.255.0
!
router eigrp 2456
 network 192.168.0.0 0.0.255.255
 no auto-summary
!
router bgp 65000
 redistribute connected route-map CONNECTED-to-BGP
 neighbor 192.168.0.5 remote-as 65000
 neighbor 192.168.0.5 update-source Loopback0
 neighbor 192.168.0.5 send-community
!
route-map CONNECTED-to-BGP permit 10
 match interface Loopback1
 set origin igp
!
Please note that the “send-community” and “set origin igp” configurations are not required. I configured them only because I like to, and for no other reason.
Finally, “show ip bgp summary” on R4 and R5 will show us that our configuration above worked.
R4:
R4#show ip bgp summary | begin Neighbor
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.0.2     4 65000     258     260        5    0    0 04:17:37        1
192.168.0.5     4 65000     259     259        5    0    0 04:17:19        2
R5:
R5#show ip bgp summary | begin Neighbor
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.0.4     4 65000     259     259        5    0    0 04:17:19        2
192.168.0.6     4 65000     258     260        5    0    0 04:17:25        1
Loop Prevention With Originator-ID and Cluster-ID
Loop prevention for both Cluster-ID and Originator-ID works in a very similar and simple manner:
- If a BGP router receives an update from an iBGP neighbor and finds its own Router-ID in the Originator-ID attribute, it will reject the update.
- If a BGP router that is configured to operate as a route reflector receives an update from an iBGP neighbor and finds its own Cluster-ID in the Cluster-list attribute, it will reject the update.
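The two checks above can be sketched as a small Python function. This is an illustration of the rules as stated here, not the code any router runs; Update and check_update are names I made up, and passing cluster_id=None is my way of saying "this router is not a route reflector". The sample values are taken from the debug output that appears later in this article.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Update:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: Tuple[str, ...] = ()


def check_update(update: Update, router_id: str,
                 cluster_id: Optional[str] = None) -> str:
    """Apply the two loop-prevention checks to an iBGP update.

    cluster_id is None for a router that is not a route reflector;
    in that case only the Originator-ID check applies.
    """
    if update.originator_id == router_id:
        return "reject: originator is us"
    if cluster_id is not None and cluster_id in update.cluster_list:
        return "reject: reflected from the same cluster"
    return "accept"


if __name__ == "__main__":
    # R6 (a plain client) sees its own route come back.
    back_to_r6 = Update("10.6.0.0/24", "192.168.0.6",
                        ("192.168.0.5", "192.168.0.4"))
    print(check_update(back_to_r6, router_id="192.168.0.6"))

    # R4 (a reflector) sees a route already reflected inside its cluster.
    from_r5 = Update("10.6.0.0/24", "192.168.0.6", ("192.168.100.0",))
    print(check_update(from_r5, router_id="192.168.0.4",
                       cluster_id="192.168.100.0"))
```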
To illustrate this functionality and the possible loop problem, I will add new iBGP sessions between R2 and R5, and between R4 and R6, but keep them inactive to begin with.
R2:
router bgp 65000
 neighbor 192.168.0.5 remote-as 65000
 neighbor 192.168.0.5 update-source Loopback0
 neighbor 192.168.0.5 send-community
 neighbor 192.168.0.5 shutdown
!
R4:
router bgp 65000
 neighbor 192.168.0.6 remote-as 65000
 neighbor 192.168.0.6 update-source Loopback0
 neighbor 192.168.0.6 route-reflector-client
 neighbor 192.168.0.6 send-community
 neighbor 192.168.0.6 shutdown
!
R5:
router bgp 65000
 neighbor 192.168.0.2 remote-as 65000
 neighbor 192.168.0.2 update-source Loopback0
 neighbor 192.168.0.2 route-reflector-client
 neighbor 192.168.0.2 send-community
 neighbor 192.168.0.2 shutdown
!
R6:
router bgp 65000
 neighbor 192.168.0.4 remote-as 65000
 neighbor 192.168.0.4 update-source Loopback0
 neighbor 192.168.0.4 send-community
 neighbor 192.168.0.4 shutdown
!
Once I activate the new peerings, this will be the peering arrangement.
Before I activate these sessions, let’s examine the 10.6.0.0/24 route advertised by R6 to R5 and subsequently to R4 and R2.
R5:
R5#show ip bgp 10.6.0.0/24
BGP routing table entry for 10.6.0.0/24, version 3
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Advertised to update-groups:
     1          2
  Local, (Received from a RR-client)
    192.168.0.6 (metric 156160) from 192.168.0.6 (192.168.0.6)
      Origin IGP, metric 0, localpref 100, valid, internal, best
We can see that on R5, this route looks like a “normal” iBGP-learned route. There is nothing special or unusual about it.
R4:
R4#show ip bgp 10.6.0.0/24
BGP routing table entry for 10.6.0.0/24, version 19
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Advertised to update-groups:
     2
  Local
    192.168.0.6 (metric 156160) from 192.168.0.5 (192.168.0.5)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 192.168.0.6, Cluster list: 192.168.0.5
In the above output on R4 we do have an additional line that mentions Originator and Cluster list. These two attributes are essential for loop prevention in the network that’s using route-reflectors.
We can clearly see that both R5 and R4 receive a single copy of this route. R5’s route is received from R6 and R4’s is received from R5 with the attributes, such as the next-hop, preserved.
Originator is the Router-ID of the iBGP peer that advertised the route to a route-reflector. In our case, this is R6. Cluster list is a set of BGP Cluster-IDs of all the route reflectors that reflected this route. By default, Cluster-ID will be the same as the Router-ID of the route reflector. Since the route in R5’s own table has not been reflected, it carries no Cluster-list there; R5’s Cluster-ID shows up only on R4, after R5 has reflected the route. However, if we take a look at R2, we’ll see that the Cluster-list entry is a little bit longer. We can also see that the Originator-ID has been preserved.
R2:
R2#show ip bgp 10.6.0.0/24
BGP routing table entry for 10.6.0.0/24, version 22
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Not advertised to any peer
  Local
    192.168.0.6 (metric 156160) from 192.168.0.4 (192.168.0.4)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 192.168.0.6, Cluster list: 192.168.0.4, 192.168.0.5
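To make the way these two attributes build up more concrete, here is a minimal Python sketch of what a route reflector does to a route as it reflects it, as I understand it: it sets the Originator-ID if one is not already present, and it prepends its own Cluster-ID to the Cluster-list. The Path class and the reflect function are hypothetical names for this illustration only.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Path:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: Tuple[str, ...] = ()


def reflect(path: Path, learned_from_router_id: str,
            reflector_cluster_id: str) -> Path:
    """Return the path as it looks after one reflection.

    The Originator-ID is set only if it is missing, so the first
    reflector's value survives later reflections. The reflector's
    Cluster-ID is prepended to the Cluster-list.
    """
    originator = path.originator_id or learned_from_router_id
    return replace(path,
                   originator_id=originator,
                   cluster_list=(reflector_cluster_id,) + path.cluster_list)


if __name__ == "__main__":
    sent_by_r6 = Path("10.6.0.0/24")  # no RR attributes yet
    seen_on_r4 = reflect(sent_by_r6, "192.168.0.6", "192.168.0.5")  # via R5
    seen_on_r2 = reflect(seen_on_r4, "192.168.0.5", "192.168.0.4")  # via R4
    print(seen_on_r4)
    print(seen_on_r2)
```

With the default Cluster-IDs from our test network, the path on R2 ends up with Originator 192.168.0.6 and Cluster list 192.168.0.4, 192.168.0.5, the same values the router shows above.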
Let’s now activate the sessions between R2 and R5, and between R4 and R6.
R2:
router bgp 65000
 no neighbor 192.168.0.5 shutdown
!
R4:
router bgp 65000
 no neighbor 192.168.0.6 shutdown
!
R5:
router bgp 65000
 no neighbor 192.168.0.2 shutdown
!
R6:
router bgp 65000
 no neighbor 192.168.0.4 shutdown
!
Let’s now observe the same route as before on R2 and see if there were any differences.
R2:
R2#show ip bgp 10.6.0.0/24
BGP routing table entry for 10.6.0.0/24, version 29
Paths: (2 available, best #2, table Default-IP-Routing-Table)
  Not advertised to any peer
  Local
    192.168.0.6 (metric 156160) from 192.168.0.5 (192.168.0.5)
      Origin IGP, metric 0, localpref 100, valid, internal
      Originator: 192.168.0.6, Cluster list: 192.168.0.5
  Local
    192.168.0.6 (metric 156160) from 192.168.0.4 (192.168.0.4)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 192.168.0.6, Cluster list: 192.168.0.4
R2 now has two copies of the route. One is received from R4 and the other one from R5. The next-hop for both is R6’s Loopback0 (192.168.0.6), for which R2 has a single EIGRP route over our shared segment. I don’t need two routes, yet I received them. It is also interesting to see what’s happening on R4 and R5.
R4:
R4#show ip bgp 10.6.0.0/24
BGP routing table entry for 10.6.0.0/24, version 20
Paths: (2 available, best #1, table Default-IP-Routing-Table)
  Advertised to update-groups:
     1          2
  Local, (Received from a RR-client)
    192.168.0.6 (metric 156160) from 192.168.0.6 (192.168.0.6)
      Origin IGP, metric 0, localpref 100, valid, internal, best
  Local
    192.168.0.6 (metric 156160) from 192.168.0.5 (192.168.0.5)
      Origin IGP, metric 0, localpref 100, valid, internal
      Originator: 192.168.0.6, Cluster list: 192.168.0.5
We can see that R4 also has two copies of this same route. One is received from R6 directly and the other one from R5. R4 chose the route received directly from R6 as the best, which in turn means this is the route it will advertise to R5. Let’s confirm that.
R4:
R4#show ip bgp update-group 1
BGP version 4 update-group 1, internal, Address Family: IPv4 Unicast
  BGP Update version : 20/0, messages 0
  4 octets ASN capable
  Community attribute sent to this neighbor
  Update messages formatted 22, replicated 0
  Number of NLRIs in the update sent: max 1, min 0
  Minimum time between advertisement runs is 0 seconds
  Has 1 member (* indicates the members currently being sent updates):
   192.168.0.5
R5:
R5#show ip bgp 10.6.0.0/24
BGP routing table entry for 10.6.0.0/24, version 21
Paths: (2 available, best #2, table Default-IP-Routing-Table)
  Advertised to update-groups:
     1          2
  Local
    192.168.0.6 (metric 156160) from 192.168.0.4 (192.168.0.4)
      Origin IGP, metric 0, localpref 100, valid, internal
      Originator: 192.168.0.6, Cluster list: 192.168.0.4
  Local, (Received from a RR-client)
    192.168.0.6 (metric 156160) from 192.168.0.6 (192.168.0.6)
      Origin IGP, metric 0, localpref 100, valid, internal, best
Now, it stands to reason that after receiving this route, R5 would advertise it to its clients, which happen to include R6, the origin of our route.
R6:
R6#show ip bgp 10.6.0.0/24
BGP routing table entry for 10.6.0.0/24, version 5
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Advertised to update-groups:
     1
  Local
    0.0.0.0 from 0.0.0.0 (192.168.0.6)
      Origin IGP, metric 0, localpref 100, weight 32768, valid, sourced, best
R6 doesn’t seem to have it in its BGP table, but that doesn’t mean R5 is not advertising it. I will clear the BGP session between R5 and R6 and enable “debug ip bgp ipv4 unicast updates in” on R6.
R6:
R6#debug ip bgp ipv4 unicast updates in
BGP updates debugging is on (inbound) for address family: IPv4 Unicast
R6#clear ip bgp 192.168.0.5
BGP: 192.168.0.5 Local router is the Originator; Discard update
BGP(0): 192.168.0.5 rcv UPDATE w/ attr: nexthop 192.168.0.5, origin i, localpref 100, metric 0, originator 192.168.0.6, clusterlist 192.168.0.5 192.168.0.4, merged path , AS_PATH , community , extended community , SSA attribute
BGP(0): 192.168.0.5 rcv UPDATE about 10.6.0.0/24 -- DENIED due to: ORIGINATOR is us;
Sure enough, we can see that R5 tried to advertise the route to R6, but this update was rejected. The reason for rejection was the Originator-ID!
This explains the problem and the solution for this kind of loop, but this is not the end. At this point, R4 and R5, which are redundant route-reflectors, both have copies of client routes received from clients and from each other. By itself, this is not a problem, but imagine that our clients had full Internet BGP route feeds. At the time of writing, the full Internet routing table stands at around 350000 (three hundred fifty thousand, yes) routes! Advertising the routes between route-reflectors in this scenario will double the memory requirements for the BGP tables and increase the overall convergence time. This is clearly not desired.
To prevent the problem I just described, we can make our route-reflectors be members of the same redundancy cluster. To do that, we need to ensure they share the same Cluster-ID. Remember, by default, this value is derived from the Router-ID, but they are otherwise not related. I will change our Cluster-IDs to have the same value as the subnet used between our 4 routers and reset all BGP peerings.
R4 & R5:
router bgp 65000
 bgp cluster-id 192.168.100.0
!
Let’s check the status of our 10.6.0.0/24 route. We’ll start with R2.
R2:
R2#show ip bgp 10.6.0.0/24
BGP routing table entry for 10.6.0.0/24, version 4
Paths: (2 available, best #2, table Default-IP-Routing-Table)
  Not advertised to any peer
  Local
    192.168.0.6 (metric 156160) from 192.168.0.5 (192.168.0.5)
      Origin IGP, metric 0, localpref 100, valid, internal
      Originator: 192.168.0.6, Cluster list: 192.168.100.0
  Local
    192.168.0.6 (metric 156160) from 192.168.0.4 (192.168.0.4)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 192.168.0.6, Cluster list: 192.168.100.0
Just as before, we can see that we learned two routes. We should take note that Cluster-ID is now the same on both route-reflectors, which can be observed in the Cluster-list above. The situation on R4 is rather different than before though.
R4:
R4#show ip bgp 10.6.0.0/24
BGP routing table entry for 10.6.0.0/24, version 4
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Advertised to update-groups:
     1          3
  Local, (Received from a RR-client)
    192.168.0.6 (metric 156160) from 192.168.0.6 (192.168.0.6)
      Origin IGP, metric 0, localpref 100, valid, internal, best
We can see here that R4 has only one route – the one learned from R6 directly. Let’s turn on the same debug as we did earlier on R6 and see why we don’t have the route from R5 any more.
R4:
R4#debug ip bgp ipv4 unicast updates in
BGP updates debugging is on (inbound) for address family: IPv4 Unicast
R4#clear ip bgp 192.168.0.5
BGP: 192.168.0.5 RR in same cluster. Reflected update dropped
BGP(0): 192.168.0.5 rcv UPDATE w/ attr: nexthop 192.168.0.6, origin i, localpref 100, metric 0, originator 192.168.0.6, clusterlist 192.168.100.0, merged path , AS_PATH , community , extended community , SSA attribute
BGP(0): 192.168.0.5 rcv UPDATE about 10.6.0.0/24 -- DENIED due to: reflected from the same cluster;
In the debug we can very clearly observe how Cluster-ID prevented us from wasting resources. This is a primary reason why best-practices design for redundant route-reflectors calls for them to be “in the same cluster”. Is the best practice actually a good practice?
When the Best Practice Isn’t a Good Practice
Let’s for a second assume that between our clients and route-reflectors is a complex L2 network that may be susceptible to failures of all sorts (i.e. there is a network between them). Let’s further assume that this underlying network experienced a failure that prevented R2 from communicating with R5, and R6 from communicating with R4. In other words, we’re back to our original peering setup: R2 and R4 are peering, R4 and R5 are peering, and R5 and R6 are peering. I will simulate this “failure” by simply shutting those peerings down.
R2:
router bgp 65000
 neighbor 192.168.0.5 shutdown
!
R6:
router bgp 65000
 neighbor 192.168.0.4 shutdown
!
Let’s take a look at our 10.6.0.0/24 route on R2 now.
R2:
R2#show ip bgp 10.6.0.0/24
% Network not in table
We don’t have it! R2 is peering with R4, does that router have it?
R4:
R4#show ip bgp 10.6.0.0/24
% Network not in table
It does not. You will remember that there is no longer a BGP session between R4 and R6. R4 won’t accept any routes from R5 whose Cluster-list contains their shared Cluster-ID, which effectively prevents R2 from learning R6’s routes. This is a problem, and it can be solved by breaking up the cluster, essentially taking us all the way back to the beginning of this article.
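To tie the whole story together, here is a toy Python model of our four routers that propagates a single prefix originated by R6 and counts how many paths each router ends up accepting. It is a sketch under several assumptions of mine, not a BGP implementation: updates are exchanged in synchronous rounds, a router never advertises a path back to the peer it learned it from, and the best path is simply the one with the shortest Cluster-list, with the lowest sender Router-ID as the tie-breaker. All names in it (Path, advertise, accepts, simulate, FULL, DEGRADED) are made up for this illustration.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple

ROUTER_IDS = {"R2": "192.168.0.2", "R4": "192.168.0.4",
              "R5": "192.168.0.5", "R6": "192.168.0.6"}
REFLECTORS = {"R4", "R5"}  # R2 and R6 are their clients
FULL = {("R2", "R4"), ("R2", "R5"), ("R4", "R5"), ("R4", "R6"), ("R5", "R6")}
DEGRADED = {("R2", "R4"), ("R4", "R5"), ("R5", "R6")}  # after the "failure"


class Path(NamedTuple):
    sender: Optional[str]  # None means locally originated
    originator: Optional[str] = None
    cluster_list: Tuple[str, ...] = ()


def advertise(src: str, dst: str, path: Optional[Path],
              cluster_ids: Dict[str, str]) -> Optional[Path]:
    """What src sends dst for its best path, or None if nothing."""
    if path is None or path.sender == dst:
        return None  # nothing to send, or simplification: no send-back
    if path.sender is None:
        return Path(sender=src)  # locally originated, no RR attributes
    if src not in REFLECTORS:
        return None  # plain iBGP rule: do not re-advertise
    from_client = path.sender not in REFLECTORS
    to_client = dst not in REFLECTORS
    if not (from_client or to_client):
        return None  # non-client to non-client is never reflected
    originator = path.originator or ROUTER_IDS[path.sender]
    return Path(src, originator, (cluster_ids[src],) + path.cluster_list)


def accepts(dst: str, path: Path, cluster_ids: Dict[str, str]) -> bool:
    """Originator-ID and Cluster-list checks on the receiving side."""
    if path.originator == ROUTER_IDS[dst]:
        return False
    if dst in REFLECTORS and cluster_ids[dst] in path.cluster_list:
        return False
    return True


def rank(path: Path) -> tuple:
    """Simplified best-path key: short Cluster-list, then low sender ID."""
    rid = tuple(int(o) for o in ROUTER_IDS[path.sender].split("."))
    return (len(path.cluster_list), rid)


def simulate(cluster_ids: Dict[str, str], sessions: Set[Tuple[str, str]],
             origin: str = "R6", max_rounds: int = 10) -> Dict[str, List[Path]]:
    """Return the accepted paths per router once the model is stable."""
    best: Dict[str, Optional[Path]] = {r: None for r in ROUTER_IDS}
    best[origin] = Path(sender=None)
    rib_in: Dict[str, List[Path]] = {r: [] for r in ROUTER_IDS}
    for _ in range(max_rounds):
        rib_in = {r: [] for r in ROUTER_IDS}
        for a, b in sorted(sessions):
            for src, dst in ((a, b), (b, a)):
                adv = advertise(src, dst, best[src], cluster_ids)
                if adv is not None and accepts(dst, adv, cluster_ids):
                    rib_in[dst].append(adv)
        new_best = {r: (min(p, key=rank) if p else None)
                    for r, p in rib_in.items()}
        new_best[origin] = Path(sender=None)
        if new_best == best:
            break
        best = new_best
    return rib_in


if __name__ == "__main__":
    separate = dict(ROUTER_IDS)  # default: Cluster-ID equals Router-ID
    shared = {**ROUTER_IDS, "R4": "192.168.100.0", "R5": "192.168.100.0"}
    for label, ids, sessions in (("separate, all sessions", separate, FULL),
                                 ("shared, all sessions", shared, FULL),
                                 ("shared, after failure", shared, DEGRADED),
                                 ("separate, after failure", separate, DEGRADED)):
        counts = {r: len(p) for r, p in simulate(ids, sessions).items()}
        print(f"{label:<24} {counts}")
```

In this model, a shared Cluster-ID trims the reflectors down to one path each while all sessions are up, but after the failure R2 and R4 are left with no path at all. With separate Cluster-IDs, R2 still learns the route through R4 and R5, at the price of the duplicate paths we saw earlier.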
Conclusions
Understanding Originator-ID and Cluster-ID is very important for understanding how iBGP works and scales. The scenario I provided in this article is a bit artificial, but only a little bit. It presents network engineers with a choice that’s not easy. Whenever designing and implementing redundant BGP route-reflectors, we may need to choose between redundancy and memory resources. Choices we make may or may not come back to haunt us.
Happy studies!