TOF1314: IP Multicast and PIM Rendezvous Points

Introduction

For the friends tracking how we're doing, a small group of ex-Mentor folks have formed Chesapeake NetCraftsmen ( http://www.netcraftsmen.net ). We're doing high-end consulting and selected training. See the blurb paragraph for links. In these articles, we've been talking about IP Multicast. This month we'll take a look at options concerning Rendezvous Points, which are needed for IP Multicast using PIM Sparse Mode. (As we saw, this is the approach to IP multicast that scales best.)

Why Rendezvous Points?

A Rendezvous Point (RP) is used as a temporary way to connect a would-be multicast receiver to an existing shared multicast tree passing through the rendezvous point. When the volume of traffic crosses a threshold, the receiver is joined to a source-specific tree, and the feed through the RP is dropped. You can think of this as obtaining copies of something through a friend who already subscribes; when it proves useful or interesting, it's worth the bother to become a direct subscriber. So: the scalable way to do multicast is PIM Sparse Mode (PIM-SM), and PIM-SM requires that you have at least one RP.

How Many Ways to RP?

There are three ways I know of to set up an RP:

manual configuration on each router
auto-RP
bootstrap router with PIM version 2

There are actually more combinations and choices, but we don't have space and time for all of them. Nor is it really necessary, when you understand what is possible. See however the online Cisco document at http://www.cisco.com/warp/public/cc/pd/iosw/tech/rppim_rg.htm . It contains configurations for 10 different RP scenarios!

Manual Configuration of RP

This is simple but not very scalable, nor is it robust. Globally configure:

ip pim rp-address rp-address [access-list] [override] [bidir]

For example

ip pim rp-address 172.16.100.100

Supply the RP address on each router in the network. If you supply an access-list, it defines (permits) the multicast groups for this particular RP. Add "override" to have the static setting take precedence over Auto-RP if it is present (see below). If you're doing bidirectional PIM (a new, advanced feature), add "bidir" to mark the groups permitted by the access-list as bidirectional PIM groups served by this RP.
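
As a minimal sketch of these options, the fragment below ties one RP to a group range and lets the static entry win over Auto-RP. The RP address is the one from the example above; ACL number 10 and the 239.0.0.0/8 range are assumptions chosen for illustration.

```
! Assumed for illustration: ACL 10 and the 239.0.0.0/8 group range.
! Groups permitted by ACL 10 use 172.16.100.100 as their RP.
access-list 10 permit 239.0.0.0 0.255.255.255
!
! "override" makes this static mapping win over any Auto-RP mapping.
ip pim rp-address 172.16.100.100 10 override
```

The same lines would have to be repeated on every router, which is why this approach does not scale well.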

Auto-RP

Auto-RP automatically distributes information to routers as to what the RP address is for various multicast groups. It simplifies use of multiple RP's for different multicast group ranges. It avoids manual configuration inconsistencies, and allows for multiple RP's acting as backups to each other. Cisco routers automatically listen for this information. Auto-RP relies on a router designated as RP mapping agent. Potential RP's announce themselves to the mapping agent, and it resolves any conflicts. The mapping agent then sends out the multicast group-RP mapping information to the other routers.
How does it do this? It uses multicast to send the mapping information to the other routers! The specific groups used are 224.0.1.39 and 224.0.1.40. Candidate RP's announce themselves on the first (.39), and the mapping agent sends the group-to-RP mappings, the discovery messages, on the second (.40). Of course, there's a chicken and egg problem there: how can you send out multicast information via multicast if the Auto-RP information is needed to make PIM-SM work in the first place?
Generally Auto-RP is used with sparse-dense mode, since then the Auto-RP information can be propagated in dense mode. If your routers are configured with pure sparse-mode on the interfaces, then you can shift to sparse-dense-mode. The other choice with PIM-SM only interfaces is to configure static RP addresses for the Auto-RP multicast groups (the multicast groups used by Auto-RP itself to communicate). That way, the static info gets the Auto-RP multicasts distributed in sparse mode, and then the Auto-RP mapping information allows the other multicast groups to be joined. By the way, you do not need to statically specify a group range for the Auto-RP multicast groups, since normally Auto-RP information takes priority over statically configured information. Thus group mappings advertised via Auto-RP will direct Joins to the correct RP, while the lack of this information for the Auto-RP groups means the statically configured RP for Auto-RP will remain in effect.
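
A sketch of that second option, for a network with sparse-mode-only interfaces, might look like the following. The RP address is reused from the earlier example as an assumption. The ACL (number 20, also an assumption) is optional, as just noted; it is included only to make explicit that the static RP is meant for the two Auto-RP groups.

```
! Optional ACL limiting the static RP to the two Auto-RP groups.
access-list 20 permit 224.0.1.39
access-list 20 permit 224.0.1.40
!
! Static RP so the Auto-RP groups themselves can run in sparse mode.
ip pim rp-address 172.16.100.100 20
!
interface serial 0
 ip pim sparse-mode
```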
Routers that are RP's are configured with the global configuration command:

ip pim send-rp-announce type number scope ttl-value [group-list access-list] [interval seconds] [bidir]

The type number argument is the name of an interface providing the address for the RP. Scope is the TTL of the announcement (which limits how many router hops it can traverse). You specify which multicast groups the router is RP for with the access-list.
For example:

ip pim send-rp-announce loopback0 scope 16 group-list 10
access-list 10 permit 239.0.0.0 0.255.255.255

By default, such routers advertise themselves every 60 seconds to multicast group 224.0.1.39. They advertise their address and the range of groups they are RP for. The mapping agents receive this information. They select the highest candidate RP address as RP for each group or range advertised. The mapping agents advertise this, by default every 60 seconds or when changes occur, to multicast group 224.0.1.40. You do have to configure mapping agents (so they know they're the mapping agent):

ip pim send-rp-discovery scope ttl-value

The scope is how many hops the advertisements can take. This allows you to have different mapping agents, each responsible for part of the network. (Some other configuration may also be required to optimize this.) Because the Auto-RP mapping agents use the highest RP address for each group or range, you can have redundant RP's. If the one with the highest address fails, the next one will take over (after the cache hold time expires). If you have redundant Auto-RP mapping agents, as long as they advertise the same information, there is no problem. You do need to make sure candidate RP's use a large enough scope to reach all the mapping agents.
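
To make the redundancy concrete, here is a sketch with two candidate RP's and one mapping agent. The router names, loopback addresses, and scope of 16 are assumptions chosen for illustration, and ip multicast-routing is assumed to be enabled already; the Auto-RP commands are the ones shown above.

```
! Router A: candidate RP (assumed Loopback0 address 172.16.100.1)
interface loopback0
 ip address 172.16.100.1 255.255.255.255
 ip pim sparse-dense-mode
!
ip pim send-rp-announce loopback0 scope 16 group-list 10
access-list 10 permit 239.0.0.0 0.255.255.255
!
! Router B: candidate RP for the same range (assumed address 172.16.100.2)
interface loopback0
 ip address 172.16.100.2 255.255.255.255
 ip pim sparse-dense-mode
!
ip pim send-rp-announce loopback0 scope 16 group-list 10
access-list 10 permit 239.0.0.0 0.255.255.255
!
! Router C: mapping agent
ip pim send-rp-discovery scope 16
```

With both candidates up, the mapping agent should advertise Router B (172.16.100.2, the higher address) as RP for 239.0.0.0/8. If B's announcements stop, Router A takes over once the hold time expires.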
If you use TTL scoping, generally you set the TTL values on the large side, to make sure the advertisements reach every router within the local domain (part of the network). Think of the multicast advertisements as a wave, with scope being the height of the wave. You need to make sure the wave is big enough to reach the fringes of the domain. If you need to keep them from "spilling over" into another part of the network, you can use a boundary command:

interface serial 0
ip multicast boundary 10
access-list 10 deny 239.0.0.0 0.255.255.255
access-list 10 permit 224.0.0.0 15.255.255.255

This stops specified multicasts from crossing the boundary. You can also use a TTL threshold:

interface serial 0
ip multicast ttl-threshold ttl-value

Only the packets with TTL greater than the threshold are forwarded out the interface.
Routers will act as RP if they receive Join or Prune messages. You can configure your routers to only accept prunes and joins in accord with the Auto-RP mapping information, with:

ip pim accept-rp auto-rp [access-list]

The access list can be used to control which groups this applies to. You could also statically configure

ip pim accept-rp rp-address [access-list]

for static RP's, but that gets painful to maintain.
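
For instance, the Auto-RP form could be applied to a single group range as sketched below. ACL 10 and its range are assumptions for illustration.

```
! Assumed for illustration: ACL 10 and the 239.0.0.0/8 group range.
access-list 10 permit 239.0.0.0 0.255.255.255
!
! For groups permitted by ACL 10, accept Joins and Prunes only
! when they are for an RP learned via Auto-RP.
ip pim accept-rp auto-rp 10
```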

Bootstrap Router

The Bootstrap Router (BSR) capability was added in PIM version 2. It automates and simplifies the Auto-RP process. It is enabled by default in Cisco IOS releases supporting PIMv2. There are interoperability and design issues with PIMv1. See the Configuration Guide for more advice on this. The short form of the advice is to set up your BSR to also be Auto-RP mapping agent, make sure all RP's run PIMv2, and then the PIM versions can interoperate. We'll assume you have upgraded your routers and all are running PIMv2. This means you'll have one active RP per multicast group, compared to several for the same group in PIMv1. You configure sparse-dense-mode on interfaces, since Sparse or Dense are now properties of a multicast group, not an interface.
PIMv1 plus Auto-RP does the same tasks as BSR. But Auto-RP is Cisco proprietary, whereas PIMv2 with BSR is an IETF standards track protocol, which means it should interoperate with routers from other vendors.
To use Bootstrap Router, configure one or more candidate BSR's. These should be well-placed, in the core of your network with good connectivity. Configuration command:

ip pim bsr-candidate type number hash-mask-length [priority]

The type number part of this refers to the interface whose address is used to identify the BSR. The hash-mask-length is how many leading bits of a multicast group address are fed into the hash function that selects among the candidate RP's; groups that match in those bits hash to the same RP. The priority is for election as BSR. The hashing allows load balancing across multiple RP's for a range of groups. Only one RP will be used for each group, but the hashing will divide up which RP is used for which group. The hashing scheme is deterministic, so that all routers will use the same scheme and determine the same RP for each group. You also have to configure one or more candidate RP's, as with Auto-RP. RP's should also be well-connected and in a high-speed and accessible portion of the network.

ip pim rp-candidate type number [group-list access-list] [bidir]

The arguments are identical to those for the send-rp-announce command above. (Interface for RP identity, access list controlling which multicast groups the router is to be an RP candidate for.) The actual operation of BSR is a bit different from Auto-RP. First, a single BSR is elected, based on configured priority. (Highest IP address is used as a tie-breaker.) Candidate RP's then unicast announcements to this BSR, which stores all of the announcements. The BSR periodically floods BSR messages to all the other routers, hop by hop. The flooding is to 224.0.0.13 (all PIM routers) with TTL one. (All 224.0.0.x multicasts are link-local in scope.) Default flooding interval is 60 seconds. If a candidate BSR does not receive a BSR message within 150 seconds, it starts an election: it announces itself as BSR until a BSR message with a higher priority is received.
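
Putting the two commands together, a router acting as both candidate BSR and candidate RP might be configured as sketched here. The loopback address, hash mask length of 30, priority of 10, and ACL 10 are all assumptions for illustration, and ip multicast-routing is assumed to be enabled already.

```
! Interface whose address identifies this router as BSR and RP
interface loopback0
 ip address 172.16.100.1 255.255.255.255
 ip pim sparse-dense-mode
!
! Candidate BSR: hash mask length 30, priority 10 (assumed values)
ip pim bsr-candidate loopback0 30 10
!
! Candidate RP for 239.0.0.0/8; this could equally be a different router
ip pim rp-candidate loopback0 group-list 10
access-list 10 permit 239.0.0.0 0.255.255.255
```

With a mask length of 30, groups that share their first 30 bits (blocks of four consecutive group addresses) hash to the same RP.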
To set up BSR domains, you need to stop BSR messages from going between the domains. This is done simply, via an interface command:

interface serial 0
ip pim bsr-border

This causes the interface to neither send nor receive BSR messages. This is much simpler than TTL scoping!

Show Commands

The following show commands may be useful to you in working with RP's and BSR's.

show ip pim rp [mapping | metric] [rp-address]
show ip pim rp-hash {group-address | group-name}
show ip pim bsr

Increasing Redundancy

Static RP allows only one RP per group. Forget redundancy with static RP's. Auto-RP allows you to have multiple RP's per group. Only one is used at a time, however. You can also have multiple Auto-RP mapping agents, as long as they provide consistent information.
BSR's can be redundantly configured, although only one will be active at any time. You can configure multiple candidate RP's per group range, and the hashing scheme will distribute load across the candidates deterministically. Each group can still only be served by one RP.
PIM RP's have a built-in limitation, which is why all the above approaches only allow one RP per multicast group. The limitation is that when a source for a group starts transmitting, the adjacent edge router has to send unicast PIM Register packets to an RP. If you have multiple RP's, only one could receive the Register packets and learn the source address (and Join towards the source, if there are receivers). The other RP's for the group would remain unaware of the source.
If you configure and run MSDP, Multicast Source Discovery Protocol, it allows source information to be shared among RP's. Thus, when one RP learns of a source for a multicast group, it can pass this information to MSDP neighbors, other RP's. The MSDP neighbors can flood the source advertisement to all RP's in the network.
There's a nice trick that can be used with this. Configure loopbacks on the MSDP-speaking RP's. Configure the loopback interfaces with duplicate IP addresses. Weird! The one place in networking I know of where duplicate IP addresses are actually considered desirable!
Consider what happens if static or Auto-RP or BSR are used with RP's based on the duplicated IP address. When an edge router needs to send a Join, it sends it toward the duplicated address. (Joins travel hop by hop along the unicast route to the RP; Register packets are unicast to it.) When you have a duplicated address, unicast routing leads to the closest copy of the address. So Join packets find the nearest RP. If it happens to fail, then once routing reconverges packets will just go to another copy of the RP. Since the RP is only used temporarily, followed by a Source-specific Join, should the RP fail, only new Joins will be affected. (Assuming the source-specific threshold isn't set to a high value.) This behavior is called anycast. (As in, get me the packets from anywhere.)
The final degree of flexibility and redundancy for multicast is gained through anycast. It can be added to any of the above techniques for advertising RP's.
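
A sketch of the anycast arrangement with two RP's follows. All addresses, the router names, and the choice of loopback1 for MSDP peering are assumptions for illustration. The ip msdp commands are not covered elsewhere in this article, so check them against the Configuration Guide for your release.

```
! RP-A
interface loopback0
 ! shared anycast RP address, identical on both RP's
 ip address 172.16.100.100 255.255.255.255
 ip pim sparse-dense-mode
!
interface loopback1
 ! unique address, used only for MSDP peering
 ip address 172.16.100.1 255.255.255.255
!
ip msdp peer 172.16.100.2 connect-source loopback1
ip msdp originator-id loopback1
!
! RP-B: same loopback0, unique loopback1, peer pointing back at RP-A
interface loopback0
 ip address 172.16.100.100 255.255.255.255
 ip pim sparse-dense-mode
!
interface loopback1
 ip address 172.16.100.2 255.255.255.255
!
ip msdp peer 172.16.100.1 connect-source loopback1
ip msdp originator-id loopback1
!
! All routers, including the RP's (static RP shown here;
! Auto-RP or BSR would advertise the same shared address)
ip pim rp-address 172.16.100.100
```

One thing to watch with this design: make sure your unicast routing protocols do not pick the duplicated loopback address as a router ID.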

Conclusion

For more specific configuration examples, and an example of anycast, see the Cisco configuration paper already referenced, at http://www.cisco.com/warp/public/cc/pd/iosw/tech/rppim_rg.htm . As usual, the Configuration Guide is a pretty good resource. The URL follows, as does the URL leading to the 3 relevant sections of the Configuration Reference.

The next article may talk briefly about MBGP, MSDP, bidirectional PIM, and/or source-specific multicast. Or it may not. This subject matter is getting a bit advanced, so we may shift topics to something of broader interest. As always, your emails are welcome. Questions, suggestions for articles, etc. can be sent to pjw@netcraftsmen.net .

Dr. Peter J. Welcher (CCIE #1773, CCSI #94014) is a Senior Consultant with Chesapeake NetCraftsmen. NetCraftsmen is a high-end consulting firm and Cisco Premier Partner dedicated to quality consulting and knowledge transfer. NetCraftsmen has eleven CCIE's (4 of whom are double-CCIE's, R&S and Security). NetCraftsmen has expertise including large network high-availability routing/switching and design, VoIP, QoS, MPLS, network management, security, IP multicast, and other areas. See http://www.netcraftsmen.net for more information about NetCraftsmen. Pete's links start at http://www.netcraftsmen.net/about-us/bios/staff-articles-and-blogs/pete-welcher.html . New articles will be posted under the Articles link. Questions, suggestions for articles, etc. can be sent to pjw@netcraftsmen.net .
Corrections made 1/4/04, from Scott Morris: PIM routers multicast is 224.0.0.13. Note that 224.0.0.x multicasts are link local.
12/5/2001
Copyright (C) 2001, Peter J. Welcher

Monday, January 16, 2012
