IPsec High Availability Options
Foundation Topics
Sources of Failures
The network has a number of possible points vulnerable to failure. Remember that an IPsec VPN is an end-to-end connection. It typically travels across untrusted networks (such as the Internet), and through many different network devices. The loss of any one of these components can cause the IPsec VPN to fail. Such potential failure points include
■ Access link failure
■ Remote peer failure
■ Device failure
■ Path failure
An access link failure could include the failure of a physical interface on any transit network device (although the access link is typically seen at your end of the IPsec VPN), a module that contains many interfaces, or the “cable” (electrical, optical, or wireless) that provides transport.
Failure of the remote peer is typically attributed to “the other guy.” Unless you have some network management reachability into the remote site, it is difficult to determine what the exact cause of the failure is.
A device failure is typically a failure of any device between, and including, the source and destination of the IPsec VPN. In many cases, these devices are beyond your administrative control, and the reason for failure cannot be determined.
A path failure could be a routing or circuit issue in a network between the two IPsec VPN endpoints. The failure is typically outside of your administrative reach, and cannot be easily determined.
The IPsec VPN design must consider all facets of potential network failure and implement redundancy accordingly to ensure that the secure traffic continues to flow from one site to another.
Failure Mitigation
Each of the failure sources mentioned earlier can be mitigated by employing one or more redundancy mechanisms.
The more redundancy built into the network, the greater the implementation cost. The primary failure points and some preventive solutions are as follows:
■ Access link failure: To overcome the loss of an access link, multiple interfaces and devices can be used. A single IPsec VPN endpoint could have multiple interfaces, multiple interface cards, or multiple endpoint devices.
■ Remote peer failure: Failure of the remote peer is mitigated in much the same way as access link failure. Multiple interfaces and devices can be used to survive a failure. Such duplication allows multiple IPsec VPN tunnels to securely connect the two sites, each over a different infrastructure.
■ Device failure: Device failure comes in many flavors. As described for both the access link and remote peer, duplicate interfaces and devices can help overcome a local failure. However, a device failure outside of your administrative control is a challenge to correct. So rather than fix someone else’s equipment, simply avoid it. Ensure that you have multiple diverse paths between endpoints in case a problem arises in the untrusted network.
■ Path failure: A path failure is typically beyond your control. Path redundancy can be used to circumvent a path failure in an untrusted network.
It is important to consider what is truly required to achieve path redundancy. Any single point of failure should be removed from the path. Within your network, this means duplicate equipment and wiring. It also implies separate and diverse paths into and out of the building. Many costly redundancy plans have been knocked out by a single swipe of a backhoe cutting the only physical path into the building. The use of different ISPs ensures that the traffic starts in different pieces of the Internet. But it is difficult to ensure that a common circuit (from an upstream ISP) is not used “somewhere” between the source and destination points.
Failover Strategies
The best redundancy plans cannot be executed if the failure state cannot be recognized. There are two ways that IPsec failover can be executed:
■ Stateless: In a stateless environment, redundant logical connections (IPsec VPN tunnels) are used to provide primary and backup paths. The use of the paths is determined by message exchanges between the peers, or a determination by the end devices on which path to use. The state of the IPsec VPN tunnels is not known. Traffic is sent across the backup tunnel if the end-to-end path has failed.
■ Stateful: To provide a stateful failover, redundant equipment is employed. The devices used to provide stateful failover are typically identical (configuration, interfaces, operating system, and so on). These devices also communicate with each other to determine which one is the current best device.
Most redundancy plans react to a failure and send traffic on an alternate path. The overall ability to provide timely redundancy begins with the detection of a failure.
IPsec Stateless Failover
There are three primary stateless means to detect and react to a fault. The ideal reaction to a detected fault is to automatically send traffic a different way. The three failure detection methods are as follows:
■ Dead peer detection (DPD)
■ An IGP within GRE over IPsec
■ Hot Standby Router Protocol (HSRP) (or one of the related protocols)
The sections that follow discuss each of these methods in greater detail.
Dead Peer Detection
Dead peer detection is a configuration option during the IPsec VPN setup. DPD offers a stateless failover from one VPN tunnel to another. This means that the routers are not keeping track of which VPN tunnels are currently alive. Instead, traffic flows through the primary tunnel until it fails, at which time a secondary tunnel is selected.
DPD has two operational modes: periodic mode and on-demand mode. DPD periodic mode has the following characteristics:
■ DPD keepalive messages are periodically sent between IPsec VPN peers.
■ DPD keepalive messages are in addition to the normal IPsec rekey messages that also regularly traverse the tunnel.
■ DPD keepalive messages are not sent if user data is transmitted through the VPN tunnel.
■ DPD keepalive messages are used only when there is a lull in tunnel traffic.
One negative consequence of periodic DPD mode is the potentially excessive tunnel overhead. IKE already sends a regular set of messages through the tunnel that serve as keepalives: the IPsec SA rekey messages exchanged as the IPsec lifetime nears expiration.
Use of an IPsec VPN tunnel normally means that packets are encrypted at one end and decrypted at the other. DPD keepalive messages add more encryption/decryption overhead to the VPN endpoints. In exchange, they provide more timely failure detection.
In contrast, DPD on-demand mode has the following attributes:
■ It is the default DPD mode in a Cisco IOS device.
■ DPD keepalive messages are sent only if the liveness of the remote peer is in question. If traffic is sent to the peer, a response is expected. If such a response does not arrive, a DPD keepalive message is sent.
■ DPD keepalive messages are never sent during otherwise idle tunnel moments.
■ It is possible that a router might not discover a dead peer until the IKE or IPsec security association (SA) rekey is attempted.
The use of on-demand mode reduces the additional tunnel overhead that periodic mode introduces. However, an alternate IPsec VPN tunnel might not be used immediately upon the failure of the primary one. This is not as bad as it may sound. If there is no traffic traveling through an IPsec VPN, and the VPN fails, there truly is no need to change to the alternate tunnel until user data arrives.
The configuration of DPD in a Cisco IOS device is simply a modification of an existing IPsec VPN setup. As already discussed, DPD uses keepalive messages to determine whether the primary peer has failed, and then switches over to a backup peer. Figure 15-1 shows a sample DPD configuration and topology, in which a remote site is configured with redundant IPsec VPN tunnels back to a central office using DPD.
The two Cisco IOS commands that enable DPD are
crypto isakmp keepalive seconds [retries] [periodic | on-demand]
set peer ip-address [default]
The crypto isakmp keepalive IOS command determines the mode and frequency of DPD. Remember that periodic mode continually sends DPD keepalive messages to verify that the remote VPN peer is still alive. The default DPD mode is on-demand, which sends DPD messages only when the liveness of the remote peer is in question. Default options do not appear in the configuration.
The crypto isakmp keepalive command has two timer options. The seconds option defines how often DPD keepalive messages are sent in periodic mode. The retries option defines how long to wait before resending a DPD message after the previous one has gone unanswered.
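The fragment below is a minimal sketch of how the two commands might appear together on the remote-site router. It is not the configuration from Figure 15-1. The peer addresses (192.0.2.1 and 192.0.2.2), the crypto map name VPN-MAP, the transform set name TS, access list 101, and the interface are all hypothetical placeholders. The ISAKMP policy, pre-shared keys, transform set, and access list definitions are assumed to exist elsewhere and are omitted.

```
! Hypothetical values throughout: addresses, names, ACL number,
! and interface are placeholders, not taken from Figure 15-1.
!
! Periodic DPD: send a keepalive every 10 seconds, and wait
! 3 seconds before resending one that goes unanswered.
crypto isakmp keepalive 10 3 periodic
!
! Two peers in one crypto map entry. The default keyword marks
! the first peer as the preferred (primary) one; the second is
! the backup used when DPD declares the primary dead.
crypto map VPN-MAP 10 ipsec-isakmp
 set peer 192.0.2.1 default
 set peer 192.0.2.2
 set transform-set TS
 match address 101
!
interface Serial0/0
 crypto map VPN-MAP
```

Note that set peer is entered in crypto map configuration mode, whereas crypto isakmp keepalive is a global command. Leaving off the periodic keyword would keep DPD in its default on-demand mode, which trades slower detection on an idle tunnel for less keepalive overhead.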