A while back, a technical acquaintance asked for some input on a problem they had with failing over between dual ISPs. The solution they had in place was fine back in the day, when downtime measured in minutes was acceptable, but in today's world a manual failover solution just doesn't cut the mustard. At my disposal I had an ageing Cisco 2811 with an ADSL WIC, an ADSL line, and a VDSL line (BT Infinity) with an accompanying Huawei modem.
The aim was to build a system that was as highly available as possible with the limited equipment to hand, and I set about configuring the network as shown below.
The ADSL and VDSL lines were business lines, but each had only one static IP, which posed a problem. I had planned to perform NAT at the firewall so that the local support team could make any changes in a GUI. However, handing the single public address on each line through to the firewall would have left the router with no address of its own on either link to source ICMP echo probes from, so I could not have used ICMP echo replies as my reachability tracking method. To get around this, you must either perform the port address translation on the router rather than the firewall, or get yourself some extra IPs. Tracking interfaces wasn't an option here either. The VDSL line terminated on an external modem, so the 2811 would have had zero visibility of the service status: the Ethernet link to the modem stays up even when the line itself is down.
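As a rough illustration of doing the port address translation on the router with two outside interfaces, the fragment below overloads each dialer's single public address. It uses route-maps with match interface so that each flow is translated to the address of the link it actually leaves by. This is a sketch, not the configuration from this site: the subnet 192.168.1.0/24 (whatever arrives from the firewall), access-list 10, the route-map names and FastEthernet0/0 as the firewall-facing interface are all hypothetical.

```
! Hypothetical: traffic arriving from the firewall is 192.168.1.0/24
access-list 10 permit 192.168.1.0 0.0.0.255
!
! Translate to Dialer1's address only when the flow exits via Dialer1
route-map NAT-DIALER1 permit 10
 match ip address 10
 match interface Dialer1
! Translate to Dialer2's address only when the flow exits via Dialer2
route-map NAT-DIALER2 permit 10
 match ip address 10
 match interface Dialer2
!
ip nat inside source route-map NAT-DIALER1 interface Dialer1 overload
ip nat inside source route-map NAT-DIALER2 interface Dialer2 overload
!
! Hypothetical firewall-facing interface
interface FastEthernet0/0
 ip nat inside
interface Dialer1
 ip nat outside
interface Dialer2
 ip nat outside
```

Keying the translation on the egress interface means that once the default route moves to Dialer2, new flows are automatically translated to Dialer2's address.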
The actual failover mechanism in my implementation is IP SLA. An IP SLA operation is a probe that the router's IOS runs on a schedule, in this case an ICMP echo to the primary ISP's next hop, and its result is followed by a track object. The track object is attached to the primary default route. If the track object goes "Down", that route is withdrawn from the routing table and the secondary default route, which has a higher administrative distance (10), comes into play. The commands are fairly straightforward, but be aware that the syntax differs between IOS releases:
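! x.x.x.x = primary ISP next hop
ip sla 100
 icmp-echo x.x.x.x source-interface Dialer1
ip sla schedule 100 life forever start-time now
! On newer IOS releases: track 100 ip sla 100 reachability
track 100 rtr 100 reachability
! Primary default route, installed only while track 100 is Up
ip route 0.0.0.0 0.0.0.0 Dialer1 track 100
! Floating static route (administrative distance 10) used when the primary is withdrawn
ip route 0.0.0.0 0.0.0.0 Dialer2 10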
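By default, an ICMP echo operation probes only once every 60 seconds, so an outage can take a while to be detected. The fragment below is a hedged sketch of tighter probe timers plus a track delay to damp flapping, written in the newer syntax. The values are illustrative assumptions, not what was deployed here. An operation that is already scheduled generally has to be removed (no ip sla 100) and recreated before its parameters can be changed.

```
! Illustrative timers only (assumption), newer IOS syntax
ip sla 100
 icmp-echo x.x.x.x source-interface Dialer1
 ! Probe every 5 seconds; a reply slower than 1000 ms counts as a failure
 frequency 5
 timeout 1000
 threshold 1000
ip sla schedule 100 life forever start-time now
!
track 100 ip sla 100 reachability
 ! Wait 10 s of failure before going Down, 30 s of success before going Up
 delay up 30 down 10
```

The asymmetric delay is a design choice: it fails over quickly but waits for the primary line to prove itself stable before moving traffic back.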
The next problem I came across was that, even once the route had failed over, the existing NAT translations still referenced the old ISP's address. They needed clearing before anyone could reach the internet again, because left alone they only disappear as they age out, which can take hours for TCP. I handled this with an Embedded Event Manager (EEM) applet, configured as shown below:
! Runs on any state change of track 100; maxrun is raised above the 20 s default so the wait can complete
event manager applet NAT-TRACK
 event track 100 state any maxrun 60
 action 0.1 cli command "enable"
 ! Pause 20 s for the routing change to settle
 action 0.2 wait 20
 action 0.3 cli command "clear ip nat translation force"
 action 0.4 syslog msg "NAT translations cleared after track state change"
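To check that the pieces are working, the following show commands (run in privileged EXEC mode) display the probe results, the track state, which default route is installed, the current NAT table, and whether the applet is registered. This is a suggested checklist, not the procedure used on this site. On older images, the first command is show ip sla monitor statistics 100.

```
show ip sla statistics 100
show track 100
show ip route 0.0.0.0
show ip nat translations
show event manager policy registered
show logging | include NAT translations cleared
```

Running these before and after a planned test outage of the primary line should show the track object changing state, the default route moving to Dialer2, and the applet's syslog message appearing.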
To conclude, better redundancy is of course possible with more equipment, allowing the use of HSRP, VRRP or GLBP. Redundancy for inbound services might also have been possible using BGP, with a pair of ISPs that supported it advertising an address range of the customer's own through both, but this was never in the budget. The interesting part is that, a year or so later, my acquaintance's customer no longer has any perceivable internet downtime. Yet the syslog messages reveal that there have been outages from time to time, and that the failovers have been seamless. I hope this has been informative for you.