What is Failover?

Failover refers to switching to a standby computer, system, network, or hardware component when the primary system or component fails. It is achieved when a redundant component takes over or the system moves into a standby operational mode. Failover is designed to reduce or completely eliminate the impact on users in the event of a failure.

Switchover is very similar to failover. The difference is that switchover requires human intervention to initiate the transition, whereas failover typically happens automatically.

In an effective system, the infrastructure is set up to allow for seamless failover. In some systems, failover is a helpful option, but in others, it is an absolute necessity. For example, if a network must have a disaster recovery plan in place, a failover system is a mandatory part of the infrastructure, even if it primarily consists of relatively basic measures like backup techniques. However, virtually any organization can benefit from failover because it enables them to maintain continuity.

 

What is a Failover Cluster?

A failover cluster is a group of servers that work in unison to provide either continuous availability (CA) or high availability (HA). If one of the cluster components fails, another component takes on its workload. This is accomplished with very little or no downtime.

“What is failover” is a different question from “what is a failover cluster,” however. Failover commonly means redundancy built into a computing system, while clustering is one method of achieving failover. A cluster is built by adding a second server, running on a separate computer, that is able to manage some of the processing. If the primary server fails in some way, the secondary server can take over completely.

Failover Configurations

With servers, failover frequently uses what is referred to as a heartbeat system. This works by connecting two servers using either a wireless or cable connection. The system monitors the connection, looking for a “pulse” between the two servers. As long as the pulse is there, the second server remains on standby. If the pulse stops, the second server takes over the workload.
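The heartbeat check described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `HeartbeatMonitor` class, its method names, and the 3-second timeout are all assumptions made for the example, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Runs on the standby server and tracks heartbeats from the primary."""

    timeout_seconds: float        # hypothetical threshold, tune per system
    last_heartbeat: float = 0.0   # timestamp of the most recent pulse
    standby_active: bool = False  # True once the standby has taken over

    def record_heartbeat(self, now: float) -> None:
        """Call whenever a pulse arrives from the primary."""
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        """Activate the standby if the pulse has been missing too long."""
        silence = now - self.last_heartbeat
        if not self.standby_active and silence > self.timeout_seconds:
            self.standby_active = True
        return self.standby_active


monitor = HeartbeatMonitor(timeout_seconds=3.0)
monitor.record_heartbeat(now=10.0)
print(monitor.check(now=12.0))  # False: pulse seen 2 s ago, standby stays idle
print(monitor.check(now=14.5))  # True: no pulse for 4.5 s, standby takes over
```

A real deployment would usually require several missed pulses before switching, so that a single dropped packet does not trigger an unnecessary failover.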

In many cases, the system can incorporate an additional, third server that runs the essential components to avoid downtime while the switch happens. The heartbeat link between the servers helps ensure the failover server is ready to go in the event of a failure. This setup can be supported by a combination of on-premises and cloud-based support mechanisms to reduce the possibility of delays or problematic continuity breaks during the switching process.

While failover is often performed automatically, some configurations involve alerting the IT team to the need to perform a switchover. In this setup, the system alerts the administrator automatically, but the actual switch does not happen until the administrator approves it.
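The difference between the two setups can be expressed as a small decision function. The sketch below is only illustrative: the `Mode` enum, the `handle_primary_failure` function, and the returned action strings are hypothetical names chosen for this example.

```python
from enum import Enum


class Mode(Enum):
    AUTOMATIC = "automatic"  # failover: switch as soon as a failure is detected
    MANUAL = "manual"        # switchover: wait for an administrator


def handle_primary_failure(mode: Mode, admin_approved: bool = False) -> str:
    """Return the action taken when the primary is reported as failed."""
    if mode is Mode.AUTOMATIC:
        return "switch"
    # Manual mode: the alert always goes out, the switch waits for approval.
    return "switch" if admin_approved else "alert_and_wait"


print(handle_primary_failure(Mode.AUTOMATIC))                    # switch
print(handle_primary_failure(Mode.MANUAL))                       # alert_and_wait
print(handle_primary_failure(Mode.MANUAL, admin_approved=True))  # switch
```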

With more and more virtualization software available, failover is often possible without adding dedicated physical devices. In a virtualized failover, the active virtual machine is moved from the primary host to the secondary host. In this way, service continues with little or no interruption.

 

Active-Active vs. Active-Standby Configurations

Active-active and active-standby (or active-passive) are the most popular HA cluster configurations. Each achieves failover in a different way.

With an active-active cluster, you typically have at least two nodes that run the same kind of service simultaneously. This way, an active-active cluster is able to load balance the system, so a single node does not get overloaded. In addition to sharing the workload, this configuration also improves throughput and response times because more nodes are online at the same time. To maximize the effectiveness of an active-active system, all nodes should be configured identically. This provides true redundancy and allows for a completely continuous HA cluster.

An active-standby system is different in that the secondary node is not running at the same time as the primary one. When a failover is needed, the secondary node is activated. With an active-standby cluster, the servers also have to be configured with identical settings because this reduces latency when the passive node is activated.

The system requires time to switch from one node to another in an active-standby setup, so there may be some outage during the transition. With an active-active system, the switch is effectively immediate because the remaining nodes are already online and serving traffic.
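The contrast between the two configurations can be sketched as two routing functions. This is a simplified model under stated assumptions: the function names and node names are hypothetical, health is represented as a plain boolean, and round-robin stands in for whatever load-balancing method a real cluster uses.

```python
from itertools import cycle


def active_active_targets(nodes: dict[str, bool], requests: int) -> list[str]:
    """Round-robin requests across every healthy node."""
    healthy = [name for name, up in nodes.items() if up]
    if not healthy:
        raise RuntimeError("no healthy nodes available")
    rotation = cycle(healthy)
    return [next(rotation) for _ in range(requests)]


def active_standby_target(primary_up: bool, standby_up: bool) -> str:
    """Send all traffic to the primary; use the standby only after a failure."""
    if primary_up:
        return "primary"
    if standby_up:
        return "standby"  # in a real system, activation takes some time
    raise RuntimeError("no healthy nodes available")


print(active_active_targets({"node-a": True, "node-b": True}, 4))
# ['node-a', 'node-b', 'node-a', 'node-b']
print(active_active_targets({"node-a": False, "node-b": True}, 2))
# ['node-b', 'node-b']
print(active_standby_target(primary_up=False, standby_up=True))
# standby
```

In the active-active case, losing a node only shrinks the rotation. In the active-standby case, the standby has to be brought up before it can serve traffic, which is where the transition delay comes from.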

What is DHCP Failover?

A Dynamic Host Configuration Protocol (DHCP) server automatically assigns Internet Protocol (IP) addresses, gateways, and other network parameters to the devices connected to the network. DHCP failover refers to using two or more DHCP servers to manage a shared pool of addresses. This makes it possible for the individual DHCP servers to back each other up if there is an outage in the network. In addition, the individual servers are able to split the task of assigning leases to all devices connected to the network on a continuous basis.
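As a rough illustration of servers sharing a pool and backing each other up, the sketch below splits a small subnet between two servers and falls back to the partner when the preferred server is down. It is a deliberately simplified model, not how any particular DHCP implementation works: the server names, the `split_pool` and `assign_lease` functions, the even split, and the 192.168.1.0/29 subnet are all assumptions for the example.

```python
import ipaddress


def split_pool(network: str) -> tuple[list[str], list[str]]:
    """Divide the usable host addresses of a subnet between two servers."""
    hosts = [str(ip) for ip in ipaddress.ip_network(network).hosts()]
    middle = len(hosts) // 2
    return hosts[:middle], hosts[middle:]


def assign_lease(
    pools: dict[str, list[str]], server_up: dict[str, bool], preferred: str
) -> tuple[str, str]:
    """Lease from the preferred server, or from its partner if it is down."""
    order = [preferred] + [name for name in pools if name != preferred]
    for server in order:
        if server_up[server] and pools[server]:
            return server, pools[server].pop(0)
    raise RuntimeError("no DHCP server can offer a lease")


pool_a, pool_b = split_pool("192.168.1.0/29")
pools = {"dhcp-a": pool_a, "dhcp-b": pool_b}

print(assign_lease(pools, {"dhcp-a": True, "dhcp-b": True}, "dhcp-a"))
# ('dhcp-a', '192.168.1.1')
print(assign_lease(pools, {"dhcp-a": False, "dhcp-b": True}, "dhcp-a"))
# ('dhcp-b', '192.168.1.4')
```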

The communication that happens during failover is inherently insecure, however. This could present an opportunity for an attacker to create a condition necessitating failover, with the intention of stealing or accessing information while it takes place. To reduce this risk, use a firewall to block unauthorized users or devices from reaching the failover port. Security in a DHCP failover system should, ideally, be an integral part of your network topology.

 

Why Failover Is Important

Network architects create failover systems to enhance reliability. Whenever there is an interruption of operations, the costs, in both time and frustration, can have a negative ripple effect across an organization and the users it serves. Failover can eliminate or reduce breaks in continuity, helping an organization keep operating, even during a significant disaster. Specifically, failover allows you to:

  1. Keep your database protected when the system needs to undergo maintenance or if it fails.
  2. Run maintenance tasks automatically. With an automated failover protocol in place, you do not need to supervise the switchover when performing maintenance such as software updates or security enhancements.
  3. Customize your failover system to fit the needs of your network and the devices on it. As you maintain a database, you can have multiple systems working at the same time, each supporting the other to prevent overall system failure. You also have the option of incorporating cloud servers. These can take over operations completely during maintenance or upgrades, which reduces the risk of connectivity problems during the process.

How Fortinet Can Help

Consistent, reliable connectivity is essential for the seamless operation of any organization. The Fortinet FortiExtender solution enables you to use your wireless carrier’s Long-Term Evolution (LTE) connectivity to supply broadband speeds for a secure, backup internet connection. As a result, if there is a failure, you still maintain internet connectivity. Because the system can provide broadband speeds, it allows seamless, organization-wide connectivity even if your primary connection goes down.

In addition to keeping you connected to the internet, you can use FortiExtender as a failover wide-area network (WAN) link. If there is an outage in your primary WAN, FortiExtender can keep the devices on your WAN connected so operations continue uninterrupted.

FAQs

What is a failover in networking?

In networking, failover refers to a backup mode for operating the network. If the primary network fails, the system switches automatically to another network that is sitting on standby.

What is the difference between HA and failover?

High availability (HA) is a more general term that describes a system that can tolerate failure. Failover, on the other hand, is a specific feature that helps give a system high availability. Because failover involves having a backup system at the ready at all times, failover, in effect, enables HA.

How do you perform a failover test?

To perform a failover test, take the following steps:

  1. Analyze the system and your needs, and use that information to set up performance benchmarks you expect the system to meet in the event failover is necessary.
  2. Set up a plan for your test. Ideally, you want to arrange for several tests—one for each of a variety of situations.
  3. Run each test, taking note of how the system performed.
  4. Analyze the performance data and compare it to the benchmarks in step 1.
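
The comparison in step 4 can be automated with a short script. The sketch below assumes two hypothetical benchmarks (maximum downtime and maximum lost requests) and uses made-up results for two scenarios. The `FailoverResult` class, the thresholds, and the scenario names are illustrative only and should be replaced with your own targets and measurements.

```python
from dataclasses import dataclass


@dataclass
class FailoverResult:
    scenario: str
    downtime_seconds: float  # time the service was unreachable
    requests_lost: int       # requests that failed during the switch


# Step 1: hypothetical benchmarks, replace with your own targets.
MAX_DOWNTIME_SECONDS = 30.0
MAX_REQUESTS_LOST = 10


def evaluate(results: list[FailoverResult]) -> dict[str, bool]:
    """Step 4: report which scenarios met both benchmarks."""
    return {
        r.scenario: (
            r.downtime_seconds <= MAX_DOWNTIME_SECONDS
            and r.requests_lost <= MAX_REQUESTS_LOST
        )
        for r in results
    }


# Steps 2 and 3: one recorded result per planned scenario (made-up numbers).
results = [
    FailoverResult("primary power loss", downtime_seconds=12.0, requests_lost=3),
    FailoverResult("network link cut", downtime_seconds=45.0, requests_lost=0),
]
print(evaluate(results))
# {'primary power loss': True, 'network link cut': False}
```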