High Availability failovers

Last Updated: May 31, 2023

High Availability (HA) support for both media and signaling ensures that Avaya SBC security functionality is provided continuously, regardless of hardware or software failures. High availability requires at least two Avaya SBC devices and one standalone EMS server.

Any Avaya SBC in the pair can be the primary Avaya SBC. The primary and secondary Avaya SBCs exchange HA control messages and heartbeat messages. When the primary Avaya SBC fails, the secondary Avaya SBC takes over and begins serving traffic.

High availability requires Gratuitous Address Resolution Protocol (GARP) support on the connected network elements. When a failover occurs, the secondary Avaya SBC broadcasts a GARP message announcing that a new MAC address is now associated with the Avaya SBC IP address, so that requests are sent to the secondary Avaya SBC. Devices that do not support GARP continue sending calls to the original primary Avaya SBC. Therefore, such devices must be on a different subnet from the Avaya SBC, behind a GARP-aware router or L3 switch, so that they do not communicate with the Avaya SBC directly. For example, branch gateways, Medpro, Crossfire, and some PBXs and IVRs must be deployed in a different network from the Avaya SBC, with a router or L3 switch handling GARP. If the Avaya SBC interfaces and these devices are not on different subnets, active calls have one-way audio after a failover.
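To illustrate what a GARP message carries, the following is a minimal sketch that builds the bytes of a gratuitous ARP request as an Ethernet frame, assuming a standard IPv4-over-Ethernet ARP layout. It is not the Avaya SBC implementation; the function names and the example IP and MAC addresses are hypothetical. The key property is that the sender and target protocol addresses are both the shared IP, so every listener that honors GARP updates its ARP entry for that IP to the new MAC address.

```python
import struct

BROADCAST = b"\xff" * 6  # Ethernet broadcast destination

def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))

def ip_bytes(ip: str) -> bytes:
    """Convert a dotted-quad IPv4 address such as '192.0.2.10' to 4 raw bytes."""
    return bytes(int(part) for part in ip.split("."))

def build_garp(shared_ip: str, new_mac: str) -> bytes:
    """Build a gratuitous ARP request announcing shared_ip -> new_mac.

    Hypothetical helper: sender IP == target IP, sent to the broadcast address.
    """
    src = mac_bytes(new_mac)
    ip = ip_bytes(shared_ip)
    # Ethernet header: destination (broadcast), source, EtherType 0x0806 (ARP)
    eth = BROADCAST + src + struct.pack("!H", 0x0806)
    # ARP header: htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, oper=1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += src + ip           # sender hardware address / sender protocol address
    arp += b"\x00" * 6 + ip   # target hardware address (unused) / target IP = same shared IP
    return eth + arp

frame = build_garp("192.0.2.10", "00:1b:17:00:01:02")
print(len(frame))  # 14-byte Ethernet header + 28-byte ARP payload
```

The sketch only builds the frame. Actually transmitting it would need a raw socket (for example `AF_PACKET` on Linux) and elevated privileges, which is outside the scope of this example.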

All IP addresses configured in the Network Configuration screen are shared between both HA devices in HA deployment mode. The HA devices are also configured with private default IP addresses, which they use to replicate signaling and media data to each other. The configured interfaces remain inactive on the standby (secondary) device until it becomes active (primary). When the devices switch roles, the new active device sends a GARP message to update the ARP tables of adjacent devices so that the new active device starts receiving traffic.

Failover scenarios

Keep-alive or heartbeat failure: The secondary Avaya SBC sends a keep-alive request (heartbeat) every 500 ms, and the primary Avaya SBC replies with a keep-alive response. If the primary Avaya SBC does not respond to two consecutive keep-alive requests, the secondary Avaya SBC takes over as the primary Avaya SBC.
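The detection rule above can be modeled as a counter of consecutive missed replies. The following is a minimal sketch, not Avaya code: the function name `takeover_time_ms` and the list-of-booleans input are hypothetical. Only the 500 ms interval and the two-miss threshold come from this page. The model records the takeover at the moment the second consecutive request goes unanswered; the actual response timeout is not stated here.

```python
from typing import Iterable, Optional

HEARTBEAT_INTERVAL_MS = 500  # from the text: one keep-alive request every 500 ms
MAX_MISSED = 2               # from the text: two consecutive misses trigger takeover

def takeover_time_ms(responses: Iterable[bool]) -> Optional[int]:
    """Return the time (ms) at which the secondary takes over, or None.

    responses[i] is True if the primary answered the keep-alive request sent
    at i * HEARTBEAT_INTERVAL_MS. Any answered request resets the miss counter.
    """
    missed = 0
    for i, answered in enumerate(responses):
        missed = 0 if answered else missed + 1
        if missed >= MAX_MISSED:
            return i * HEARTBEAT_INTERVAL_MS
    return None

# A single missed reply (at 1000 ms) is tolerated; two in a row (2000 ms, 2500 ms) are not.
print(takeover_time_ms([True, True, False, True, False, False]))
```

Resetting the counter on every answered request is what makes the rule "consecutive": isolated lost heartbeats never accumulate into a failover.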

Peer node unavailable: If the peer node is unavailable, the currently active (running) Avaya SBC becomes the primary Avaya SBC. The active Avaya SBC attempts to connect to the peer every 15 seconds.
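As a rough illustration of the retry behavior, the following sketch shows a node that keeps serving as primary while it retries the peer at a fixed 15-second interval. The function `wait_for_peer`, the `try_connect` callable, and the injectable `sleep` parameter are hypothetical and exist only so the loop can be exercised without real delays; how roles are negotiated once the peer answers is not described on this page and is not modeled.

```python
import time
from typing import Callable, Optional

RETRY_INTERVAL_S = 15  # from the text: reconnect attempt every 15 seconds

def wait_for_peer(try_connect: Callable[[], bool],
                  max_attempts: int,
                  sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Retry the peer while this node continues to serve traffic as primary.

    Returns the 1-based attempt on which try_connect() succeeded, or None if
    the peer was still unreachable after max_attempts tries.
    """
    for attempt in range(1, max_attempts + 1):
        if try_connect():
            return attempt
        sleep(RETRY_INTERVAL_S)  # wait before the next attempt
    return None

# Usage with a fake clock: the peer answers on the third attempt.
waits = []
replies = iter([False, False, True])
print(wait_for_peer(lambda: next(replies), 5, waits.append), waits)
```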

Link failures: The HA module maintains a list of physical ports and their status. It obtains the configured ports from the physical ports configured on the server and from the subscriber flows. When a link fails, the primary Avaya SBC compares its number of active links with the number of active links on the peer Avaya SBC. If the secondary Avaya SBC has more active links, the secondary Avaya SBC takes over as the new primary Avaya SBC. Failovers are not initiated for M1 and M2 link failures.
Note:

Avaya SBC compares its number of active links with that of the peer to determine whether a failover is necessary. For example, if one link on the primary Avaya SBC goes down but the secondary Avaya SBC has the same number of active links as the primary, failover is not required.
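The link-count comparison can be sketched as follows. This is an illustrative model, not the HA module itself: the port names A1 and B1 and the dictionary representation of port status are assumptions. M1 and M2 are left out of the count so that their failures never trigger a failover, and a failover happens only when the secondary has strictly more active links, which reproduces the example in the note.

```python
from typing import Dict

# Management ports: per the text, their failures do not initiate failover.
EXCLUDED_PORTS = {"M1", "M2"}

def active_link_count(ports: Dict[str, bool]) -> int:
    """Count ports that are up, ignoring M1 and M2."""
    return sum(1 for name, up in ports.items() if up and name not in EXCLUDED_PORTS)

def should_fail_over(primary: Dict[str, bool], secondary: Dict[str, bool]) -> bool:
    """Fail over only if the secondary has strictly more active links."""
    return active_link_count(secondary) > active_link_count(primary)

# Hypothetical port maps: True means the link is up.
primary = {"M1": True, "M2": True, "A1": False, "B1": True}
secondary = {"M1": True, "M2": True, "A1": True, "B1": True}
print(should_fail_over(primary, secondary))  # secondary has 2 active links vs 1
```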

Avaya SBC provides a redundant HA connection by using the management addresses of the HA pair. With HA replication, if either the M1-to-M1 or the M2-to-M2 connection goes down, the HA connection continues uninterrupted.

Avaya SBC supports an EMS HA active/active configuration. If the EMS hardware fails, the EMS server can switch to the other EMS in the HA pair without manual intervention.

Split brain: When both SBCs in an HA pair lose communication with each other, each SBC declares itself primary. This condition is called split brain. In a split-brain condition, how Avaya SBC handles calls depends on the network state:
  • Fully functional: no disruption to ongoing calls

  • Partial outage: some users or calls are affected

  • Complete outage

In a split-brain state, Avaya SBC does not preserve active calls.

For example, suppose SBC1 is the primary SBC and SBC2 is the secondary SBC. When you disconnect all interfaces of SBC1:
  • SBC1 assumes that the network is down.

  • SBC2 loses connectivity with SBC1, assumes the primary role, and takes over all active calls.

  • SBC1 continues to stay in the primary state with all interfaces down.

  • When the interfaces are reconnected, SBC1 and SBC2 re-establish the HA connection while both are in the primary state, which results in a split-brain condition.

  • You must restart SBC1 and SBC2 to recover from the split-brain condition. During the restart, new calls cannot be placed and all active calls are dropped. After the restart, the SBCs return to the primary and secondary states and start serving new calls.
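The scenario above can be traced with a small state model. This is a simplified sketch under stated assumptions, not Avaya's state machine: the `Node` class and helper functions are hypothetical, the HA connection is treated as up whenever either the M1 or the M2 path is up, and after the restart the model arbitrarily makes SBC1 primary, because this page does not say which SBC takes which role.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    role: str  # "primary" or "secondary"
    active_calls: List[str] = field(default_factory=list)

def ha_link_up(m1_up: bool, m2_up: bool) -> bool:
    # Redundant HA connection: it survives as long as either path is up.
    return m1_up or m2_up

def on_peer_lost(node: Node) -> None:
    # A node that loses its peer declares itself primary (a primary stays primary).
    node.role = "primary"

def is_split_brain(a: Node, b: Node, m1_up: bool, m2_up: bool) -> bool:
    # Split brain: the HA connection is back, but both nodes claim the primary role.
    return ha_link_up(m1_up, m2_up) and a.role == b.role == "primary"

def restart_pair(a: Node, b: Node) -> None:
    # Restarting both nodes drops every active call; roles are re-assigned.
    for node in (a, b):
        node.active_calls.clear()
    a.role, b.role = "primary", "secondary"  # assumption: a becomes primary

# Walk through the example: all SBC1 interfaces are disconnected.
sbc1 = Node("SBC1", "primary", ["call-1", "call-2"])
sbc2 = Node("SBC2", "secondary")
on_peer_lost(sbc2)
sbc2.active_calls = list(sbc1.active_calls)  # SBC2 takes over the active calls
on_peer_lost(sbc1)                           # SBC1 stays primary
print(is_split_brain(sbc1, sbc2, True, True))   # interfaces reconnected
restart_pair(sbc1, sbc2)
print(sbc1.role, sbc2.role, sbc2.active_calls)
```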