Fault tolerance within the Hybrid Cloud Gateway cluster

Last updated: Jan 22, 2024

To support N+1 redundancy, deploy Avaya Hybrid Cloud Gateway with multiple all-in-one nodes in a data center cluster. One node is automatically elected as the seed node, and the other nodes become non-seed nodes.

The Hybrid Cloud Gateway (HCG) Signaling Connector is active on all nodes, with the load balanced among them (active-active).
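The text does not say how the seed node is chosen. The following Python sketch illustrates the two roles under a hypothetical rule: the healthy node with the lowest numeric ID becomes the seed, and every healthy node, seed or not, stays in the active-active signaling pool. `Node`, `elect_seed`, and `signaling_pool` are illustrative names, not part of any Avaya API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int   # hypothetical numeric node identifier
    healthy: bool


def elect_seed(nodes: list[Node]) -> int:
    """Return the seed node ID under an ASSUMED rule: lowest healthy ID wins."""
    healthy_ids = [n.node_id for n in nodes if n.healthy]
    if not healthy_ids:
        raise RuntimeError("no healthy node can act as the seed")
    return min(healthy_ids)


def signaling_pool(nodes: list[Node]) -> list[int]:
    """Active-active: every healthy node, seed or non-seed, carries signaling load."""
    return sorted(n.node_id for n in nodes if n.healthy)


cluster = [Node(1, True), Node(2, True), Node(3, True)]
print(elect_seed(cluster), signaling_pool(cluster))  # 1 [1, 2, 3]
```

With N+1 nodes, losing any single node still leaves N nodes in the pool, which is what the N+1 design provides.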

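For each numbered scenario, the entries describe, in order: the scenario and its impact, HCG Signaling Connector recovery, HCG Admin Services recovery, HCG Signaling Connector return to regular operation, and HCG Admin Services return to regular operation.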

1

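Scenario: The WebSocket from one HCG node to UC3 is not functional.

Impact: No impact on existing or new calls.

HCG Signaling Connector recovery: The HCG Signaling Connector on the failed node can use the healthy nodes to communicate with UC3.

HCG Admin Services recovery: N/A

HCG Signaling Connector return to regular operation: When the WebSocket is functional again, the node uses its own WebSocket to communicate with UC3.

HCG Admin Services return to regular operation: N/A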
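As a rough illustration of the recovery path in the UC3 WebSocket scenario, this sketch picks which node's WebSocket carries a given node's traffic to UC3. It assumes a hypothetical `ws_up` map of per-node WebSocket health and a hypothetical tie-break (lowest healthy peer ID). The documentation states only that the failed node can use healthy nodes and that it returns to its own WebSocket once that connection works.

```python
def uc3_route(node_id: int, ws_up: dict[int, bool]) -> int:
    """Return the ID of the node whose WebSocket carries node_id's UC3 traffic.

    ws_up is a hypothetical health map: node ID -> own WebSocket to UC3 works.
    """
    # Return to regular operation: prefer the node's own WebSocket when it works.
    if ws_up.get(node_id, False):
        return node_id
    # Recovery: relay through a healthy peer (lowest ID is an ASSUMED tie-break).
    peers = sorted(n for n, up in ws_up.items() if up and n != node_id)
    if not peers:
        raise RuntimeError("no node has a working WebSocket to UC3")
    return peers[0]


status = {1: True, 2: False, 3: True}
print(uc3_route(2, status))  # 1: node 2 relays through node 1
status[2] = True
print(uc3_route(2, status))  # 2: node 2 uses its own WebSocket again
```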

Scenario: The WebSocket from the HCG seed node to HCG Manager is not functional.

Impact: During the outage, you cannot save HCG-related changes to the tenant admin configuration, and cloud operations cannot remotely access the HCG cluster.

HCG Signaling Connector recovery: N/A

HCG Admin Services recovery: The Admin Services are active on all nodes with active-active load balancing. The Hybrid Cloud Manager (HCM) service can distribute commands over any healthy WebSocket Secure (WSS) connection to reach the Admin Services on that connection's node.

HCG Signaling Connector return to regular operation: N/A

HCG Admin Services return to regular operation: When the WebSocket connection is functional again, HCM includes it in load distribution.
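The HCM behavior described above spreads commands over whichever WSS connections are healthy and adds a connection back once it works. It can be sketched as a round-robin distributor. Round-robin is an assumption because the text does not name HCM's distribution policy, and `CommandDistributor` and the connection names are hypothetical.

```python
from itertools import count


class CommandDistributor:
    """Spread admin commands over healthy WSS connections (round-robin is ASSUMED)."""

    def __init__(self, connections: list[str]) -> None:
        self.healthy = dict.fromkeys(connections, True)  # connection -> healthy?
        self._turn = count()

    def set_health(self, conn: str, up: bool) -> None:
        self.healthy[conn] = up

    def pick(self) -> str:
        # Only connections currently marked healthy take part in distribution.
        live = [c for c, up in self.healthy.items() if up]
        if not live:
            raise RuntimeError("no healthy WSS connection to the HCG cluster")
        return live[next(self._turn) % len(live)]


hcm = CommandDistributor(["wss-node1", "wss-node2", "wss-node3"])
hcm.set_health("wss-node1", False)      # e.g. the seed node's WebSocket is down
print([hcm.pick() for _ in range(4)])   # only wss-node2 and wss-node3 are used
```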

Scenario: A non-seed node fails within the Hybrid Cloud Gateway N+1 cluster.

Impact: All agents on this node lose their SIP signaling connections to Avaya Aura®. Ongoing calls enter media preservation mode.

Note:
  • Only the End Call and Mute buttons are supported.

  • No new calls can be made or received until recovery completes.

HCG Signaling Connector recovery: The HCG Signaling Connector on the other nodes in the same cluster detects the failure of this node within 60 seconds. If the node becomes functional again within 60 seconds, expect full recovery of call control for existing calls. Otherwise, it takes a few minutes to restore all affected SIP registrations.

HCG Admin Services recovery: The Admin Services are active on all nodes with active-active load balancing. The HCM service distributes commands over any healthy WSS connection to reach the Admin Services on that connection's node.

HCG Signaling Connector return to regular operation: When the failed node becomes functional again, it accepts new SIP registration requests. Existing SIP registrations on other nodes remain unchanged.

HCG Admin Services return to regular operation: When the WebSocket connection is functional again, HCM includes it in load distribution.
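A heartbeat timeout is one common way to get the bounded detection time described above. The sketch below assumes that mechanism, because the text does not say how peers detect the failure, and it reuses the 60-second figure as the timeout. `PeerMonitor` and `recovery_outcome` are hypothetical names. Times are plain seconds, so the logic can be checked without a real clock.

```python
HEARTBEAT_TIMEOUT_S = 60.0  # ASSUMED to match the 60-second detection window


class PeerMonitor:
    """Hypothetical heartbeat-based failure detector run by each HCG node."""

    def __init__(self) -> None:
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def failed_nodes(self, now: float) -> list[str]:
        # A peer is declared failed once it has been silent for the full timeout.
        return sorted(n for n, t in self.last_seen.items()
                      if now - t >= HEARTBEAT_TIMEOUT_S)


def recovery_outcome(outage_s: float) -> str:
    """Map outage length to the recovery behavior the text describes."""
    if outage_s <= 60.0:
        return "full recovery of call control for existing calls"
    return "a few minutes to restore affected SIP registrations"


monitor = PeerMonitor()
monitor.heartbeat("node-a", now=0.0)
monitor.heartbeat("node-b", now=0.0)
monitor.heartbeat("node-a", now=50.0)      # node-b stops sending heartbeats
print(monitor.failed_nodes(now=60.0))      # ['node-b']
```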

2

Scenario: The seed node fails within the Hybrid Cloud Gateway N+1 cluster.

Impact: All agents on this node lose their SIP signaling connections to Avaya Aura®. Ongoing calls enter media preservation mode.

Note:
  • No new calls can be made or received until recovery completes.

  • Electing a new seed node typically takes up to 90 seconds. Until then, the Admin Connector and the Admin Services are not functional.

HCG Signaling Connector recovery: The HCG Signaling Connector on the other nodes in the same cluster detects the failure of this node, and a new seed node is elected.

HCG Admin Services recovery: The Admin Services are active on all nodes with active-active load balancing. The HCM service distributes commands over any healthy WSS connection to reach the Admin Services on that connection's node.

HCG Signaling Connector return to regular operation: When the failed node becomes functional again, it runs in active mode, and the current seed node remains the seed node.

HCG Admin Services return to regular operation: When the WebSocket connection is functional again, HCM includes it in load distribution.
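In the seed-node behavior above, a replacement seed is elected but a recovered node does not reclaim the seed role. This corresponds to non-preemptive (sticky) election. The sketch models only that rule. It reuses the assumed lowest-healthy-ID choice and ignores the up-to-90-second election time. `SeedElector` is a hypothetical name.

```python
from typing import Optional


class SeedElector:
    """Non-preemptive seed election; lowest healthy ID is an ASSUMED tie-break."""

    def __init__(self, node_ids: list[int]) -> None:
        self.healthy = set(node_ids)
        self.seed: Optional[int] = min(self.healthy)

    def node_down(self, node_id: int) -> None:
        self.healthy.discard(node_id)
        if node_id == self.seed:
            # Seed failed: elect a replacement from the remaining healthy nodes.
            self.seed = min(self.healthy) if self.healthy else None

    def node_up(self, node_id: int) -> None:
        # A recovered node rejoins in active mode; the current seed is kept.
        self.healthy.add(node_id)
        if self.seed is None:
            self.seed = node_id


elector = SeedElector([1, 2, 3])
elector.node_down(1)   # seed fails; node 2 is elected
elector.node_up(1)     # node 1 returns as an active, non-seed node
print(elector.seed)    # 2
```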

3

Scenario: Avaya Aura® Session Manager-1 fails over to Session Manager-2.

Impact: All ongoing calls connected through Session Manager-1 enter media preservation mode. Agents can continue to make and receive new calls through Session Manager-2.

HCG Signaling Connector recovery: Because the HCG Signaling Connector registers simultaneously with both Session Manager-1 and Session Manager-2, the failover is seamless to Avaya Workspaces.

HCG Admin Services recovery: N/A

HCG Signaling Connector return to regular operation: N/A

HCG Admin Services return to regular operation: N/A
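A small model shows why the Session Manager failover is seamless. The registration with Session Manager-2 already exists before Session Manager-1 fails, so routing a new call needs no new registration. The preference order for new calls and the `DualRegistration` name are assumptions for illustration.

```python
class DualRegistration:
    """One agent's simultaneous SIP registrations to both Session Managers."""

    def __init__(self, preferred: list[str]) -> None:
        self.preferred = preferred                       # ASSUMED routing preference
        self.reachable = dict.fromkeys(preferred, True)  # both registered up front

    def set_reachable(self, sm: str, up: bool) -> None:
        self.reachable[sm] = up

    def route_new_call(self) -> str:
        # No re-registration step: pick the first SM that is still reachable.
        for sm in self.preferred:
            if self.reachable[sm]:
                return sm
        raise RuntimeError("no Session Manager is reachable")


reg = DualRegistration(["Session Manager-1", "Session Manager-2"])
reg.set_reachable("Session Manager-1", False)
print(reg.route_new_call())  # Session Manager-2
```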

4

Scenario: Avaya Aura® Communication Manager-1 fails over to Survivable Server Core/LSP-1.

Impact: All existing calls connected through Communication Manager-1 enter media preservation mode. Agents can continue to make and receive new calls. When Survivable Server Core/LSP-1 becomes functional, the HCG Signaling Connector receives signaling that changes the agent state to Logged Out/Finish Work, so agents must click Start Work again to receive ACD calls.

HCG Signaling Connector recovery: No impact.

HCG Admin Services recovery: N/A

HCG Signaling Connector return to regular operation: N/A

HCG Admin Services return to regular operation: N/A
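The Start Work requirement follows from the agent-state change described above. This minimal state sketch shows why ACD calls stop until the agent clicks Start Work. The `Agent` class and the state names are hypothetical, except for the Logged Out/Finish Work state named in the text.

```python
from enum import Enum


class AgentState(Enum):
    READY_FOR_ACD = "ready for ACD calls"              # hypothetical label
    LOGGED_OUT_FINISH_WORK = "Logged Out/Finish Work"  # state named in the text


class Agent:
    def __init__(self) -> None:
        self.state = AgentState.READY_FOR_ACD

    def on_lsp_signaling(self) -> None:
        # Signaling received once Survivable Server Core/LSP-1 is functional.
        self.state = AgentState.LOGGED_OUT_FINISH_WORK

    def click_start_work(self) -> None:
        self.state = AgentState.READY_FOR_ACD

    def can_receive_acd_calls(self) -> bool:
        return self.state is AgentState.READY_FOR_ACD


agent = Agent()
agent.on_lsp_signaling()
print(agent.can_receive_acd_calls())  # False
agent.click_start_work()
print(agent.can_receive_acd_calls())  # True
```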

5

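Scenario: Duplex Communication Manager outage.

Impact: No impact if you have configured the virtual IP address for the duplex Communication Manager.

HCG Signaling Connector recovery: No impact.

HCG Admin Services recovery: N/A

HCG Signaling Connector return to regular operation: N/A

HCG Admin Services return to regular operation: N/A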

6

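Scenario: Avaya Aura® System Manager fails over from active to standby.

Impact: No impact.

HCG Signaling Connector recovery: N/A

HCG Admin Services recovery: N/A

HCG Signaling Connector return to regular operation: N/A

HCG Admin Services return to regular operation: N/A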