A set of clustered SAS Metadata Servers can lose quorum because of transient network failures. The same problem might occur for load-balanced Object Spawners or OLAP Servers. The underlying issue is that the nodes in the cluster do not reconnect to each other without manual intervention.
A review of the Object Spawner logs shows messages such as these occurring:
Load Balancing interface call failed with exception <?xml version="1.0" ?> <Exceptions><Exception><SASMessage severity="Error">The outcall request did not complete in the time allotted.</SASMessage></Exception></Exceptions>.
A review of the SAS Metadata Server logs for a cluster member shows messages such as these occurring when the quorum was lost:
2014-11-30T22:02:02,345 INFO [01161947] :sas - The Bridge Protocol Engine Socket Access Method lost contact with a peer (857826) during protocol recognition.
2014-11-30T22:02:04,889 INFO [00000015] :sas - Client connection 11 for user sas closed.
2014-11-30T22:02:04,889 WARN [00000012] :sas - Load Balancing interface call failed with exception <?xml version="1.0" ?> <Exceptions><Exception><SASMessage severity="Error">The outcall request did not complete in the time allotted. </SASMessage></Exception></Exceptions>.
2014-11-30T22:02:04,893 WARN [00000012] :sas - Lost contact with the server when calling an IOM interface.
2014-11-30T22:02:04,893 INFO [00000012] :sas - Disconnecting server SASMeta - Metadata Server Node 3 from cluster SASMeta - Logical Metadata Server.
2014-11-30T22:02:04,894 WARN [00000012] :sas - Lost contact with the server when calling an IOM interface.
2014-11-30T22:02:04,895 INFO [00000012] :sas - Setting the master node to SASMeta - Metadata Server Node 3.
2014-11-30T22:02:04,895 INFO [01161949] :sas - The cluster has lost quorum and is now OFFLINE.

2014-11-30T22:02:06,412 WARN [00000012] :sas - Load Balancing interface call failed with exception <?xml version="1.0" ?><Exceptions><Exception><SASMessage severity="Error">The outcall request did not complete in the time allotted.</SASMessage></Exception></Exceptions>.
2014-11-30T22:02:06,413 WARN [00000012] :sas - <?xml version="1.0" ?><Exceptions></Exceptions>
2014-11-30T22:02:06,413 INFO [00000012] :sas - Disconnecting server SASMeta - Metadata Server Node 3 from cluster SASMeta - Logical Metadata Server.
2014-11-30T22:02:06,414 INFO [00000015] :sas - Client connection 580705 for user sas closed.
2014-11-30T22:02:06,611 WARN [00000012] :sas - Lost contact with the server when calling an IOM interface.
2014-11-30T22:02:06,633 INFO [00000012] :sas - Setting the master node to SASMeta - Metadata Server Node 3.
2014-11-30T22:02:06,634 INFO [00901546] :sas - The cluster has lost quorum and is now OFFLINE.

2014-11-30T22:03:06,151 WARN [00000012] :sas - Load Balancing interface call failed with exception <?xml version="1.0" ?><Exceptions><Exception><SASMessage severity="Error">Invalid object specified to bridge protocol engine.</SASMessage></Exception></Exceptions>.
2014-11-30T22:03:06,152 WARN [00000012] :sas - Lost contact with the server when calling an IOM interface.
2014-11-30T22:03:06,152 INFO [00000012] :sas - Disconnecting server SASMeta - Metadata Server from cluster SASMeta - Logical Metadata Server.
2014-11-30T22:03:06,153 INFO [02088282] :sas - The cluster has lost quorum and is now OFFLINE.
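To check whether a Metadata Server log contains this condition, the log can be scanned for the quorum-loss message. The following is a minimal Python sketch, not a SAS-supplied tool. The entry layout (timestamp, level, thread ID, user, message) is inferred from the excerpts in this note, and the names ENTRY_RE, QUORUM_LOST, and find_quorum_loss are hypothetical.

```python
import re
from datetime import datetime

# Assumed entry layout, inferred from the excerpts in this note:
#   <timestamp> <LEVEL> [<thread id>] :<user> - <message>
TS = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}"
ENTRY_RE = re.compile(
    r"(?P<ts>" + TS + r")\s+(?P<level>[A-Z]+)\s+\[(?P<thread>\d+)\]\s+"
    r":(?P<user>\S+)\s+-\s+"
    # The message runs until the next timestamp or the end of the text,
    # so entries are found even if several share one physical line.
    r"(?P<msg>.*?)(?=\s*" + TS + r"\s|\s*\Z)",
    re.DOTALL,
)

# Message text copied from the log excerpts in this note.
QUORUM_LOST = "The cluster has lost quorum and is now OFFLINE."


def find_quorum_loss(log_text):
    """Return (timestamp, thread_id) for each quorum-loss entry in log_text."""
    events = []
    for match in ENTRY_RE.finditer(log_text):
        if QUORUM_LOST in match.group("msg"):
            when = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S,%f")
            events.append((when, match.group("thread")))
    return events


if __name__ == "__main__":
    sample = (
        "2014-11-30T22:02:04,895 INFO [00000012] :sas - Setting the master "
        "node to SASMeta - Metadata Server Node 3.\n"
        "2014-11-30T22:02:04,895 INFO [01161949] :sas - The cluster has lost "
        "quorum and is now OFFLINE.\n"
    )
    for when, thread in find_quorum_loss(sample):
        print(when.isoformat(), thread)
```

The function takes the log contents as a string, so it can be applied to a file read from any cluster member. It reports only when quorum was lost; it does not attempt to restore it.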
If the cluster enters the state in which the quorum is lost, follow these steps to restore the quorum:
Click the Hot Fix tab in this note to access the hot fix for this issue.
Product Family | Product | System | Product Release (Reported) | Product Release (Fixed*) | SAS Release (Reported) | SAS Release (Fixed*) |
SAS System | SAS Metadata Server | Microsoft® Windows® for x64 | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | Microsoft Windows 8 Enterprise x64 | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | Microsoft Windows 8 Pro x64 | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | Microsoft Windows 8.1 Enterprise 32-bit | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | Microsoft Windows 8.1 Enterprise x64 | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | Microsoft Windows 8.1 Pro | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | Microsoft Windows 8.1 Pro 32-bit | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | Microsoft Windows Server 2008 R2 | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | Microsoft Windows Server 2008 for x64 | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | Microsoft Windows Server 2012 Datacenter | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | Microsoft Windows Server 2012 R2 Datacenter | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | Microsoft Windows Server 2012 R2 Std | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | Microsoft Windows Server 2012 Std | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | Windows 7 Enterprise x64 | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | Windows 7 Professional x64 | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | 64-bit Enabled AIX | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | 64-bit Enabled Solaris | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | HP-UX IPF | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | Linux for x64 | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
SAS System | SAS Metadata Server | Solaris for x64 | 9.4 | 9.4_M3 | 9.4 TS1M0 | 9.4 TS1M3 |
A fix for this issue for Base SAS 9.4_M2 is available at:
https://tshf.sas.com/techsup/download/hotfix/HF2/R19.html#55021
A fix for this issue for Base SAS 9.4_M1 is available at:
https://tshf.sas.com/techsup/download/hotfix/HF2/M88.html#55021
Type: | Problem Note |
Priority: | medium |
Date Modified: | 2015-01-29 12:42:36 |
Date Created: | 2015-01-19 14:19:43 |