Problem Note 55021: Clustered SAS® Metadata Servers lose their quorum status

A set of clustered SAS Metadata Servers can lose quorum because of transient network failures. The same problem might also occur for load-balanced Object Spawners or OLAP Servers. The underlying issue is that, after such a failure, the nodes in the cluster do not reconnect to each other without manual intervention.

Object Spawner

A review of the Object Spawner logs shows messages such as the following:

Load Balancing interface call failed with exception <?xml version="1.0" ?>
<Exceptions><Exception><SASMessage severity="Error">The outcall request did not 
complete in the time allotted.</SASMessage></Exception></Exceptions>.

Metadata Server

A review of the SAS Metadata Server logs for each cluster member shows messages such as the following at the time the quorum was lost:

Node 1:

2014-11-30T22:02:02,345 INFO  [01161947] :sas - The Bridge Protocol Engine Socket Access Method lost contact with a 
peer (857826) during protocol recognition. 
2014-11-30T22:02:04,889 INFO  [00000015] :sas - Client connection 11 for user sas closed. 
2014-11-30T22:02:04,889 WARN  [00000012] :sas - Load Balancing interface call failed with exception <?xml version="1.0" ?>
<Exceptions><Exception><SASMessage severity="Error">The outcall request did not complete in the time allotted.
</SASMessage></Exception></Exceptions>. 
2014-11-30T22:02:04,893 WARN  [00000012] :sas - Lost contact with the server when calling an IOM interface. 
2014-11-30T22:02:04,893 INFO  [00000012] :sas - Disconnecting server SASMeta - Metadata Server Node 3 
from cluster SASMeta - Logical Metadata Server. 
2014-11-30T22:02:04,894 WARN  [00000012] :sas - Lost contact with the server when calling an IOM interface. 
2014-11-30T22:02:04,895 INFO  [00000012] :sas - Setting the master node to SASMeta - Metadata Server Node 3. 
2014-11-30T22:02:04,895 INFO  [01161949] :sas - The cluster has lost quorum and is now OFFLINE.

Node 2:

2014-11-30T22:02:06,412 WARN  [00000012] :sas - Load Balancing interface call failed with exception 
<?xml version="1.0" ?><Exceptions><Exception><SASMessage severity="Error">The outcall request did not complete in 
the time allotted.</SASMessage></Exception></Exceptions>. 
2014-11-30T22:02:06,413 WARN  [00000012] :sas - <?xml version="1.0" ?><Exceptions></Exceptions> 
2014-11-30T22:02:06,413 INFO  [00000012] :sas - Disconnecting server SASMeta - Metadata Server Node 3 
from cluster SASMeta - Logical Metadata Server. 
2014-11-30T22:02:06,414 INFO  [00000015] :sas - Client connection 580705 for user sas closed. 
2014-11-30T22:02:06,611 WARN  [00000012] :sas - Lost contact with the server when calling an IOM interface. 
2014-11-30T22:02:06,633 INFO  [00000012] :sas - Setting the master node to SASMeta - Metadata Server Node 3. 
2014-11-30T22:02:06,634 INFO  [00901546] :sas - The cluster has lost quorum and is now OFFLINE. 

Node 3:

2014-11-30T22:03:06,151 WARN  [00000012] :sas - Load Balancing interface call failed with exception 
<?xml version="1.0" ?><Exceptions><Exception><SASMessage severity="Error">Invalid object specified to bridge 
protocol engine.</SASMessage></Exception></Exceptions>. 
2014-11-30T22:03:06,152 WARN  [00000012] :sas - Lost contact with the server when calling an IOM interface. 
2014-11-30T22:03:06,152 INFO  [00000012] :sas - Disconnecting server SASMeta - Metadata Server from 
cluster SASMeta - Logical Metadata Server. 
2014-11-30T22:03:06,153 INFO  [02088282] :sas - The cluster has lost quorum and is now OFFLINE. 
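
The messages above can be searched for across the log files of each node. The following is a minimal sketch, not a SAS-supplied tool: the function names find_quorum_loss and count_lb_failures are hypothetical, and the log file locations are whatever your deployment uses. The search strings are copied from the log excerpts in this note.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for scanning SAS Metadata Server or Object Spawner logs.
# The message text is taken verbatim from the excerpts in this note.

# Print every line that reports quorum loss, prefixed with file name and line number.
# Usage: find_quorum_loss LOGFILE...
find_quorum_loss() {
    grep -Hn -F 'The cluster has lost quorum and is now OFFLINE' "$@"
}

# Print the number of lines in one log file that report a failed load-balancing call.
# Usage: count_lb_failures LOGFILE
count_lb_failures() {
    grep -c -F 'Load Balancing interface call failed with exception' "$1"
}
```

For example, running find_quorum_loss against the log files of all three nodes shows which nodes recorded the quorum loss and at what time.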

Workaround

If the cluster enters the state in which the quorum is lost, follow these steps to restore the quorum:

  1. If SAS® Management Console displays a message that the server is paused, then all servers in the cluster are in a paused state. Stop all SAS servers using the sas.servers STOP script on all nodes.
  2. On the first node, run /Lev1/SASMeta/MetadataServer.sh start.
    Wait for a message that says the server is responding.
  3. Repeat Step 2 on the second node.
    Wait for a message that says the server is responding.
  4. Repeat Step 2 on the third node.
  5. On the third node, run /Lev1/SASMeta/MetadataServer.sh status.
    Confirm that Running is returned in the output.
  6. In SAS Management Console, select Metadata Manager ► Active Server ► Properties. In the Properties dialog box, click the Cluster tab. Determine whether there is a quorum and make sure that all nodes are running.
  7. Run the sas.servers START script on all nodes and resume normal operations.
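
Steps 2 through 5 start one node at a time and wait for it to come up before moving on. The following is a minimal sketch of that start-and-wait pattern for a single node. It is an illustration under assumptions, not a SAS-supplied script: the function name start_and_wait, the timeout, and the polling interval are hypothetical, and the script path must be supplied by you for your configuration directory. The sketch assumes that the status command prints the word Running when the server is up, as described in Step 5.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start the local metadata server node, then poll its
# status until the output contains "Running" or a timeout expires.
# Usage: start_and_wait SCRIPT [TIMEOUT_SECONDS] [POLL_INTERVAL_SECONDS]
#   SCRIPT is the full path to MetadataServer.sh on this node (supply your own).
start_and_wait() {
    local script=$1 timeout=${2:-300} interval=${3:-5} waited=0

    # Step 2: issue the start command; give up if it fails outright.
    "$script" start || return 1

    # Step 5 (applied after each start): poll the status output.
    while [ "$waited" -lt "$timeout" ]; do
        if "$script" status 2>&1 | grep -q 'Running'; then
            echo "Metadata server reports Running after ${waited}s"
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done

    echo "Timed out after ${timeout}s waiting for Running status" >&2
    return 1
}
```

Run the function on the first node and wait for it to return successfully before repeating on the second and third nodes. Checking the quorum on the Cluster tab (Step 6) remains a manual step.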

Click the Hot Fix tab in this note to access the hot fix for this issue.



Operating System and Release Information

Product Family: SAS System
Product: SAS Metadata Server

System                                       Product Release     SAS Release
                                             Reported  Fixed*    Reported   Fixed*
Microsoft® Windows® for x64                  9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
Microsoft Windows 8 Enterprise x64           9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
Microsoft Windows 8 Pro x64                  9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
Microsoft Windows 8.1 Enterprise 32-bit      9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
Microsoft Windows 8.1 Enterprise x64         9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
Microsoft Windows 8.1 Pro                    9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
Microsoft Windows 8.1 Pro 32-bit             9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
Microsoft Windows Server 2008 R2             9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
Microsoft Windows Server 2008 for x64        9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
Microsoft Windows Server 2012 Datacenter     9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
Microsoft Windows Server 2012 R2 Datacenter  9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
Microsoft Windows Server 2012 R2 Std         9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
Microsoft Windows Server 2012 Std            9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
Windows 7 Enterprise x64                     9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
Windows 7 Professional x64                   9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
64-bit Enabled AIX                           9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
64-bit Enabled Solaris                       9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
HP-UX IPF                                    9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
Linux for x64                                9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
Solaris for x64                              9.4       9.4_M3    9.4 TS1M0  9.4 TS1M3
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.