Problem Note 61702: Load-balanced object spawners or metadata server cluster nodes do not successfully reconnect after quorum loss

You might encounter a problem in which load-balanced object spawners or metadata server cluster nodes do not successfully reconnect after quorum loss. See the section below that describes your issue.

Metadata Server Cluster

Metadata server cluster nodes do not successfully reconnect with each other to re-establish the quorum after quorum loss. This problem occurs with a SAS® 9.4 deployment that contains a metadata server cluster configured with three or more cluster nodes.  

The following sequence of events causes the issue:

  1. The metadata server cluster is in quorum and running normally. The master node is redirecting client requests to the two or more slave nodes in the cluster. 
  2. A connectivity problem then causes two or more nodes to lose their connections to each other, which results in a loss of quorum. Messages such as the following might be logged:
    2017-09-13T06:37:42,520 WARN  [04350166] 168:sas - Load Balancing interface call failed with exception <?xml version="1.0"?><Exceptions>
    <Exception><SASMessage severity="Error">The outcall request did not complete in the time allotted.</SASMessage></Exception></Exceptions>.
    2017-09-13T06:37:42,520 WARN  [04350166] 168:sas - <?xml version="1.0" ?><Exceptions></Exceptions>
    2017-09-13T06:37:42,520 INFO  [04350166] 168:sas - Disconnecting server SASMeta - Metadata Server from cluster SASMeta - Logical Metadata Server.
    2017-09-13T06:37:42,520 WARN  [04350166] 168:sas - Lost contact with the server when calling an IOM interface.
    2017-09-13T06:37:42,520 INFO  [04350166] 168:sas - Setting the master node to SASMeta - Metadata Server.
    2017-09-13T06:37:42,520 INFO  [04350171] :sas - The cluster has lost quorum and is now OFFLINE.
  3. When the nodes attempt to reconnect to each other, the following messages might be logged in some cases:
    2017-09-13T06:37:50,191 ERROR [00000014] :sas - INTERNAL ERROR: Connecting node SASMeta - Metadata Server has no pIface.
    2017-09-13T06:37:50,191 WARN  [04350203] :sas - Internal error: The Cluster Manager node connection was not initialized.
    2017-09-13T06:37:50,191 WARN  [04350203] :sas - Internal error: The Cluster Manager node connection was not initialized.
    Note that you can encounter this defect even if these ERROR and WARN messages are not logged. The logs might show only connection messages that appear normal while the defect is still occurring.

In this scenario, the metadata server cluster is not able to repair itself. You must perform a full restart of the metadata server cluster to restore service. 
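
Because the defect can occur without the ERROR and WARN messages, a log check can confirm only the quorum loss itself and report whether the reconnect errors were also logged. The following Python sketch shows one way to do that. The function name `scan_metadata_log` is hypothetical and is not part of SAS software; the message fragments are copied from the log excerpts in this note, and the sketch assumes that your logs use the same timestamp layout.

```python
import re
from datetime import datetime

# Message fragments copied from the log excerpts in this note.
QUORUM_LOST = "The cluster has lost quorum and is now OFFLINE."
RECONNECT_ERRORS = (
    "has no pIface.",
    "The Cluster Manager node connection was not initialized.",
)

# Matches a leading timestamp and level, e.g. "2017-09-13T06:37:42,520 WARN  ...".
LINE_RE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\s+\w+\s+(.*)$"
)


def scan_metadata_log(lines):
    """Return (quorum_loss_times, reconnect_error_times) found in the lines.

    Hypothetical helper for illustration only.
    """
    quorum_lost, reconnect_errors = [], []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # continuation lines (for example, wrapped XML) have no timestamp
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S,%f")
        message = match.group(2)
        if QUORUM_LOST in message:
            quorum_lost.append(stamp)
        elif any(fragment in message for fragment in RECONNECT_ERRORS):
            reconnect_errors.append(stamp)
    return quorum_lost, reconnect_errors
```

You could pass the lines of a metadata server log file to `scan_metadata_log` and compare the two lists. An empty second list does not rule out the defect, because the reconnect errors are not always logged.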

 

Load-Balanced Object Spawners

You might also encounter an issue in which the load-balanced object spawners lose connection and do not successfully reconnect with one another.

In this scenario, messages such as the following appear in the object spawner log:

2019-05-20T00:08:59,782 WARN  [00000036] :sas - Load Balancing interface call failed with exception <?xml version="1.0" ?><Exceptions><Exception><SASMessage severity="Error">Invalid object specified to bridge protocol engine.</SASMessage></Exception></Exceptions>.
2019-05-20T00:08:59,782 WARN  [00000036] :sas - Lost contact with the server when calling an IOM interface.
2019-05-20T00:10:07,703 WARN  [00000036] :sas - The load balancing processor could not send update to peer (A50UU5AV.AY000003_@server.com)
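
To see which peers a spawner failed to update, you can extract the peer identifier from the last message type shown above. This is a minimal Python sketch; the function name `failed_peers` is hypothetical, and the pattern assumes that the message wording in your logs matches the excerpt in this note.

```python
import re
from collections import Counter

# Wording taken from the "could not send update to peer (...)" message in this note.
PEER_RE = re.compile(r"could not send update to peer \(([^)]+)\)")


def failed_peers(lines):
    """Count failed peer updates per peer identifier (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = PEER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

A peer that keeps accumulating failures after network connectivity is restored is consistent with the reconnect problem that this note describes, although the count alone does not prove it.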

Click the Hot Fix tab in this note to access the hot fix for this issue.

Note: This issue is fixed in Rev. 940_18w25.



Operating System and Release Information

Product Family: SAS System
Product: SAS Metadata Server

System                                       Product Release     SAS Release
                                             Reported  Fixed*    Reported   Fixed*
Microsoft Windows 8 Enterprise 32-bit        9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Microsoft Windows 8 Enterprise x64           9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Microsoft Windows 10                         9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Microsoft Windows Server 2008                9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Microsoft Windows Server 2008 R2             9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Microsoft Windows Server 2008 for x64        9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Microsoft Windows Server 2012 Datacenter     9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Microsoft Windows Server 2012 R2 Datacenter  9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Microsoft Windows Server 2012 R2 Std         9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Microsoft Windows Server 2012 Std            9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Windows 7 Enterprise 32 bit                  9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Windows 7 Enterprise x64                     9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Windows 7 Home Premium 32 bit                9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Windows 7 Home Premium x64                   9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Windows 7 Professional 32 bit                9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Windows 7 Professional x64                   9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Windows 7 Ultimate 32 bit                    9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Windows 7 Ultimate x64                       9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
64-bit Enabled AIX                           9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
64-bit Enabled Solaris                       9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
HP-UX IPF                                    9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Linux for x64                                9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Solaris for x64                              9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Microsoft Windows 8 Pro 32-bit               9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Microsoft Windows 8 Pro x64                  9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Microsoft Windows 8.1 Enterprise 32-bit      9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Microsoft Windows 8.1 Enterprise x64         9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Microsoft Windows 8.1 Pro 32-bit             9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Microsoft Windows 8.1 Pro x64                9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6
Microsoft® Windows® for x64                  9.4_M3    9.4_M6    9.4 TS1M3  9.4 TS1M6

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.