Problem Note 56751: SAS® 9.4 Metadata Server cluster might become unresponsive after a user bulkload program is completed

In a SAS 9.4 Metadata Server cluster, the cluster might stop responding after you run a user bulkload program that uses user import macros such as %MDUEXTR and %MDUCHGLB.

The user bulkload program updates metadata, and a metadata server cluster redirects update requests to the master node, so the program executes on the master node. The program appears to complete. However, errors are written to the logs of the slave nodes.

Immediately after the program completes, new metadata server login requests from client applications such as SAS® Management Console or SAS® Enterprise Guide® might stop responding or time out.

At approximately the same time that the program completes, the metadata server log for one or more of the slave nodes reports messages similar to the following.

SASMeta_MetadataServer_2015-08-04_metahost02_2974.log:

2015-08-04T11:38:58,664 INFO  [00000004] :sas - Update from master node flushes ID registry list for container person, length=1
2015-08-04T11:38:58,678 INFO  [00000004] :sas - Update from master node flushes ID registry list for container extrnldn, length=1
2015-08-04T11:39:32,838 WARN  [00001986] 12:sasadm@saspw - Thread wait timed out.
2015-08-04T11:39:32,838 ERROR [00001986] 12:sasadm@saspw - INTERNAL WARNING:  A non zero return code was received while waiting on a journal entry to be replayed.
2015-08-04T11:39:32,838 ERROR [00001986] 12:sasadm@saspw - INTERNAL ERROR:  It appears that no journal entry was received within the 30 second timeout period. Proceeding anyway.
2015-08-04T11:39:32,844 INFO  [00001994] 12:sas - Client connection 12 for user sasadm@saspw closed.
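
To check slave node logs for these messages, a small scanner along the following lines might help. This is a minimal sketch, not a SAS-provided tool: the function name `find_symptoms` and the regular expression are illustrative assumptions, and the signature strings are copied from the excerpt above, so they might need adjusting if your log wording differs.

```python
import re

# Signature fragments copied from the slave node log excerpt in this note.
# Each fragment sits on the timestamped line, so wrapped logs still match.
SIGNATURES = (
    "Thread wait timed out.",
    "A non zero return code was received",
    "It appears that no journal entry",
)

# Assumed line layout: timestamp, level, [thread id], then the message text.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[0-9A-Fa-f]+)\]\s+(?P<rest>.*)$"
)


def find_symptoms(lines):
    """Return (timestamp, level, message) for every line matching a signature."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # continuation of a wrapped message, or an unrelated line
        message = match.group("rest")
        if any(signature in message for signature in SIGNATURES):
            hits.append((match.group("ts"), match.group("level"), message))
    return hits
```

You could pass it an open log file, for example `find_symptoms(open(path, encoding="utf-8"))`, once for each slave node log.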

Approximately 2 hours and 45 minutes later, the following "MVA task is still busy" messages appear in the same slave node log, followed by an AccessViolation error:

2015-08-04T14:25:46,847 WARN  [00000011] :sas - MVA task is still busy after 9999 wait periods.  Request denied.
2015-08-04T14:25:46,880 INFO  [00000011] :sas - MVA task is still busy after 9999 wait periods. Request denied.
2015-08-04T14:25:48,185 ERROR [00000011] :sas - AccessViolation (6) encountered in CONTEXT 00000011 using cookie 3.
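
The delay between the journal replay timeout and the first "MVA task is still busy" message can be computed directly from the two log timestamps. The following is a small sketch; the helper name `elapsed_between` is an illustrative assumption, and the format string assumes the timestamp layout shown in these excerpts.

```python
from datetime import datetime

# Timestamp layout used in the metadata server log excerpts in this note.
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S,%f"


def elapsed_between(first_ts, second_ts):
    """Return the time between two log timestamps as a timedelta."""
    start = datetime.strptime(first_ts, LOG_TS_FORMAT)
    end = datetime.strptime(second_ts, LOG_TS_FORMAT)
    return end - start


# Timestamps taken from the slave node log excerpts above.
gap = elapsed_between("2015-08-04T11:39:32,838", "2015-08-04T14:25:46,847")
print(gap)  # 2:46:14.009000
```

For the excerpts in this note, the gap is about 2 hours and 46 minutes.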

The console log displays a traceback for the AccessViolation that reports a problem in the authorization cache area.

SASMeta_MetadataServer_console_metahost02.log:

ERROR: The system has encountered an unhandled Exception.
     : Please contact technical support and provide them
     : with the following traceback and system dump information:

ABORT:   SAS/TK is aborting
     :  Abort Log file [tk.reportlog.2974]
     :  ---------------------------------------
     :  Time [8/04/2015  2:25:48 PM]
     :  ---------------------------------------
     : Thread Name [tkoms cluster manager thread]

 Traceback:

/usr/local/SASHome/SASFoundation/9.4/sasexe/tkmk.so(bkt_abort_tkt_cb+0xa2) [0x7f1c2a9c6b22]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkmk.so(skaCallAbortRoutines+0x111) [0x7f1c2a9c5b81]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkmk.so(bkabort+0x84) [0x7f1c2a9d0c74]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkmk.so(bkt_signal_handler+0x144) [0x7f1c2a9c35d4]
/lib64/libpthread.so.0(+0xf500) [0x7f1c2bbd0500] 
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkmk.so(skmMemRelease+0x2b) [0x7f1c2a9d44cb]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkoms.so(FOMSsecuritymanagerSynchronizeAuthorizationCacheChanges+0x689) [0x7f1bfeafbf99]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkoms.so(FOMSsecuritymanagerSynchronizeCacheChangeQueues+0xf6) [0x7f1bfeafb8f6]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkoms.so(FOMSworkunitReloadJournalEntry+0x15d8) [0x7f1bfebec978]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkoms.so(+0x3ab1f5) [0x7f1bfec221f5]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkmk.so(sktMain+0x94) [0x7f1c2a9c1214]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkmk.so(bktMain+0x69) [0x7f1c2a9c2df9]
/lib64/libpthread.so.0(+0x7851) [0x7f1c2bbc8851]
/lib64/libc.so.6(clone+0x6d) [0x7f1c2b25790d]

When this problem occurs, it is necessary to restart the entire metadata server cluster in order to restore service. In most cases, this requires a full restart of the rest of the SAS deployment as well, including any compute tiers or middle tiers.

As a workaround, pause and then resume the metadata server just before you run the user bulkload program. The error can still occur, but the pause and resume greatly reduces the chance that the software stops responding.

A hot fix for this issue is available from the Hot Fix tab of this note.



Operating System and Release Information

Product Family  Product              System         Product Release     SAS Release
                                                    Reported  Fixed*    Reported   Fixed*
SAS System      SAS Metadata Server  Linux for x64  9.4       9.4_M4    9.4 TS1M0  9.4 TS1M4
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.