When using a SAS 9.4 Metadata Server cluster, you might notice that the cluster stops responding after executing a user bulkload program that uses User Import Macros such as %MDUEXTR and %MDUCHGLB.
In a metadata server cluster, update requests are redirected to the master node. Because the user bulkload program updates metadata, it runs against the master node. The program appears to complete successfully. However, errors are logged on the slave nodes.
Immediately after the program completes, new metadata server login requests from client applications such as SAS® Management Console or SAS® Enterprise Guide® might stop responding or time out.
At approximately the same time that the program completes, the metadata server log for one or more of the slave nodes reports messages similar to the following.
SASMeta_MetadataServer_2015-08-04_metahost02_2974.log:
2015-08-04T11:38:58,664 INFO [00000004] :sas - Update from master node flushes ID registry list for container person, length=1
2015-08-04T11:38:58,678 INFO [00000004] :sas - Update from master node flushes ID registry list for container extrnldn, length=1
2015-08-04T11:39:32,838 WARN [00001986] 12:sasadm@saspw - Thread wait timed out.
2015-08-04T11:39:32,838 ERROR [00001986] 12:sasadm@saspw - INTERNAL WARNING: A non zero return code was received while waiting on a journal entry to be replayed.
2015-08-04T11:39:32,838 ERROR [00001986] 12:sasadm@saspw - INTERNAL ERROR: It appears that no journal entry was received within the 30 second timeout period. Proceeding anyway.
2015-08-04T11:39:32,844 INFO [00001994] 12:sas - Client connection 12 for user sasadm@saspw closed.
About two hours and 45 minutes later, the same slave node log reports the following "MVA task is still busy" messages, followed by an AccessViolation:
2015-08-04T14:25:46,847 WARN [00000011] :sas - MVA task is still busy after 9999 wait periods. Request denied.
2015-08-04T14:25:46,880 INFO [00000011] :sas - MVA task is still busy after 9999 wait periods. Request denied.
2015-08-04T14:25:48,185 ERROR [00000011] :sas - AccessViolation (6) encountered in CONTEXT 00000011 using cookie 3.
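To check other slave node logs for the same sequence of symptoms, you can scan them with a small script. The following is a minimal sketch and is not a SAS-supplied tool. The names scan_node_log and hang_interval are hypothetical. The script assumes the entry layout shown in the excerpts above (timestamp, level, [thread], context, " - ", message). It matches only on message text that is quoted in this note.

```python
"""Scan a SAS Metadata Server slave node log for the symptoms in this note.

Hypothetical helper, not a SAS-supplied tool. Assumes the entry layout seen in
the excerpts: <timestamp> <LEVEL> [<thread>] <context> - <message>.
"""
from __future__ import annotations

import re
import sys
from datetime import datetime, timedelta

# One log entry. The lookahead ends a message at the next timestamp or at the end
# of the text, so excerpts with several entries on one line also parse.
ENTRY = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\s+"
    r"[A-Z]+\s+\[\d+\]\s+(?:\S+\s+)?-\s+"
    r"(?P<msg>.*?)(?=\s+\d{4}-\d{2}-\d{2}T|\Z)",
    re.S,
)

# Substrings copied from the slave node messages quoted in this note.
SIGNATURES = {
    "journal_timeout": "no journal entry was received within",
    "mva_busy": "MVA task is still busy",
    "access_violation": "AccessViolation",
}


def scan_node_log(text: str) -> dict[str, datetime]:
    """Return the timestamp of the first entry matching each signature."""
    hits: dict[str, datetime] = {}
    for m in ENTRY.finditer(text):
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S,%f")
        for name, needle in SIGNATURES.items():
            if name not in hits and needle in m["msg"]:
                hits[name] = ts
    return hits


def hang_interval(hits: dict[str, datetime]) -> timedelta | None:
    """Time from the first journal replay timeout to the first 'MVA task is still busy'."""
    if "journal_timeout" in hits and "mva_busy" in hits:
        return hits["mva_busy"] - hits["journal_timeout"]
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python <this script> <slave node log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        found = scan_node_log(f.read())
    for name, ts in sorted(found.items(), key=lambda kv: kv[1]):
        print(f"{ts.isoformat(timespec='milliseconds')}  {name}")
    print("journal timeout -> MVA busy:", hang_interval(found))
```

Applied to the excerpts in this note, the script reports an interval of about 2 hours and 46 minutes between the journal replay timeout and the first "MVA task is still busy" message. This matches the delay described above.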
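The console log for the same node contains a traceback for the AccessViolation. The traceback shows the failure in the authorization cache synchronization code (FOMSsecuritymanagerSynchronizeAuthorizationCacheChanges), called from FOMSworkunitReloadJournalEntry on the tkoms cluster manager thread.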
SASMeta_MetadataServer_console_metahost02.log:
ERROR: The system has encountered an unhandled Exception.
: Please contact technical support and provide them
: with the following traceback and system dump information:
ABORT: SAS/TK is aborting
: Abort Log file [tk.reportlog.2974]
: ---------------------------------------
: Time [8/04/2015 2:25:48 PM]
: ---------------------------------------
: Thread Name [tkoms cluster manager thread]
Traceback:
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkmk.so(bkt_abort_tkt_cb+0xa2) [0x7f1c2a9c6b22]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkmk.so(skaCallAbortRoutines+0x111) [0x7f1c2a9c5b81]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkmk.so(bkabort+0x84) [0x7f1c2a9d0c74]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkmk.so(bkt_signal_handler+0x144) [0x7f1c2a9c35d4]
/lib64/libpthread.so.0(+0xf500) [0x7f1c2bbd0500]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkmk.so(skmMemRelease+0x2b) [0x7f1c2a9d44cb]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkoms.so(FOMSsecuritymanagerSynchronizeAuthorizationCacheChanges+0x689) [0x7f1bfeafbf99]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkoms.so(FOMSsecuritymanagerSynchronizeCacheChangeQueues+0xf6) [0x7f1bfeafb8f6]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkoms.so(FOMSworkunitReloadJournalEntry+0x15d8) [0x7f1bfebec978]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkoms.so(+0x3ab1f5) [0x7f1bfec221f5]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkmk.so(sktMain+0x94) [0x7f1c2a9c1214]
/usr/local/SASHome/SASFoundation/9.4/sasexe/tkmk.so(bktMain+0x69) [0x7f1c2a9c2df9]
/lib64/libpthread.so.0(+0x7851) [0x7f1c2bbc8851]
/lib64/libc.so.6(clone+0x6d) [0x7f1c2b25790d]
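If you have several abort tracebacks, you can check each one for the frames shown above with a small script. The following is a minimal sketch and is not a SAS-supplied tool. The names frames_below_handler and looks_like_this_problem are hypothetical. The script assumes that each frame has the form library(function+0xOFFSET) [0xADDRESS]. It also assumes, based only on the function name, that bkt_signal_handler separates the abort handling frames from the frames where the fault occurred.

```python
"""Check a SAS/TK abort traceback for the frames reported in this note.

Hypothetical helper, not a SAS-supplied tool. Assumes frames look like
/path/lib.so(function+0xOFFSET) [0xADDRESS] and that bkt_signal_handler
separates the abort handling frames from the faulting frames (an assumption).
"""
from __future__ import annotations

import re
import sys

# Unnamed frames such as /lib64/libpthread.so.0(+0xf500) yield an empty name.
FRAME = re.compile(r"[^\s(]+\((?P<func>[^+)]*)\+0x[0-9a-f]+\)\s+\[0x[0-9a-f]+\]")

# Function names copied from the console traceback in this note.
AUTH_CACHE_SYNC = "FOMSsecuritymanagerSynchronizeAuthorizationCacheChanges"
JOURNAL_RELOAD = "FOMSworkunitReloadJournalEntry"


def frames_below_handler(traceback: str) -> list[str]:
    """Named frames below the signal handler, innermost first; unnamed frames are dropped."""
    funcs = [m["func"] for m in FRAME.finditer(traceback)]
    if "bkt_signal_handler" in funcs:
        funcs = funcs[funcs.index("bkt_signal_handler") + 1:]
    return [f for f in funcs if f]


def looks_like_this_problem(traceback: str) -> bool:
    """True if both the authorization cache sync and journal reload frames appear."""
    frames = frames_below_handler(traceback)
    return AUTH_CACHE_SYNC in frames and JOURNAL_RELOAD in frames


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python <this script> <console log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        text = f.read()
    print("innermost frames:", frames_below_handler(text)[:3])
    print("matches this note:", looks_like_this_problem(text))
```

For the traceback above, the first frame below the handler is skmMemRelease, called from FOMSsecuritymanagerSynchronizeAuthorizationCacheChanges.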
When this problem occurs, you must restart the entire metadata server cluster to restore service. In most cases, you must also restart the rest of the SAS deployment, including any compute tiers and middle tiers.
To work around this issue, pause and then resume the metadata server immediately before you run the user bulkload program. The error can still occur, but the pause and resume greatly reduce the chance that the metadata server stops responding.
Product Family | Product | System | Product Release Reported | Product Release Fixed* | SAS Release Reported | SAS Release Fixed*
SAS System | SAS Metadata Server | Linux for x64 | 9.4 | 9.4_M4 | 9.4 TS1M0 | 9.4 TS1M4
A fix for this issue for Base SAS 9.4_M3 is available at:
https://tshf.sas.com/techsup/download/hotfix/HF2/V01.html#56751
A fix for this issue for Base SAS 9.4_M2 is available at:
https://tshf.sas.com/techsup/download/hotfix/HF2/R19.html#56751
A fix for this issue for Base SAS 9.4_M1 is available at:
https://tshf.sas.com/techsup/download/hotfix/HF2/M88.html#56751
Type: Problem Note
Priority: medium
Date Modified: 2016-10-19 11:42:43
Date Created: 2015-10-08 10:08:09