Below is a sample of how to monitor file system storage on UNIX/Solaris servers. It has two parts. The first part shows how to detect and report space problems. The second part shows how to prevent other production jobs from executing by 'tripping a circuit breaker'. In essence, this application:

    /*Initialize the environment and redirect the logs and listings*/
    filename _all_ clear;
    filename spcllog "/some/directory/spacechk_&sysdate..log";
    filename spcllst "/some/directory/spacechk_&sysdate..lst";
    proc printto log=spcllog print=spcllst new;
    run;
    ods html close;
    libname _all_ clear;
    options nomprint nomtrace mlogic symbolgen nofullstimer compress=yes reuse=yes ls=132 ps=80;
    proc datasets kill nolist;
    run;
    quit;

    /* Capture disk space information and store it in a text file */
    /* Use the same text file as input to the application */
    x 'rm /some/directory/spacechk.dat';
    x 'df -k>/some/directory/spacechk.dat';
    filename space '/some/directory/spacechk.dat';

    data one (drop=stats);
       length used avail capacity $ 12 directory $ 30 space_status $ 90 filesys $ 40 pct_used $ 10;
       infile space pad missover lrecl=256;
       input @1 stats $char256.;
       filesys=scan(stats,1,' ');
       capacity=scan(stats,2,' ');
       used=scan(stats,3,' ');
       avail=scan(stats,4,' ');
       pct_used=scan(stats,5,' ');
       directory=scan(stats,6,' ');
       if _n_>1; /*skip the df header line*/
    run ;

    data space_chk;
       set one;
       pct_used=compress(trim(left(scan(pct_used,1,'%'))));
       directory=compress(trim(left(directory)));
       filesys=compress(trim(left(filesys)));
       cdflag=compress(trim(left(substr(directory,2,2))));
       if input(pct_used,4.)>=0 and input(pct_used,4.)<=59 then do;
          space_status= 'NOTICE: Less than 60% has been used';
       end;
       else if input(pct_used,4.)>=60 and input(pct_used,4.)<=69 then do;
          space_status= 'ATTENTION: More than 60% has been used';
       end;
       else if input(pct_used,4.)>=70 and input(pct_used,4.)<=79 then do;
          space_status= 'WARNING: 70-79% has been used - I/O Performance Will Suffer';
       end;
       else if input(pct_used,4.)>=80 and input(pct_used,4.)<=89 then do;
          space_status= 'DANGER: 80-89% has been used - File System is Approaching Failure';
       end;
       else if input(pct_used,4.)>=90 then do;
          space_status= 'EMERGENCY: 90% or more has been used- FILE SYSTEM FAILURE WILL HAPPEN AT ANY TIME!';
       end;
       else do;
          space_status="HUH?";
       end;
    run;

    proc print data=space_chk;
       title "Space Evaluation for &sysdate on SERVER_NAME";
    run;

    /*generate metadata from the dataset*/
    /*to determine the number of records*/
    %macro metadata(ds);
       %global dset nvars nobs;
       %let dset = &ds;
       %let dsid = %sysfunc(open(&dset));
       %let nobs = %sysfunc(attrn(&dsid,NOBS));
       %let nvars = %sysfunc(attrn(&dsid,NVARS));
       %let rc = %sysfunc(close(&dsid));
    %mend metadata;

    %metadata(space_chk);

    %macro space_eval(threshold=, /*set the percentage that triggers email */
                      cutoff=,    /*percentage that trips kill switch */
                      recip=,     /*primary recipient of message*/
                      cc1=,       /*secondary recipient */
                      cc2=,       /*tertiary recipient */
                      cc3=);      /*quaternary recipient*/

       %do n = 1 %to &nobs; /*Begin NOBS do*/

          /*Load the values of observation &n into macro variables*/
          data _null_;
             set space_chk;
             if _N_ = &n;
             call symput('pct_used',pct_used);
             call symput('space_status',space_status);
             call symput('directory',directory);
             call symput('filesys',filesys);
             call symput('cdflag',cdflag);
          run;

          %if &cdflag^=cd %then %do; /*start cdrom do*/
             /*IF YOU HAVE A CDROM ON YOUR SERVER, THEN THIS IS ALWAYS 100 PERCENT*/

             /* Notify project staff if the percentage of space used is */
             /* at or above the threshold of comfort */
             %if ( &pct_used >= &threshold) %then %do; /*Begin THRESHOLD Do*/
                filename outbox email "&recip" ;
                data _null_;
                   file outbox cc=("&cc1","&cc2","&cc3")
                        subject="&sysdate Space Evaluation for &directory on SERVER_NAME - &space_status";
                   put "On &sysday - &sysdate at &systime , directory &directory";
                   put "mounted on the &filesys filesystem";
                   put "had used &pct_used percent of the space.";
                   put "A risk assessment of the directory has resulted in a status of ->";
                   put "&space_status";
                   put "Please evaluate this risk prior to processing your job";
                   put "If the risk is critical, then the SAS will set the KILL SWITCH.";
                   put " ";
                   put "The log and listing from this job are stored in:";
                   put "/some/directory/spacechk_&sysdate..log";
                   put "/some/directory/spacechk_&sysdate..lst";
                   put " ";
                run;
             %end; /*End THRESHOLD Do*/

The bottom portion of the program lets you prevent other jobs from executing when the file system is nearly full. Here is the logic: You create a SAS data set named CONTROL_CENTER with one variable called KILL_SWITCH. The default value of this variable is the character '0'. To use the KILL_SWITCH, each production program must read the data set at the start of execution and evaluate the KILL_SWITCH value. If the value is '0', the production program should continue. If the value is '1', the program can issue an email and an ENDSAS statement to end the production run. You set the CUTOFF value in the macro parameter. If the percentage used reaches or exceeds the CUTOFF value, the KILL_SWITCH value is changed from '0' to '1'. Just like any circuit breaker, the KILL_SWITCH must be manually reset to '0' after you have resolved the space problem.
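The production-side half of this logic is described but not shown, so here is a minimal sketch of how it might look. It assumes the same KILLJOB libref and /some/directory path used by the monitoring program. The macro name CHECK_KILL_SWITCH is hypothetical, and the notification email is left out for brevity.

```sas
/* One-time setup: create CONTROL_CENTER with the switch off ('0'). */
libname killjob '/some/directory';

data killjob.control_center;
   length kill_switch $ 1;
   kill_switch = '0';
run;

/* Hypothetical macro to call at the top of each production program. */
/* It reads KILL_SWITCH and ends the SAS session if the switch is '1'. */
%macro check_kill_switch;
   %local kill_switch;
   %let kill_switch = 0;

   data _null_;
      set killjob.control_center;
      call symputx('kill_switch', kill_switch);
   run;

   %if &kill_switch = 1 %then %do;
      %put ERROR: KILL_SWITCH is set. Production run halted.;
      endsas;
   %end;
%mend check_kill_switch;

%check_kill_switch

/* Manual reset, run only after the space problem has been resolved. */
data killjob.control_center;
   modify killjob.control_center;
   kill_switch = '0';
run;
```

In practice the setup step and the reset step would each be run on their own, and only the macro call would sit at the top of the production jobs.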
Hence, the top portion of the program is all you need to monitor the space on the server; you need the bottom portion only if you want to prevent other space-dependent jobs from executing. In addition, if the cell phones are not NEXTEL, then you must consult your vendor's manual to get the correct syntax for addressing text messages.
Finally, you will need to run this as multiple CRON jobs. You may want to schedule them every 4 hours, 6 hours, 8 hours, or whatever interval you think is best. I suggest that you schedule this for 7 days a week; assuming that you will not have a problem during non-core days is very risky ;-)
BIG TICKET ITEM: The percentages used in the macro parameters should be set to your needs. The values supplied below (95% for the email threshold and 97% for the CUTOFF threshold) are based on our needs; you should choose values that are best for you.

             %if (&pct_used >= &cutoff ) %then %do; /*Begin GE Do*/
                /*Take the 3 characters after the leading slash (positions 2-4)*/
                %let prod_flag=%substr(&directory,2,3);
                %if &prod_flag=%str(ABC) %then %do; /*WHATEVER THE FIRST 3 LETTERS ARE OF YOUR MAIN/PRODUCTION FILE SYSTEM*/
                   filename outbox email "&recip" ;
                   data _null_;
                      file outbox cc=("&cc1","&cc2","&cc3")
                           subject="ATTENTION - &directory on SERVER_NAME - &space_status - SAS WILL SET THE KILL SWITCH";
                      put "On &sysday - &sysdate at &systime , directory &directory";
                      put "mounted on the &filesys filesystem";
                      put "had used &pct_used percent of the space.";
                      put "A risk assessment of the directory has resulted in a status of ->";
                      put "&space_status";
                      put " ";
                      put "THIS PROGRAM WILL NOW SET THE SYSTEM KILL SWITCH";
                      put "AFTER WHICH, THE SCHEDULED SAS JOBS WILL NOT EXECUTE.";
                      put "PLEASE CONTACT THE ADMINISTRATORS FOR ASSISTANCE.";
                      put " ";
                      put "The log and listing from this job are stored in:";
                      put "/some/directory/spacechk_&sysdate..log";
                      put "/some/directory/spacechk_&sysdate..lst";
                   run;

                   /*Trip the circuit breaker: change KILL_SWITCH from '0' to '1'*/
                   libname killjob '/some/directory';
                   data killjob.control_center;
                      modify killjob.control_center;
                      if kill_switch in ('0',' ') then do;
                         kill_switch='1';
                      end;
                   run;
                %end; /*End of PROD_flag Do*/
             %end; /*End of GE Do*/
          %end; /*End of cdrom do*/
       %end; /*End NOBS Do */
    %mend space_eval;

    %space_eval(threshold=95,
                cutoff=97,
                recip=somebody@your.org,
                cc1= 0005551212@messaging.nextel.com,
                cc2= 1115551212@messaging.nextel.com,
                cc3= 2225551212@messaging.nextel.com);

    /***************************************************/
    /*clean up everything and reset the default options*/
    /***************************************************/
    ods html close;
    libname _all_ clear;
    filename _all_ clear;
    options nomprint nomtrace nomlogic nosymbolgen nofullstimer;
    proc datasets kill nolist;
    run;
    quit;
    proc printto;
    run;

About the Author
Bryan K. Beverly is a Software Architect and Team Leader with BAE Systems Information Technology. He has been using SAS for 20+ years and is currently supporting SAS-based systems at the Bureau of Labor Statistics. Bryan has served as a presenter and volunteer at SAS conferences for more than 10 years.
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
| Type: | Sample |
| Topic: | Non SAS Authors ==> Bryan K. Beverly |
| Date Modified: | 2005-04-01 03:02:01 |
| Date Created: | 2005-03-31 13:44:24 |
| Product Family | Product | Host | SAS Release (Starting) | SAS Release (Ending) |
| SAS System | Base SAS | Solaris | n/a | n/a |


