Fundamental Concepts for Using Base SAS Procedures |
Input Data Sets |
Many Base SAS procedures require an input SAS data set. You specify the input SAS data set by using the DATA= option in the procedure statement, as in this example:
proc print data=emp;
If you omit the DATA= option, the procedure uses the value of the SAS system option _LAST_=. The default of _LAST_= is the most recently created SAS data set in the current SAS job or session. _LAST_= is described in detail in the SAS Language Reference: Dictionary.
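The fallback to _LAST_= can be seen in a minimal sketch. The data set WORK.EMP and its contents are assumptions made for illustration; they are not part of this guide's sample data.

```sas
/* Hypothetical sample data; WORK.EMP is an assumption for illustration. */
data work.emp;
   input Name $ Salary;
   datalines;
Hayes 34000
Fuller 41000
;
run;

/* No DATA= option: PROC PRINT reads the data set named by _LAST_=,  */
/* which by default is the most recently created one (WORK.EMP).     */
proc print;
run;

/* Equivalent step with the input data set named explicitly. */
proc print data=work.emp;
run;
```

Naming the data set explicitly is usually the safer choice in longer programs, because an intervening step can change which data set was created most recently.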
RUN-Group Processing |
RUN-group processing enables you to submit a PROC step with a RUN statement without ending the procedure. You can continue to use the procedure without issuing another PROC statement. To end the procedure, use a RUN CANCEL or a QUIT statement. Several Base SAS procedures support RUN-group processing:
CATALOG | DATASETS | PLOT | PMENU | TRANTAB |
See the section on the individual procedure for more information.
Note: PROC SQL executes each query automatically. Neither the RUN nor RUN CANCEL statement has any effect.
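As an illustration of RUN-group processing, the following sketch uses PROC DATASETS. The data sets WORK.EMP and WORK.SCRATCH are hypothetical and exist only to give the procedure something to act on.

```sas
/* Hypothetical data sets, created only for this illustration. */
data work.emp;
   x=1;
run;

data work.scratch;
   x=2;
run;

proc datasets library=work nolist;
   change emp=staff;     /* first RUN group: rename EMP to STAFF  */
run;                     /* executes the group; procedure stays active */

   delete scratch;       /* second RUN group: delete SCRATCH      */
run;
quit;                    /* ends the procedure                    */
```

Each RUN statement executes the statements submitted since the previous RUN statement, and the procedure remains active until the QUIT statement.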
Creating Titles That Contain BY-Group Information |
BY-group processing uses a BY statement to process observations that are ordered, grouped, or indexed according to the values of one or more variables. By default, when you use BY-group processing in a procedure step, a BY line identifies each group. This section explains how to create titles that serve as customized BY lines.
When you insert BY-group processing information into a title, you usually want to suppress the default BY line. To suppress it, use the SAS system option NOBYLINE.
Note: You must use the NOBYLINE option if you insert BY-group information into titles for the following Base SAS procedures:
MEANS | STANDARD | SUMMARY |
The general form for inserting BY-group information into a title is as follows:
#BY-specification<.suffix>

#BY-specification
   is one of the following specifications:

   #BYVALn | #BYVAL(BY-variable-name)
      places the value of the specified BY variable in the title. You specify the BY variable with one of the following options:
      n
         is the nth BY variable in the BY statement.
      BY-variable-name
         is the name of the BY variable whose value you want to insert in the title.

   #BYVARn | #BYVAR(BY-variable-name)
      places the label or the name (if no label exists) of the specified BY variable in the title. You designate the BY variable with one of the following options:
      n
         is the nth BY variable in the BY statement.
      BY-variable-name
         is the name of the BY variable whose name you want to insert in the title.

   #BYLINE
      inserts the complete default BY line into the title.

suffix
   supplies text to place immediately after the BY-group information that you insert in the title. No space appears between the BY-group information and the suffix.
This example demonstrates these actions. The numbers correspond to the numbered callouts in the program:
1. creates a data set, GROC, that contains data for stores from four regions. Each store has four departments. See GROC for the DATA step that creates the data set.
2. sorts the data by Region and Department.
3. uses the SAS system option NOBYLINE to suppress the BY line that normally appears in output that is produced with BY-group processing.
4. uses PROC CHART to chart sales by Region and Department. In the first TITLE statement, #BYVAL2 inserts the value of the second BY variable, Department, into the title. In the second TITLE statement, #BYVAL(Region) inserts the value of Region into the title. The first period after Region indicates that a suffix follows. The second period is the suffix.
5. uses the SAS system option BYLINE to return to the creation of the default BY line with BY-group processing.
data groc;   /* 1 */
   input Region $9. Manager $ Department $ Sales;
   datalines;
Southeast Hayes Paper 250
Southeast Hayes Produce 100
Southeast Hayes Canned 120
Southeast Hayes Meat 80
...more lines of data...
Northeast Fuller Paper 200
Northeast Fuller Produce 300
Northeast Fuller Canned 420
Northeast Fuller Meat 125
;

proc sort data=groc;   /* 2 */
   by region department;
run;

options nobyline nodate pageno=1
        linesize=64 pagesize=20;   /* 3 */

proc chart data=groc;   /* 4 */
   by region department;
   vbar manager / type=sum sumvar=sales;
   title1 'This chart shows #byval2 sales';
   title2 'in the #byval(region)..';
run;

options byline;   /* 5 */
This partial output shows two BY groups with customized BY lines:
            This chart shows Canned sales                 1
                  in the Northwest.
[Vertical bar chart: Sales Sum (axis from 100 to 400) by Manager, with bars for Aikmann, Duncan, and Jeffreys]

            This chart shows Meat sales                   2
                  in the Northwest.
[Vertical bar chart: Sales Sum (axis from 15 to 75) by Manager, with bars for Aikmann, Duncan, and Jeffreys]
This example inserts the name of a BY variable and the value of a BY variable into the title. The program does these actions. The numbers correspond to the numbered callouts in the program:
1. uses the SAS system option NOBYLINE to suppress the BY line that normally appears in output that is produced with BY-group processing.
2. uses PROC CHART to chart sales by Region. In the first TITLE statement, #BYVAR(Region) inserts the name of the variable Region into the title. (If Region had a label, #BYVAR would use the label instead of the name.) The suffix al is appended to the name. In the second TITLE statement, #BYVAL1 inserts the value of the first BY variable, Region, into the title.
3. uses the SAS system option BYLINE to return to the creation of the default BY line with BY-group processing.
options nobyline nodate pageno=1
        linesize=64 pagesize=20;   /* 1 */

proc chart data=groc;   /* 2 */
   by region;
   vbar manager / type=mean sumvar=sales;
   title1 '#byvar(region).al Analysis';
   title2 'for the #byval1';
run;

options byline;   /* 3 */
This partial output shows one BY group with a customized BY line:
                  Regional Analysis                       1
                  for the Northwest
[Vertical bar chart: Sales Mean (axis from 100 to 300) by Manager, with bars for Aikmann, Duncan, and Jeffreys]
This example inserts the complete BY line into the title. The program does these actions:
1. uses the SAS system option NOBYLINE to suppress the BY line that normally appears in output that is produced with BY-group processing.
2. uses PROC CHART to chart sales by Region and Department. In the TITLE statement, #BYLINE inserts the complete BY line into the title.
3. uses the SAS system option BYLINE to return to the creation of the default BY line with BY-group processing.
options nobyline nodate pageno=1
        linesize=64 pagesize=20;   /* 1 */

proc chart data=groc;   /* 2 */
   by region department;
   vbar manager / type=sum sumvar=sales;
   title 'Information for #byline';
run;

options byline;   /* 3 */
This partial output shows two BY groups with customized BY lines:
   Information for Region=Northwest Department=Canned     1
[Vertical bar chart: Sales Sum (axis from 100 to 400) by Manager, with bars for Aikmann, Duncan, and Jeffreys]

   Information for Region=Northwest Department=Meat       2
[Vertical bar chart: Sales Sum (axis from 15 to 75) by Manager, with bars for Aikmann, Duncan, and Jeffreys]
SAS does not issue error or warning messages for incorrect #BYVAL, #BYVAR, or #BYLINE specifications. Instead, the text of the item becomes part of the title.
Shortcuts for Specifying Lists of Variable Names |
Several statements in procedures allow multiple variable names. You can use these shortcut notations instead of specifying each variable name:
Note: You cannot use shortcuts to list variable names in the INDEX CREATE statement in PROC DATASETS.
See the SAS Language Reference: Concepts for complete documentation.
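The following sketch shows several of the standard SAS variable-list notations (numbered range, name prefix, name range, and the special list _NUMERIC_) in procedure statements. The data set WORK.SCORES and its variables are hypothetical and were chosen only to make each notation visible.

```sas
/* Hypothetical data set; variable order is Name, x1, x2, x3, Age, Height. */
data work.scores;
   input Name $ x1 x2 x3 Age Height;
   datalines;
Ann 1 2 3 30 160
Bob 4 5 6 41 175
;
run;

proc print data=work.scores;
   var x1-x3;        /* numbered range: x1, x2, x3                    */
run;

proc print data=work.scores;
   var x:;           /* name prefix: all variables that begin with x  */
run;

proc print data=work.scores;
   var Name--x2;     /* name range by position: Name, x1, x2          */
run;

proc means data=work.scores;
   var _numeric_;    /* all numeric variables                         */
run;
```

The name range (two hyphens) depends on the order of the variables in the data set, so it can select different variables if the data set is rebuilt with a different variable order.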
Formatted Values |
Typically, when you print or group variable values, Base SAS procedures use the formatted values. This section contains examples of how Base SAS procedures use formatted values.
The following example prints the formatted values of the data set PROCLIB.PAYROLL. (See PROCLIB.PAYROLL for details about the DATA step that creates this data set.) In PROCLIB.PAYROLL, the variable Jobcode indicates the job and level of the employee. For example, TA1 indicates that the employee is at the beginning level for a ticket agent.
libname proclib 'SAS-library';
options nodate pageno=1 linesize=64 pagesize=40;

proc print data=proclib.payroll(obs=10) noobs;
   title 'PROCLIB.PAYROLL';
   title2 'First 10 Observations Only';
run;
The following example is a partial printing of PROCLIB.PAYROLL:
                        PROCLIB.PAYROLL                        1
                   First 10 Observations Only

   Id Number    Gender    Jobcode    Salary      Birth      Hired

     1919         M         TA2       34376    12SEP60    04JUN87
     1653         F         ME2       35108    15OCT64    09AUG90
     1400         M         ME1       29769    05NOV67    16OCT90
     1350         F         FA3       32886    31AUG65    29JUL90
     1401         M         TA3       38822    13DEC50    17NOV85
     1499         M         ME3       43025    26APR54    07JUN80
     1101         M         SCP       18723    06JUN62    01OCT90
     1333         M         PT2       88606    30MAR61    10FEB81
     1402         M         TA2       32615    17JAN63    02DEC90
     1479         F         TA3       38785    22DEC68    05OCT89
The following PROC FORMAT step creates the format $JOBFMT., which assigns descriptive names for each job:
proc format;
   value $jobfmt
      'FA1'='Flight Attendant Trainee'
      'FA2'='Junior Flight Attendant'
      'FA3'='Senior Flight Attendant'
      'ME1'='Mechanic Trainee'
      'ME2'='Junior Mechanic'
      'ME3'='Senior Mechanic'
      'PT1'='Pilot Trainee'
      'PT2'='Junior Pilot'
      'PT3'='Senior Pilot'
      'TA1'='Ticket Agent Trainee'
      'TA2'='Junior Ticket Agent'
      'TA3'='Senior Ticket Agent'
      'NA1'='Junior Navigator'
      'NA2'='Senior Navigator'
      'BCK'='Baggage Checker'
      'SCP'='Skycap';
run;
The FORMAT statement in this PROC MEANS step temporarily associates the $JOBFMT. format with the variable Jobcode:
options nodate pageno=1 linesize=64 pagesize=60;

proc means data=proclib.payroll mean max;
   class jobcode;
   var salary;
   format jobcode $jobfmt.;
   title 'Summary Statistics for';
   title2 'Each Job Code';
run;
PROC MEANS produces this output, which uses the $JOBFMT. format:
                     Summary Statistics for                    1
                         Each Job Code

                      The MEANS Procedure

                   Analysis Variable : Salary

                              N
Jobcode                     Obs            Mean         Maximum
---------------------------------------------------------------
Baggage Checker               9        25794.22        26896.00
Flight Attendant Trainee     11        23039.36        23979.00
Junior Flight Attendant      16        27986.88        28978.00
Senior Flight Attendant       7        32933.86        33419.00
Mechanic Trainee              8        28500.25        29769.00
Junior Mechanic              14        35576.86        36925.00
Senior Mechanic               7        42410.71        43900.00
Junior Navigator              5        42032.20        43433.00
Senior Navigator              3        52383.00        53798.00
Pilot Trainee                 8        67908.00        71349.00
Junior Pilot                 10        87925.20        91908.00
Senior Pilot                  2        10504.50        11379.00
Skycap                        7        18308.86        18833.00
Ticket Agent Trainee          9        27721.33        28880.00
Junior Ticket Agent          20        33574.95        34803.00
Senior Ticket Agent          12        39679.58        40899.00
---------------------------------------------------------------
Note: Because formats are character strings, formats for numeric variables are ignored when the values of the numeric variables are needed for mathematical calculations.
If you use a formatted variable to group or classify data, then the procedure uses the formatted values. The following example creates and assigns a format, $CODEFMT., that groups the levels of each job code into one category. PROC MEANS calculates statistics based on the groupings of the $CODEFMT. format.
proc format;
   value $codefmt
      'FA1','FA2','FA3'='Flight Attendant'
      'ME1','ME2','ME3'='Mechanic'
      'PT1','PT2','PT3'='Pilot'
      'TA1','TA2','TA3'='Ticket Agent'
      'NA1','NA2'='Navigator'
      'BCK'='Baggage Checker'
      'SCP'='Skycap';
run;

options nodate pageno=1 linesize=64 pagesize=40;

proc means data=proclib.payroll mean max;
   class jobcode;
   var salary;
   format jobcode $codefmt.;
   title 'Summary Statistics for Job Codes';
   title2 '(Using a Format that Groups the Job Codes)';
run;
PROC MEANS produces this output:
                Summary Statistics for Job Codes               1
           (Using a Format that Groups the Job Codes)

                      The MEANS Procedure

                   Analysis Variable : Salary

                      N
Jobcode             Obs            Mean         Maximum
-------------------------------------------------------
Baggage Checker       9        25794.22        26896.00
Flight Attendant     34        27404.71        33419.00
Mechanic             29        35274.24        43900.00
Navigator             8        45913.75        53798.00
Pilot                20        72176.25        91908.00
Skycap                7        18308.86        18833.00
Ticket Agent         41        34076.73        40899.00
-------------------------------------------------------
If you want to associate a format with a variable temporarily, then you can use the FORMAT statement. For example, the following PROC PRINT step associates the DOLLAR8. format with the variable Salary for the duration of this PROC PRINT step only:
options nodate pageno=1 linesize=64 pagesize=40;

proc print data=proclib.payroll(obs=10) noobs;
   format salary dollar8.;
   title 'Temporarily Associating a Format';
   title2 'with the Variable Salary';
run;
PROC PRINT produces this output:
                Temporarily Associating a Format               1
                    with the Variable Salary

   Id Number    Gender    Jobcode     Salary      Birth      Hired

     1919         M         TA2      $34,376    12SEP60    04JUN87
     1653         F         ME2      $35,108    15OCT64    09AUG90
     1400         M         ME1      $29,769    05NOV67    16OCT90
     1350         F         FA3      $32,886    31AUG65    29JUL90
     1401         M         TA3      $38,822    13DEC50    17NOV85
     1499         M         ME3      $43,025    26APR54    07JUN80
     1101         M         SCP      $18,723    06JUN62    01OCT90
     1333         M         PT2      $88,606    30MAR61    10FEB81
     1402         M         TA2      $32,615    17JAN63    02DEC90
     1479         F         TA3      $38,785    22DEC68    05OCT89
If a variable has a permanent format that you do not want a procedure to use, then temporarily dissociate the format from the variable by using a FORMAT statement.
In this example, the FORMAT statement in the DATA step permanently associates the $YRFMT. format with the variable Year. Thus, when you use the variable in a PROC step, the procedure uses the formatted values. The PROC MEANS step, however, contains a FORMAT statement that names Year without a format, which dissociates the $YRFMT. format from Year for this PROC MEANS step only. PROC MEANS uses the stored value for Year in the output.
proc format;
   value $yrfmt '1'='Freshman'
                '2'='Sophomore'
                '3'='Junior'
                '4'='Senior';
run;

data debate;
   input Name $ Gender $ Year $ GPA @@;
   format year $yrfmt.;
   datalines;
Capiccio m 1 3.598 Tucker   m 1 3.901
Bagwell  f 2 3.722 Berry    m 2 3.198
Metcalf  m 2 3.342 Gold     f 3 3.609
Gray     f 3 3.177 Syme     f 3 3.883
Baglione f 4 4.000 Carr     m 4 3.750
Hall     m 4 3.574 Lewis    m 4 3.421
;

options nodate pageno=1 linesize=64 pagesize=40;

proc means data=debate mean maxdec=2;
   class year;
   format year;
   title 'Average GPA';
run;
PROC MEANS produces this output, which does not use the $YRFMT. format:
                          Average GPA                          1

                      The MEANS Procedure

                     Analysis Variable : GPA

                          N
             Year       Obs          Mean
             -------------------------------
             1            2          3.75
             2            3          3.42
             3            3          3.56
             4            4          3.69
             -------------------------------
When a procedure processes a data set, it checks to determine whether a format is assigned to the BY variable. If it is, then the procedure adds observations to the current BY group until the formatted value changes. If nonconsecutive internal values of the BY variables have the same formatted value, then the values are grouped into different BY groups. Thus, two BY groups are created with the same formatted value. Further, if different and consecutive internal values of the BY variables have the same formatted value, then they are included in the same BY group.
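The following sketch illustrates both cases. The format LVLFMT. and the data set WORK.LEVELS are hypothetical and were constructed so that the internal values 1 and 2 (consecutive) and 4 (not consecutive with them) all share the formatted value Low.

```sas
/* Hypothetical format: internal values 1, 2, and 4 all display as 'Low'. */
proc format;
   value lvlfmt 1,2='Low'
                3='High'
                4='Low';
run;

/* Hypothetical data, already in ascending order of Level. */
data work.levels;
   input Level Score;
   datalines;
1 10
2 20
3 30
4 40
;
run;

/* BY groups are formed from the formatted values of Level. */
proc print data=work.levels;
   by level;
   format level lvlfmt.;
run;
```

Under the rules described above, this step should produce three BY groups: Low (internal values 1 and 2), High (internal value 3), and a second Low group (internal value 4).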
If SAS cannot find a format, then it stops processing and prints an error message in the SAS log. You can suppress this behavior with the SAS system option NOFMTERR. If you use NOFMTERR, and SAS cannot find the format, then SAS uses a default format and continues processing. Typically, for the default, SAS uses the BESTw. format for numeric variables and the $w. format for character variables.
Note: To ensure that SAS can find user-written formats, use the SAS system option FMTSEARCH=. How to store formats is described in Storing Informats and Formats.
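As a sketch of how these options fit together, the following statements store a user-written format in a permanent catalog and then tell SAS where to look for it. The format $GENDER. is hypothetical, and 'SAS-library' is the same placeholder path used in the earlier examples.

```sas
/* 'SAS-library' is a placeholder for the physical location of the library. */
libname proclib 'SAS-library';

/* Store a hypothetical user-written format in the catalog PROCLIB.FORMATS. */
proc format library=proclib;
   value $gender 'M'='Male'
                 'F'='Female';
run;

/* Add PROCLIB.FORMATS to the list of catalogs that SAS searches. */
options fmtsearch=(proclib);

/* Optional: if a format still cannot be found, continue with a   */
/* default format instead of stopping with an error.              */
options nofmterr;
```

Setting NOFMTERR is a convenience for reading data whose formats are unavailable; it does not restore the missing formatted values.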
Processing All the Data Sets in a Library |
You can use the SAS Macro Facility to run the same procedure on every data set in a library. The macro facility is part of the Base SAS software.
Printing All the Data Sets in a SAS Library shows how to print all the data sets in a library. You can use the same macro definition to perform any procedure on all the data sets in a library. Simply replace the PROC PRINT piece of the program with the appropriate procedure code.
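For orientation, here is one possible way to write such a macro. It is a sketch and not the macro definition from the example referenced above: the macro name PRINTLIB, its LIB= parameter, and the use of PROC SQL with DICTIONARY.TABLES are all choices made for this illustration.

```sas
/* Hypothetical macro: runs PROC PRINT on every data set in a library. */
%macro printlib(lib=WORK);
   %local i n;

   /* Collect the member names into macro variables DS1, DS2, ... */
   proc sql noprint;
      select memname
         into :ds1-:ds9999
         from dictionary.tables
         where libname=upcase("&lib") and memtype='DATA';
   quit;

   /* SQLOBS holds the number of rows that the query selected. */
   %let n=&sqlobs;

   /* Replace the PROC PRINT step below to run a different procedure. */
   %do i=1 %to &n;
      proc print data=&lib..&&ds&i;
         title "Data Set &lib..&&ds&i";
      run;
   %end;
%mend printlib;

/* Example call for the WORK library. */
%printlib(lib=WORK)
```

Only the statements inside the %DO loop depend on the procedure, so the same skeleton can be reused by swapping in other procedure code.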
Operating Environment-Specific Procedures |
Several Base SAS procedures are specific to one operating environment or one release. Operating Environment-Specific Procedures contains a table with additional information. These procedures are described in more detail in the SAS documentation for operating environments.
Statistic Descriptions |
The following table identifies common descriptive statistics that are available in several Base SAS procedures. See Keywords and Formulas for more detailed information about available statistics and theoretical information.
Statistic | Description | Procedures | |
---|---|---|---|
confidence intervals | | FREQ, MEANS/SUMMARY, TABULATE, UNIVARIATE | |
CSS | corrected sum of squares | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE | |
CV | coefficient of variation | MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE | |
goodness-of-fit tests | | FREQ, UNIVARIATE | |
KURTOSIS | kurtosis | MEANS/SUMMARY, TABULATE, UNIVARIATE | |
MAX | largest (maximum) value | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE | |
MEAN | mean | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE | |
MEDIAN | median (50th percentile) | CORR (for nonparametric correlation measures), MEANS/SUMMARY, TABULATE, UNIVARIATE | |
MIN | smallest (minimum) value | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE | |
MODE | most frequent value (if not unique, the smallest mode is used) | UNIVARIATE | |
N | number of observations on which calculations are based | CORR, FREQ, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE | |
NMISS | number of missing values | FREQ, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE | |
NOBS | number of observations | MEANS/SUMMARY, UNIVARIATE | |
PCTN | the percentage of a cell or row frequency to a total frequency | REPORT, TABULATE | |
PCTSUM | the percentage of a cell or row sum to a total sum | REPORT, TABULATE | |
Pearson correlation | | CORR | |
percentiles | | FREQ, MEANS/SUMMARY, REPORT, TABULATE, UNIVARIATE | |
RANGE | range | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE | |
robust statistics | trimmed means, Winsorized means | UNIVARIATE | |
SKEWNESS | skewness | MEANS/SUMMARY, TABULATE, UNIVARIATE | |
Spearman correlation | | CORR | |
STD | standard deviation | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE | |
STDERR | the standard error of the mean | MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE | |
SUM | sum | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE | |
SUMWGT | sum of weights | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE | |
tests of location | | UNIVARIATE | |
USS | uncorrected sum of squares | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE | |
VAR | variance | CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE | |
Computational Requirements for Statistics |
The following computational requirements are for the statistics that are listed in Common Descriptive Statistics That Base SAS Procedures Calculate. They do not describe recommended sample sizes.
N and NMISS do not require any nonmissing observations.
SUM, MEAN, MAX, MIN, RANGE, USS, and CSS require at least one nonmissing observation.
VAR, STD, STDERR, and CV require at least two observations.
CV requires that MEAN is not equal to zero.
Statistics are reported as missing if they cannot be computed.
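A small sketch makes these requirements concrete. The data set WORK.ONE is hypothetical and contains a single nonmissing observation, so the statistics that need at least two observations cannot be computed.

```sas
/* Hypothetical data set with exactly one nonmissing observation. */
data work.one;
   x=5;
run;

/* N, NMISS, SUM, and MEAN can be computed from one observation.  */
/* VAR, STD, STDERR, and CV need at least two, so PROC MEANS      */
/* reports them as missing.                                       */
proc means data=work.one n nmiss sum mean var std stderr cv;
   var x;
run;
```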
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.