
Fundamental Concepts for Using Base SAS Procedures

Procedure Concepts


Input Data Sets

Many Base SAS procedures require an input SAS data set. You specify the input SAS data set by using the DATA= option in the procedure statement, as in this example:

proc print data=emp;

If you omit the DATA= option, the procedure uses the value of the SAS system option _LAST_=. The default of _LAST_= is the most recently created SAS data set in the current SAS job or session. _LAST_= is described in detail in the SAS Language Reference: Dictionary.
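As a minimal sketch of this default, the following step omits the DATA= option, so PROC PRINT reads the most recently created data set. The data set EMP and its values are hypothetical and exist only for illustration.

```sas
/* Hypothetical data set, created only for this illustration */
data emp;
   input Name $ Salary;
   datalines;
Hayes 34000
Fuller 41000
;

/* No DATA= option: the procedure uses the value of _LAST_=, */
/* which at this point is the data set created above.        */
proc print;
run;
```

Naming the data set explicitly with DATA= is usually safer in longer programs, because an intervening step can change which data set was created most recently.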


RUN-Group Processing

RUN-group processing enables you to submit a PROC step with a RUN statement without ending the procedure. You can continue to use the procedure without issuing another PROC statement. To end the procedure, use a RUN CANCEL or a QUIT statement. Several Base SAS procedures support RUN-group processing:

CATALOG DATASETS PLOT PMENU TRANTAB

See the section on the individual procedure for more information.

Note:   PROC SQL executes each query automatically. Neither the RUN nor RUN CANCEL statement has any effect.
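The following sketch illustrates RUN-group processing with PROC DATASETS, one of the procedures listed above. The data set names EMP, STAFF, and SCRATCH are hypothetical and are created here only so that the step has something to act on.

```sas
/* Hypothetical data sets, created only for this illustration */
data emp;
   x=1;
run;

data scratch;
   x=2;
run;

proc datasets library=work nolist;
   change emp=staff;     /* first RUN group: rename EMP to STAFF    */
run;
   delete scratch;       /* second RUN group in the same PROC step  */
run;
quit;                    /* ends the procedure                      */
```

Each RUN statement submits the statements that precede it, and the procedure stays active until the QUIT statement.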


Creating Titles That Contain BY-Group Information


BY-Group Processing

BY-group processing uses a BY statement to process observations that are ordered, grouped, or indexed according to the values of one or more variables. By default, when you use BY-group processing in a procedure step, a BY line identifies each group. This section explains how to create titles that serve as customized BY lines.


Suppressing the Default BY Line

When you insert BY-group processing information into a title, you usually want to suppress the default BY line. To suppress it, use the SAS system option NOBYLINE.

Note:   You must use the NOBYLINE option if you insert BY-group information into titles for the following Base SAS procedures:

MEANS PRINT STANDARD SUMMARY

If you use the BY statement with the NOBYLINE option, then these procedures always start a new page for each BY group. This behavior prevents multiple BY groups from appearing on a single page and ensures that the information in the titles matches the report on the pages.

Inserting BY-Group Information into a Title

The general form for inserting BY-group information into a title is as follows:

#BY-specification<.suffix>
BY-specification

is one of the following specifications:

BYVALn | BYVAL(BY-variable)

places the value of the specified BY variable in the title. You specify the BY variable with one of the following options:

n

is the nth BY variable in the BY statement.

BY-variable

is the name of the BY variable whose value you want to insert in the title.

BYVARn | BYVAR(BY-variable)

places the label or the name (if no label exists) of the specified BY variable in the title. You designate the BY variable with one of the following options:

n

is the nth BY variable in the BY statement.

BY-variable

is the name of the BY variable whose label or name you want to insert in the title.

BYLINE

inserts the complete default BY line into the title.

suffix

supplies text to place immediately after the BY-group information that you insert in the title. No space appears between the BY-group information and the suffix.


Example: Inserting a Value from Each BY Variable into the Title

This example performs these actions:

  1. creates a data set, GROC, that contains data for stores from four regions. Each store has four departments. See GROC for the DATA step that creates the data set.

  2. sorts the data by Region and Department.

  3. uses the SAS system option NOBYLINE to suppress the BY line that normally appears in output that is produced with BY-group processing.

  4. uses PROC CHART to chart sales by Region and Department. In the first TITLE statement, #BYVAL2 inserts the value of the second BY variable, Department, into the title. In the second TITLE statement, #BYVAL(Region) inserts the value of Region into the title. The first period after Region indicates that a suffix follows. The second period is the suffix.

  5. uses the SAS system option BYLINE to return to the creation of the default BY line with BY-group processing.

data groc;  /* 1 */
   input Region $9. Manager $ Department $ Sales;
   datalines;
Southeast    Hayes       Paper       250
Southeast    Hayes       Produce     100
Southeast    Hayes       Canned      120
Southeast    Hayes       Meat         80
...more lines of data...
Northeast    Fuller      Paper       200
Northeast    Fuller      Produce     300
Northeast    Fuller      Canned      420
Northeast    Fuller      Meat        125
;

proc sort data=groc;  /* 2 */
   by region department;
run;

options nobyline nodate pageno=1
        linesize=64 pagesize=20;  /* 3 */

proc chart data=groc;  /* 4 */
   by region department;
   vbar manager / type=sum sumvar=sales;
   title1 'This chart shows #byval2 sales';
   title2 'in the #byval(region)..';
run;

options byline;  /* 5 */

This partial output shows two BY groups with customized BY lines:

                 This chart shows Canned sales                 1
                       in the Northwest.

        Sales Sum

        400 +       *****                   *****
            |       *****                   *****
        300 +       *****                   *****
            |       *****       *****       *****
        200 +       *****       *****       *****
            |       *****       *****       *****
        100 +       *****       *****       *****
            |       *****       *****       *****
            --------------------------------------------
                   Aikmann     Duncan     Jeffreys

                               Manager
                  This chart shows Meat sales                  2
                       in the Northwest.

        Sales Sum

        75 +       *****       *****
           |       *****       *****
        60 +       *****       *****
           |       *****       *****
        45 +       *****       *****
           |       *****       *****
        30 +       *****       *****       *****
           |       *****       *****       *****
        15 +       *****       *****       *****
           |       *****       *****       *****
           --------------------------------------------
                  Aikmann     Duncan     Jeffreys

                              Manager

Example: Inserting the Name of a BY Variable into a Title

This example inserts the name of a BY variable and the value of a BY variable into the title. The program performs these actions:

  1. uses the SAS system option NOBYLINE to suppress the BY line that normally appears in output that is produced with BY-group processing.

  2. uses PROC CHART to chart sales by Region. In the first TITLE statement, #BYVAR(Region) inserts the name of the variable Region into the title. (If Region had a label, #BYVAR would use the label instead of the name.) The suffix al is appended to the inserted name, which produces Regional. In the second TITLE statement, #BYVAL1 inserts the value of the first BY variable, Region, into the title.

  3. uses the SAS system option BYLINE to return to the creation of the default BY line with BY-group processing.

options nobyline nodate pageno=1
        linesize=64 pagesize=20;  /* 1 */

proc chart data=groc;  /* 2 */
   by region;
   vbar manager / type=mean sumvar=sales;
   title1 '#byvar(region).al Analysis';
   title2 'for the #byval1';
run;

options byline;  /* 3 */

This partial output shows one BY group with a customized BY line:

                       Regional Analysis                       1
                       for the Northwest

        Sales Mean

        300 +                               *****
            |                               *****
        200 +       *****                   *****
        100 +       *****       *****       *****
            |       *****       *****       *****
            --------------------------------------------
                   Aikmann     Duncan     Jeffreys

                               Manager

Example: Inserting the Complete BY Line into a Title

This example inserts the complete BY line into the title. The program does these actions:

  1. uses the SAS system option NOBYLINE to suppress the BY line that normally appears in output that is produced with BY-group processing.

  2. uses PROC CHART to chart sales by Region and Department. In the TITLE statement, #BYLINE inserts the complete BY line into the title.

  3. uses the SAS system option BYLINE to return to the creation of the default BY line with BY-group processing.

options nobyline nodate pageno=1
        linesize=64 pagesize=20;  /* 1 */

proc chart data=groc;  /* 2 */
   by region department;
   vbar manager / type=sum sumvar=sales;
   title 'Information for #byline';
run;

options byline;  /* 3 */

This partial output shows two BY groups with customized BY lines:

       Information for Region=Northwest Department=Canned      1

        Sales Sum

        400 +       *****                   *****
            |       *****                   *****
        300 +       *****                   *****
            |       *****       *****       *****
        200 +       *****       *****       *****
            |       *****       *****       *****
        100 +       *****       *****       *****
            |       *****       *****       *****
            --------------------------------------------
                   Aikmann     Duncan     Jeffreys

                               Manager
        Information for Region=Northwest Department=Meat       2

        Sales Sum

        75 +       *****       *****
           |       *****       *****
        60 +       *****       *****
           |       *****       *****
        45 +       *****       *****
           |       *****       *****
        30 +       *****       *****       *****
           |       *****       *****       *****
        15 +       *****       *****       *****
           |       *****       *****       *****
           --------------------------------------------
                  Aikmann     Duncan     Jeffreys

                              Manager

Error Processing of BY-Group Specifications

SAS does not issue error or warning messages for incorrect #BYVAL, #BYVAR, or #BYLINE specifications. Instead, the text of the item becomes part of the title.
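A small sketch of this behavior follows. The data set SALES and the misspelled specification #byvla are made up for illustration. Because the specification is not recognized, the title should contain the literal text rather than the BY value, and no message should appear in the SAS log.

```sas
/* Hypothetical data set, already ordered by Region */
data sales;
   input Region $ Amount;
   datalines;
East 100
East 150
West 200
;

options nobyline;

proc print data=sales;
   by region;
   /* Misspelled on purpose: the correct form is #byval(region) */
   title 'Sales for #byvla(region)';
run;

options byline;
title;   /* clear the title */
```

Checking the first page of output is therefore the practical way to catch a mistyped specification.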


Shortcuts for Specifying Lists of Variable Names

Several statements in procedures allow multiple variable names. You can use these shortcut notations instead of specifying each variable name:

Notation          Meaning
x1-xn             specifies variables X1 through Xn. The numbers must be consecutive.
x:                specifies all variables that begin with the letter X.
x--a              specifies all variables between X and A, inclusive. This notation uses the position of the variables in the data set.
x-numeric-a       specifies all numeric variables between X and A, inclusive. This notation uses the position of the variables in the data set.
x-character-a     specifies all character variables between X and A, inclusive. This notation uses the position of the variables in the data set.
_numeric_         specifies all numeric variables.
_character_       specifies all character variables.
_all_             specifies all variables.

Note:   You cannot use shortcuts to list variable names in the INDEX CREATE statement in PROC DATASETS.

See the SAS Language Reference: Concepts for complete documentation.
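The following sketch shows several of these notations in VAR statements. The data set SCORES and its variables are hypothetical; the comments state which variables each list is expected to select, given the variable order Name, Team, x1, x2, x3, y.

```sas
/* Hypothetical data set; variable order is Name, Team, x1, x2, x3, y */
data scores;
   input Name $ Team $ x1 x2 x3 y;
   datalines;
Ann Red  1 2 3 10
Bob Blue 4 5 6 20
;

proc print data=scores;
   var x1-x3;                /* numbered range: x1, x2, x3            */
run;

proc print data=scores;
   var x:;                   /* name prefix: variables starting with x */
run;

proc print data=scores;
   var team--x2;             /* positional range: Team, x1, x2         */
run;

proc print data=scores;
   var name-character-x3;    /* character variables in range: Name, Team */
run;

proc means data=scores;
   var _numeric_;            /* all numeric variables: x1, x2, x3, y   */
run;
```

The positional forms depend on the order in which variables were defined, so they can select different variables if the data set is rebuilt with a different variable order.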


Formatted Values


Using Formatted Values

Typically, when you print or group variable values, Base SAS procedures use the formatted values. This section contains examples of how Base SAS procedures use formatted values.


Example: Printing the Formatted Values for a Data Set

The following example prints the formatted values of the data set PROCLIB.PAYROLL. (See PROCLIB.PAYROLL for details about the DATA step that creates this data set.) In PROCLIB.PAYROLL, the variable Jobcode indicates the job and level of the employee. For example, TA1 indicates that the employee is at the beginning level for a ticket agent.

libname proclib 'SAS-library';

options nodate pageno=1 
        linesize=64 pagesize=40;
proc print data=proclib.payroll(obs=10) 
           noobs;
   title  'PROCLIB.PAYROLL';
   title2 'First 10 Observations Only';
run;

The following output is a partial listing of PROCLIB.PAYROLL:

                        PROCLIB.PAYROLL                        1
                   First 10 Observations Only

      Id
    Number   Gender  Jobcode    Salary      Birth      Hired

     1919      M       TA2       34376    12SEP60    04JUN87
     1653      F       ME2       35108    15OCT64    09AUG90
     1400      M       ME1       29769    05NOV67    16OCT90
     1350      F       FA3       32886    31AUG65    29JUL90
     1401      M       TA3       38822    13DEC50    17NOV85
     1499      M       ME3       43025    26APR54    07JUN80
     1101      M       SCP       18723    06JUN62    01OCT90
     1333      M       PT2       88606    30MAR61    10FEB81
     1402      M       TA2       32615    17JAN63    02DEC90
     1479      F       TA3       38785    22DEC68    05OCT89

The following PROC FORMAT step creates the format $JOBFMT., which assigns descriptive names for each job:

proc format;
    value $jobfmt 
          'FA1'='Flight Attendant Trainee'
          'FA2'='Junior Flight Attendant'
          'FA3'='Senior Flight Attendant'
          'ME1'='Mechanic Trainee'
          'ME2'='Junior Mechanic'
          'ME3'='Senior Mechanic'
          'PT1'='Pilot Trainee'
          'PT2'='Junior Pilot'
          'PT3'='Senior Pilot'
          'TA1'='Ticket Agent Trainee'
          'TA2'='Junior Ticket Agent'
          'TA3'='Senior Ticket Agent'
          'NA1'='Junior Navigator'
          'NA2'='Senior Navigator'
          'BCK'='Baggage Checker'
          'SCP'='Skycap';
run;

The FORMAT statement in this PROC MEANS step temporarily associates the $JOBFMT. format with the variable Jobcode:

options nodate pageno=1 
        linesize=64 pagesize=60;
proc means data=proclib.payroll mean max;
   class jobcode;
   var salary;
   format jobcode $jobfmt.;
   title 'Summary Statistics for';
   title2 'Each Job Code';
run;

PROC MEANS produces this output, which uses the $JOBFMT. format:

                     Summary Statistics for                    1
                         Each Job Code

                      The MEANS Procedure

                  Analysis Variable : Salary 
 
                              N
Jobcode                     Obs            Mean         Maximum
---------------------------------------------------------------
Baggage Checker               9        25794.22        26896.00

Flight Attendant Trainee     11        23039.36        23979.00

Junior Flight Attendant      16        27986.88        28978.00

Senior Flight Attendant       7        32933.86        33419.00

Mechanic Trainee              8        28500.25        29769.00

Junior Mechanic              14        35576.86        36925.00

Senior Mechanic               7        42410.71        43900.00

Junior Navigator              5        42032.20        43433.00

Senior Navigator              3        52383.00        53798.00

Pilot Trainee                 8        67908.00        71349.00

Junior Pilot                 10        87925.20        91908.00

Senior Pilot                  2        10504.50        11379.00

Skycap                        7        18308.86        18833.00

Ticket Agent Trainee          9        27721.33        28880.00

Junior Ticket Agent          20        33574.95        34803.00

Senior Ticket Agent          12        39679.58        40899.00
---------------------------------------------------------------

Note:   Because formats are character strings, formats for numeric variables are ignored when the values of the numeric variables are needed for mathematical calculations.


Example: Grouping or Classifying Formatted Data

If you use a formatted variable to group or classify data, then the procedure uses the formatted values. The following example creates and assigns a format, $CODEFMT., that groups the levels of each job code into one category. PROC MEANS calculates statistics based on the groupings of the $CODEFMT. format.

proc format;
    value $codefmt
          'FA1','FA2','FA3'='Flight Attendant'
          'ME1','ME2','ME3'='Mechanic'
          'PT1','PT2','PT3'='Pilot'
          'TA1','TA2','TA3'='Ticket Agent'
                'NA1','NA2'='Navigator'
                      'BCK'='Baggage Checker'
                      'SCP'='Skycap';
run;

options nodate pageno=1 
        linesize=64 pagesize=40;
proc means data=proclib.payroll mean max;
   class jobcode;
   var salary;
   format jobcode $codefmt.;
   title 'Summary Statistics for Job Codes';
   title2 '(Using a Format that Groups the Job Codes)';
run;

PROC MEANS produces this output:

                Summary Statistics for Job Codes               1
           (Using a Format that Groups the Job Codes)

                      The MEANS Procedure

                  Analysis Variable : Salary 
 
                          N
    Jobcode             Obs            Mean         Maximum
    -------------------------------------------------------
    Baggage Checker       9        25794.22        26896.00

    Flight Attendant     34        27404.71        33419.00

    Mechanic             29        35274.24        43900.00

    Navigator             8        45913.75        53798.00

    Pilot                20        72176.25        91908.00

    Skycap                7        18308.86        18833.00

    Ticket Agent         41        34076.73        40899.00
    -------------------------------------------------------

Example: Temporarily Associating a Format with a Variable

If you want to associate a format with a variable temporarily, then you can use the FORMAT statement. For example, the following PROC PRINT step associates the DOLLAR8. format with the variable Salary for the duration of this PROC PRINT step only:

options nodate pageno=1 
        linesize=64 pagesize=40;
proc print data=proclib.payroll(obs=10) 
           noobs;
   format salary dollar8.;
   title 'Temporarily Associating a Format';
   title2 'with the Variable Salary';
run;

PROC PRINT produces this output:

                Temporarily Associating a Format               1
                    with the Variable Salary

     Id
   Number   Gender  Jobcode      Salary      Birth      Hired

    1919      M       TA2       $34,376    12SEP60    04JUN87
    1653      F       ME2       $35,108    15OCT64    09AUG90
    1400      M       ME1       $29,769    05NOV67    16OCT90
    1350      F       FA3       $32,886    31AUG65    29JUL90
    1401      M       TA3       $38,822    13DEC50    17NOV85
    1499      M       ME3       $43,025    26APR54    07JUN80
    1101      M       SCP       $18,723    06JUN62    01OCT90
    1333      M       PT2       $88,606    30MAR61    10FEB81
    1402      M       TA2       $32,615    17JAN63    02DEC90
    1479      F       TA3       $38,785    22DEC68    05OCT89

Example: Temporarily Dissociating a Format from a Variable

If a variable has a permanent format that you do not want a procedure to use, then temporarily dissociate the format from the variable by using a FORMAT statement.

In this example, the FORMAT statement in the DATA step permanently associates the $YRFMT. format with the variable Year. Thus, when you use the variable in a PROC step, the procedure uses the formatted values. The PROC MEANS step, however, contains a FORMAT statement that dissociates the $YRFMT. format from Year for this PROC MEANS step only. PROC MEANS uses the stored value for Year in the output.

proc format;
   value $yrfmt  '1'='Freshman'
                 '2'='Sophomore'
                 '3'='Junior'
                 '4'='Senior';
run;
data debate;
    input Name $ Gender $  Year $  GPA  @@;
    format year $yrfmt.;
    datalines;
Capiccio m 1 3.598 Tucker   m 1 3.901
Bagwell  f 2 3.722 Berry    m 2 3.198
Metcalf  m 2 3.342 Gold     f 3 3.609
Gray     f 3 3.177 Syme     f 3 3.883
Baglione f 4 4.000 Carr     m 4 3.750
Hall     m 4 3.574 Lewis    m 4 3.421
;

options nodate pageno=1 
        linesize=64 pagesize=40;
proc means data=debate mean maxdec=2;
   class year;
   format year;
   title 'Average GPA';
run;

PROC MEANS produces this output, which does not use the $YRFMT. format:

                          Average GPA                          1

                      The MEANS Procedure

                   Analysis Variable : GPA 
 
                              N
                Year        Obs            Mean
                -------------------------------
                1             2            3.75

                2             3            3.42

                3             3            3.56

                4             4            3.69
                -------------------------------

Formats and BY-Group Processing

When a procedure processes a data set, it checks to determine whether a format is assigned to the BY variable. If it is, then the procedure adds observations to the current BY group until the formatted value changes. If nonconsecutive internal values of the BY variables have the same formatted value, then the values are grouped into different BY groups. Thus, two BY groups are created with the same formatted value. Further, if different and consecutive internal values of the BY variables have the same formatted value, then they are included in the same BY group.
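The following sketch illustrates the nonconsecutive case. The format LVLFMT. and the data set READINGS are hypothetical. The internal values 1 and 3 share the formatted value Low but are separated by 2 (High), so, following the rule above, the step should produce three BY groups: Low, High, and Low again.

```sas
/* Hypothetical format: internal values 1 and 3 share a formatted value */
proc format;
   value lvlfmt 1='Low'
                2='High'
                3='Low';
run;

/* Hypothetical data set, ordered by the internal value of Level */
data readings;
   input Level Value;
   format level lvlfmt.;
   datalines;
1 10
2 20
3 30
;

proc print data=readings;
   by level;   /* BY groups change when the formatted value changes */
run;
```

If the format instead mapped 1 and 2 to the same label, those two consecutive internal values would fall into a single BY group.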


Formats and Error Checking

If SAS cannot find a format, then it stops processing and prints an error message in the SAS log. You can suppress this behavior with the SAS system option NOFMTERR. If you use NOFMTERR, and SAS cannot find the format, then SAS uses a default format and continues processing. Typically, for the default, SAS uses the BESTw. format for numeric variables and the $w. format for character variables.

Note:   To ensure that SAS can find user-written formats, use the SAS system option FMTSEARCH=. How to store formats is described in Storing Informats and Formats.
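As a brief sketch of these two options, the following statements store a format in a permanent library and tell SAS where to look for it. The libref PROCLIB is assumed to be assigned already (as in the earlier LIBNAME statement), and the format $DEPTFMT. is hypothetical.

```sas
/* Store a hypothetical user-written format in PROCLIB.FORMATS */
proc format library=proclib;
   value $deptfmt 'P'='Produce'
                  'M'='Meat';
run;

/* Search PROCLIB.FORMATS first, then the WORK and LIBRARY catalogs */
options fmtsearch=(proclib work library);

/* Optional: continue with a default format instead of stopping */
/* when a format cannot be found                                */
options nofmterr;
```

Setting NOFMTERR hides a missing-format problem rather than fixing it, so it is generally more useful for inspecting data than for producing final reports.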


Processing All the Data Sets in a Library

You can use the SAS Macro Facility to run the same procedure on every data set in a library. The macro facility is part of the Base SAS software.

Printing All the Data Sets in a SAS Library shows how to print all the data sets in a library. You can use the same macro definition to perform any procedure on all the data sets in a library. Simply replace the PROC PRINT piece of the program with the appropriate procedure code.
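The following macro is one possible sketch of this idea and is not the referenced example. It assumes that the DICTIONARY.TABLES view in PROC SQL is used to list the members of a library; the macro name PRINTLIB, its parameter, and the macro variables DS1, DS2, and so on are hypothetical.

```sas
%macro printlib(lib);
   %local i n;

   /* Put the name of each data set in the library into DS1, DS2, ... */
   proc sql noprint;
      select memname into :ds1-:ds9999
         from dictionary.tables
         where libname=%upcase("&lib") and memtype='DATA';
   quit;

   /* SQLOBS holds the number of rows that the query returned */
   %let n=&sqlobs;

   /* Run the procedure once for each data set */
   %do i=1 %to &n;
      proc print data=&lib..&&ds&i;
         title "Data set &lib..&&ds&i";
      run;
   %end;

   title;   /* clear the title */
%mend printlib;

%printlib(work)
```

To run a different procedure on every data set, replace the PROC PRINT step inside the %DO loop.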


Operating Environment-Specific Procedures

Several Base SAS procedures are specific to one operating environment or one release. Operating Environment-Specific Procedures contains a table with additional information. These procedures are described in more detail in the SAS documentation for operating environments.


Statistic Descriptions

The following table identifies common descriptive statistics that are available in several Base SAS procedures. See Keywords and Formulas for more detailed information about available statistics and theoretical information.

Common Descriptive Statistics That Base SAS Procedures Calculate
Statistic               Description                             Procedures
confidence intervals                                            FREQ, MEANS/SUMMARY, TABULATE, UNIVARIATE
CSS                     corrected sum of squares                CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
CV                      coefficient of variation                MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
goodness-of-fit tests                                           FREQ, UNIVARIATE
KURTOSIS                kurtosis                                MEANS/SUMMARY, TABULATE, UNIVARIATE
MAX                     largest (maximum) value                 CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
MEAN                    mean                                    CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
MEDIAN                  median (50th percentile)                CORR (for nonparametric correlation measures), MEANS/SUMMARY, TABULATE, UNIVARIATE
MIN                     smallest (minimum) value                CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
MODE                    most frequent value (if not unique,     UNIVARIATE
                        the smallest mode is used)
N                       number of observations on which         CORR, FREQ, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
                        calculations are based
NMISS                   number of missing values                FREQ, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
NOBS                    number of observations                  MEANS/SUMMARY, UNIVARIATE
PCTN                    the percentage of a cell or row         REPORT, TABULATE
                        frequency to a total frequency
PCTSUM                  the percentage of a cell or row sum     REPORT, TABULATE
                        to a total sum
Pearson correlation                                             CORR
percentiles                                                     FREQ, MEANS/SUMMARY, REPORT, TABULATE, UNIVARIATE
RANGE                   range                                   CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
robust statistics       trimmed means, Winsorized means         UNIVARIATE
SKEWNESS                skewness                                MEANS/SUMMARY, TABULATE, UNIVARIATE
Spearman correlation                                            CORR
STD                     standard deviation                      CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
STDERR                  the standard error of the mean          MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
SUM                     sum                                     CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
SUMWGT                  sum of weights                          CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
tests of location                                               UNIVARIATE
USS                     uncorrected sum of squares              CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
VAR                     variance                                CORR, MEANS/SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE


Computational Requirements for Statistics

The following computational requirements are for the statistics that are listed in Common Descriptive Statistics That Base SAS Procedures Calculate. They do not describe recommended sample sizes.

Statistics are reported as missing if they cannot be computed.
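As an illustrative case, a standard deviation computed with the default divisor (VARDEF=DF, that is, n-1) needs at least two nonmissing values. In the following sketch, the hypothetical data set ONE has a single observation, so PROC MEANS should report N and MEAN but show STD as missing.

```sas
/* Hypothetical data set with a single observation */
data one;
   x=5;
run;

/* With one nonmissing value, STD cannot be computed */
/* and is reported as missing (a period).            */
proc means data=one n mean std;
   var x;
run;
```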
