STANDARD Procedure

Example 2: Standardizing BY Groups and Replacing Missing Values

Features:
  • PROC STANDARD statement options: PRINT, REPLACE
  • BY statement

Other features:
  • FORMAT procedure
  • PRINT procedure
  • SORT procedure

Details

This example does the following:
  • calculates Z scores separately for each BY group using a mean of 0 and standard deviation of 1
  • replaces missing values with the given mean
  • prints the mean and standard deviation for the variables to standardize
  • prints the output data set
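
As a worked illustration of the first two items, the transformation that PROC STANDARD applies to a nonmissing value can be written as below. Here M and S stand for the MEAN= and STD= values, and the BY-group mean and standard deviation are the statistics that the PRINT option reports. The symbols are chosen for illustration, and the France figures are taken from the Stable group in this example's output.

```latex
% Standardization of a nonmissing value x_i within one BY group
x_i' = \frac{S\,(x_i - \bar{x})}{s_x} + M
\qquad\text{so with } M = 0,\ S = 1:\qquad
z_i = \frac{x_i - \bar{x}}{s_x}

% Worked value: France, Life50, Stable group (mean 67.4, std 1.854724 as printed)
z_{\text{France}} = \frac{67 - 67.4}{1.854724} \approx -0.21567

% With REPLACE, a missing value is set to the requested mean
x_i' = M = 0 \quad \text{when } x_i \text{ is missing}
```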

Program

options nodate pageno=1 linesize=80 pagesize=60;
proc format;
   value popfmt 1='Stable'
                2='Rapid';
run;
data lifexp;
   input PopulationRate Country $char14. Life50 Life93 @@;
   label life50='1950 life expectancy'
         life93='1993 life expectancy';
   datalines;
2 Bangladesh     .  53 2 Brazil         51 67
2 China          41 70 2 Egypt          42 60
2 Ethiopia       33 46 1 France         67 77
1 Germany        68 75 2 India          39 59
2 Indonesia      38 59 1 Japan          64 79
2 Mozambique      . 47 2 Philippines    48 64
1 Russia          . 65 2 Turkey         44 66
1 United Kingdom 69 76 1 United States  69 75
;
proc sort data=lifexp;
   by populationrate;
run;
proc standard data=lifexp mean=0 std=1 replace
              print out=zscore;
   by populationrate;
   format populationrate popfmt.;
   title1 'Life Expectancies by Birth Rate';
run;
proc print data=zscore noobs;
   title 'Standardized Life Expectancies at Birth';
   title2 'by a Country''s Birth Rate';
run;

Program Description

Set the SAS system options. The NODATE option suppresses the date and time at which the SAS job began. The PAGENO= option specifies the page number for the next page of output that SAS produces. The LINESIZE= option specifies the line size. The PAGESIZE= option specifies the number of lines for a page of SAS output.
options nodate pageno=1 linesize=80 pagesize=60;
Assign a character string format to a numeric value. PROC FORMAT creates the format POPFMT., which displays the numeric birth rate codes 1 and 2 as 'Stable' and 'Rapid'.
proc format;
   value popfmt 1='Stable'
                2='Rapid';
run;
Create the LIFEXP data set. Each observation in this data set contains information about 1950 and 1993 life expectancies at birth for 16 nations. The birth rate for each nation is classified as stable (1) or rapid (2). The nations with missing data obtained independent status after 1950. Data are from Vital Signs 1994: The Trends That Are Shaping Our Future, Lester R. Brown, Hal Kane, and David Malin Roodman, eds. Copyright © 1994 by Worldwatch Institute. Reprinted by permission of W.W. Norton & Company, Inc.
data lifexp;
   input PopulationRate Country $char14. Life50 Life93 @@;
   label life50='1950 life expectancy'
         life93='1993 life expectancy';
   datalines;
2 Bangladesh     .  53 2 Brazil         51 67
2 China          41 70 2 Egypt          42 60
2 Ethiopia       33 46 1 France         67 77
1 Germany        68 75 2 India          39 59
2 Indonesia      38 59 1 Japan          64 79
2 Mozambique      . 47 2 Philippines    48 64
1 Russia          . 65 2 Turkey         44 66
1 United Kingdom 69 76 1 United States  69 75
;
Sort the LIFEXP data set. PROC SORT sorts the observations by the birth rate so that the data can be processed in BY groups.
proc sort data=lifexp;
   by populationrate;
run;
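Generate the standardized data for all numeric variables and create the output data set ZSCORE. PROC STANDARD standardizes all numeric variables to a mean of 0 and a standard deviation of 1. REPLACE replaces missing values with the value of MEAN= (0). PRINT prints the mean, standard deviation, and input frequency of each variable to standardize. OUT= names the output data set.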
proc standard data=lifexp mean=0 std=1 replace
              print out=zscore;
Create the standardized values for each BY group. The BY statement standardizes the values separately by birth rate.
   by populationrate;
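Assign a format to a variable and specify a title for the report. The FORMAT statement assigns the POPFMT. format to PopulationRate. The format is stored with the variable in the output data set, so later steps display the formatted values while the stored values remain 1 and 2. The TITLE statement specifies a title.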
   format populationrate popfmt.;
   title1 'Life Expectancies by Birth Rate';
run;
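To see what REPLACE contributes, the same step can be run without it. The following is a minimal sketch, not part of the original example: the data set name ZSCORE_NOREP and the title text are made up for illustration. Without REPLACE, PROC STANDARD leaves missing values missing, so the Life50 values for Bangladesh, Mozambique, and Russia would be expected to print as missing rather than 0.

```sas
/* Hypothetical variant of the step above: REPLACE and PRINT omitted. */
/* ZSCORE_NOREP is an illustrative name, not part of the example.     */
proc standard data=lifexp mean=0 std=1 out=zscore_norep;
   by populationrate;
   format populationrate popfmt.;
run;

/* Print the result to compare with the ZSCORE data set. */
proc print data=zscore_norep noobs;
   title 'Standardized Life Expectancies Without REPLACE';
run;
```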
Print the data set. PROC PRINT prints the ZSCORE data set with the standardized values. The TITLE statements specify two titles to print.
proc print data=zscore noobs;
   title 'Standardized Life Expectancies at Birth';
   title2 'by a Country''s Birth Rate';
run;
Output

For each BY group, PROC STANDARD prints the variable name, mean, standard deviation, input frequency, and label of each variable to standardize. In the PROC PRINT output that follows, the 1950 life expectancies for Bangladesh, Mozambique, and Russia are no longer missing: REPLACE set them to the specified mean (0).
                        Life Expectancies by Birth Rate                        1

---------------------------- PopulationRate=Stable -----------------------------

                               Standard
 Name              Mean       Deviation               N    Label

 Life50       67.400000        1.854724               5    1950 life expectancy
 Life93       74.500000        4.888763               6    1993 life expectancy


----------------------------- PopulationRate=Rapid -----------------------------

                               Standard
 Name              Mean       Deviation               N    Label

 Life50       42.000000        5.033223               8    1950 life expectancy
 Life93       59.100000        8.225300              10    1993 life expectancy
                    Standardized Life Expectancies at Birth                    2
                           by a Country's Birth Rate

              Population
                 Rate       Country            Life50      Life93

                Stable      France            -0.21567     0.51138
                Stable      Germany            0.32350     0.10228
                Stable      Japan             -1.83316     0.92048
                Stable      Russia             0.00000    -1.94323
                Stable      United Kingdom     0.86266     0.30683
                Stable      United States      0.86266     0.10228
                Rapid       Bangladesh         0.00000    -0.74161
                Rapid       Brazil             1.78812     0.96045
                Rapid       China             -0.19868     1.32518
                Rapid       Egypt              0.00000     0.10942
                Rapid       Ethiopia          -1.78812    -1.59265
                Rapid       India             -0.59604    -0.01216
                Rapid       Indonesia         -0.79472    -0.01216
                Rapid       Mozambique         0.00000    -1.47107
                Rapid       Philippines        1.19208     0.59572
                Rapid       Turkey             0.39736     0.83888
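
One way to check the result is to summarize the ZSCORE data set by BY group. This is a sketch under the assumption that ZSCORE was created by the program above and is still in PopulationRate order; the title text is illustrative. The report should show no missing values (NMISS of 0) and a mean of approximately 0 for each variable in each BY group.

```sas
/* Sketch: summarize the standardized values for each BY group. */
proc means data=zscore n nmiss mean std maxdec=5;
   by populationrate;
   var life50 life93;
   title 'Check of Standardized Values';
run;
```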