The STDIZE Procedure
PROC STDIZE Statement
The PROC STDIZE statement invokes the procedure. You can specify the following options in the PROC STDIZE statement. Table 81.1 summarizes the options.
Option | Description |
---|---|
**Specify standardization methods** | |
METHOD= | specifies the name of the standardization method |
INITIAL= | specifies the method for computing initial estimates for the A estimates |
**Unstandardize variables** | |
UNSTD | unstandardizes variables when you also specify the METHOD=IN option |
**Process missing values** | |
NOMISS | omits observations with any missing values from computation |
MISSING= | specifies the method or a numeric value for replacing missing values |
REPLACE | replaces missing data with zero in the standardized data |
REPONLY | replaces missing data with the location measure (does not standardize the data) |
**Specify data set details** | |
DATA= | specifies the input data set |
KEEPLEN | specifies that output variables inherit the length of the analysis variable |
OUT= | specifies the output data set |
OUTSTAT= | specifies the output statistic data set |
**Specify computational settings** | |
VARDEF= | specifies the variances divisor |
NMARKERS= | specifies the number of markers when you also specify PCTLMTD=ONEPASS |
MULT= | specifies the constant to multiply each value by after standardizing |
ADD= | specifies the constant to add to each value after standardizing and multiplying by the value specified in the MULT= option |
FUZZ= | specifies the relative fuzz factor for writing the output |
**Specify percentiles** | |
PCTLDEF= | specifies the definition of percentiles when you also specify the PCTLMTD=ORD_STAT option |
PCTLMTD= | specifies the method used to estimate percentiles |
PCTLPTS= | writes observations containing percentiles to the data set specified in the OUTSTAT= option |
**Normalize scale estimators** | |
NORM | normalizes the scale estimator to be consistent for the standard deviation of a normal distribution |
SNORM | normalizes the scale estimator to have an expectation of approximately 1 for a standard normal distribution |
**Specify output** | |
PSTAT | displays the location and scale measures |
These options and their abbreviations are described (in alphabetical order) in the remainder of this section.
ADD=
specifies a constant, c, to add to each value after standardizing and multiplying by the value you specify in the MULT= option. The default value is 0.
DATA=
specifies the input data set to be standardized. If you omit the DATA= option, the most recently created data set is used.
FUZZ=
specifies the relative fuzz factor. The default value is 1E-14. For the OUT= data set, the score is computed as follows:
where the multiplier is the constant specified in the MULT= option, or 1 if the MULT= option is not specified.
For the OUTSTAT= data set and the Location and Scale table, the scale and location values are computed as follows:
Otherwise,
INITIAL=
specifies the method for computing initial estimates for the A estimates (ABW, AWAVE, and AHUBER). You cannot specify the following methods for initial estimates: INITIAL=ABW, INITIAL=AHUBER, INITIAL=AWAVE, and INITIAL=IN. The default is INITIAL=MAD.
KEEPLEN
specifies that output variables inherit the length of the analysis variable that PROC STDIZE uses to derive them. Without this option, PROC STDIZE stores numbers in double precision.
Caution: The KEEPLEN option causes the output variables to permanently lose numeric precision by truncating or rounding the value. However, the precision of the output variables will match that of the input.
METHOD=
specifies the name of the method for computing location and scale measures. Valid values for name are as follows: MEAN, MEDIAN, SUM, EUCLEN, USTD, STD, RANGE, MIDRANGE, MAXABS, IQR, MAD, ABW, AHUBER, AWAVE, AGK, SPACING, L, and IN.
For details about these methods, see the descriptions in the section Standardization Methods. The default is METHOD=STD.
MISSING=
specifies the method (or a numeric value) for replacing missing values. If you omit the MISSING= option, the REPLACE option replaces missing values with the location measure given by the METHOD= option. Specify the MISSING= option when you want to replace missing values with a different value. You can specify any name that is valid in the METHOD= option except the name IN. The corresponding location measure is used to replace missing values.
If you specify a numeric value, that value replaces missing values after the data are standardized. However, you can specify the REPONLY option with the MISSING= option to suppress standardization when you want only to replace missing values.
MULT=
specifies a constant, c, by which to multiply each value after standardizing. The default value is 1.
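The following Python sketch illustrates the order in which the MULT= and ADD= constants are applied. It is not SAS code, and it assumes that standardizing means subtracting a location measure and dividing by a scale measure; the function name `standardize` and its parameters are hypothetical.

```python
from statistics import mean, stdev


def standardize(values, location, scale, mult=1.0, add=0.0):
    """Return add + mult * (x - location) / scale for each x.

    The defaults mult=1 and add=0 mirror the documented defaults
    of the MULT= and ADD= options.
    """
    if scale == 0:
        raise ValueError("scale must be nonzero")
    return [add + mult * (x - location) / scale for x in values]


data = [2.0, 4.0, 6.0]
# Analogue of METHOD=STD (assumed): location is the mean,
# scale is the sample standard deviation.
scores = standardize(data, mean(data), stdev(data), mult=10.0, add=50.0)
print(scores)
```

With the defaults, the function returns the plain standardized values; the multiplier is applied before the additive constant.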
NMARKERS=
specifies the number of markers used when you specify the one-pass algorithm (PCTLMTD=ONEPASS). The value must be greater than or equal to 5. The default value is 105.
NOMISS
omits observations with missing values for any of the analyzed variables from calculation of the location and scale measures. If you omit the NOMISS option, all nonmissing values are used.
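The difference between the two behaviors can be sketched in Python as follows. This is an illustration only, using the mean as the location measure and `None` as a stand-in for a missing value; `location_means` is a hypothetical helper, not a SAS function.

```python
def location_means(rows, nomiss=False):
    """Return the mean of each column; None marks a missing value."""
    if nomiss:
        # NOMISS analogue: drop every row that has a missing value
        # in any analyzed variable before computing anything.
        rows = [r for r in rows if all(v is not None for v in r)]
    means = []
    for col in zip(*rows):
        # Default analogue: use all nonmissing values of each variable.
        present = [v for v in col if v is not None]
        means.append(sum(present) / len(present))
    return means


rows = [(1.0, 10.0), (6.0, None), (5.0, 30.0)]
default_means = location_means(rows)
nomiss_means = location_means(rows, nomiss=True)
print(default_means, nomiss_means)
```

In this example, the second row contributes to the first variable's mean by default but is excluded entirely under the NOMISS analogue.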
NORM
normalizes the scale estimator to be consistent for the standard deviation of a normal distribution when you specify the option METHOD=AGK, METHOD=IQR, METHOD=MAD, or METHOD=SPACING.
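As an illustration of what this kind of normalization means for the median absolute deviation, the Python sketch below rescales the MAD by the standard consistency factor 1/Φ⁻¹(0.75), approximately 1.4826. Whether PROC STDIZE applies exactly this constant is an assumption; the helper names are hypothetical.

```python
from statistics import NormalDist, median


def mad(values):
    """Median absolute deviation from the median."""
    m = median(values)
    return median(abs(x - m) for x in values)


# Consistency factor for the MAD under a normal distribution:
# the reciprocal of the 75th percentile of the standard normal.
MAD_FACTOR = 1.0 / NormalDist().inv_cdf(0.75)


def normalized_mad(values):
    """MAD rescaled to estimate the normal standard deviation."""
    return MAD_FACTOR * mad(values)


sample = [1.0, 2.0, 3.0, 4.0, 100.0]
print(mad(sample), normalized_mad(sample))
```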
OUT=
specifies the name of the SAS data set created by PROC STDIZE. The output data set is a copy of the DATA= data set except that the analyzed variables have been standardized. The analyzed variables are those specified in the VAR statement or, if there is no VAR statement, all numeric variables not listed in any other statement. See the section Output Data Sets for more information.
If you want to create a permanent SAS data set, you must specify a two-level name. See SAS Language Reference: Concepts for more information about permanent SAS data sets.
If you omit the OUT= option, PROC STDIZE creates an output data set named according to the DATAn convention.
OUTSTAT=
specifies the name of the SAS data set containing the location and scale measures and other computed statistics. See the section Output Data Sets for more information.
PCTLDEF=
specifies which of five definitions is used to calculate percentiles when you specify the option PCTLMTD=ORD_STAT. By default, PCTLDEF=5. Note that the option PCTLMTD=ONEPASS implies PCTLDEF=5. See the section Computational Methods for the PCTLDEF= Option for details about the percentile definitions.
You cannot use PCTLDEF= when you compute weighted quantiles.
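As a rough illustration of the default definition, the Python sketch below implements the rule that SAS documentation describes for definition 5, the empirical distribution function with averaging: write np = j + g with integer part j and fractional part g, then take the average of the jth and (j+1)th order statistics when g = 0, and the (j+1)th order statistic otherwise. The function name and the clamping at the ends of the sorted data are assumptions made for this sketch.

```python
from fractions import Fraction
from math import floor


def percentile_def5(values, pct):
    """Percentile by the empirical distribution function with averaging."""
    xs = sorted(values)
    n = len(xs)
    # n * p, kept exact so that the test g == 0 is reliable.
    position = Fraction(str(pct)) * n / 100
    j = floor(position)
    g = position - j
    # Order statistics are 1-based; clamp at the ends (assumption).
    lower = xs[max(j, 1) - 1]
    upper = xs[min(j + 1, n) - 1]
    return (lower + upper) / 2 if g == 0 else upper


data = list(range(1, 11))  # 1, 2, ..., 10
print(percentile_def5(data, 10), percentile_def5(data, 25))
```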
PCTLMTD=
specifies the method used to estimate percentiles. Specify the PCTLMTD=ORD_STAT option to compute the percentiles by the order statistics method.
The PCTLMTD=ONEPASS option modifies an algorithm invented by Jain and Chlamtac (1985). See the section Computing Quantiles for more details about this algorithm.
PCTLPTS=
writes percentiles to the OUTSTAT= data set. Values of n can be any decimal number between 0 and 100, inclusive.
A requested percentile is identified by the _TYPE_ variable in the OUTSTAT= data set with a value of Pn. For example, suppose you specify the option PCTLPTS=10, 30. The corresponding observations in the OUTSTAT= data set that contain the 10th and the 30th percentiles would then have values _TYPE_=P10 and _TYPE_=P30, respectively.
REPLACE
replaces missing data with the value 0 in the standardized data (this value corresponds to the location measure before standardizing). To replace missing data with other values, see the preceding description of the MISSING= option. You cannot specify both the REPLACE and REPONLY options.
REPONLY
replaces missing data only; PROC STDIZE does not standardize the data. Missing values are replaced with the location measure unless you also specify the MISSING=value option, in which case missing values are replaced with value. You cannot specify both the REPLACE and REPONLY options.
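The contrast between the REPLACE and REPONLY behaviors can be sketched in Python as follows. This is an illustration under simplifying assumptions, not SAS code: `None` stands for a missing value, standardizing is taken to be (x - location) / scale, and `fill_missing` with its `mode` and `missing` parameters is hypothetical.

```python
def fill_missing(values, location, scale, mode="replace", missing=None):
    """Sketch of REPLACE and REPONLY handling for one variable."""
    out = []
    for x in values:
        if mode == "reponly":
            # REPONLY analogue: no standardization; missing values get
            # the location measure, or the MISSING= value if one is given.
            fill = location if missing is None else missing
            out.append(fill if x is None else x)
        else:
            # REPLACE analogue: standardize, and put 0 where the value
            # was missing (0 is the standardized location measure).
            out.append(0.0 if x is None else (x - location) / scale)
    return out


values = [1.0, None, 3.0]
replaced = fill_missing(values, location=2.0, scale=1.0)
reponly = fill_missing(values, location=2.0, scale=1.0, mode="reponly")
reponly_value = fill_missing(values, 2.0, 1.0, mode="reponly", missing=-9.0)
print(replaced, reponly, reponly_value)
```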
SNORM
normalizes the scale estimator to have an expectation of approximately 1 for a standard normal distribution when you specify the METHOD=SPACING option.
UNSTD
unstandardizes variables when you specify the METHOD=IN(ds) option. The location and scale measures, along with the constants for addition and multiplication on which the unstandardization is based, are identified by the _TYPE_ variable in the ds data set.
The ds data set must have a _TYPE_ variable and contain the following two observations: a _TYPE_='LOCATION' observation and a _TYPE_='SCALE' observation. The variable _TYPE_ can also contain the optional observations 'ADD' and 'MULT'; if these observations are not found in the data set, the constants specified in the ADD= and MULT= options (or their default values) are used for unstandardization.
See the section OUTSTAT= Data Set for details about the statistics that each value of _TYPE_ represents. The formula used for unstandardization is as follows. If the final output value from the previous standardization is calculated as
$$\mathit{result} = \mathit{add} + \mathit{multiply} \times \frac{\mathit{original} - \mathit{location}}{\mathit{scale}}$$
then the unstandardized variable is computed as
$$\mathit{original} = \mathit{scale} \times \frac{\mathit{result} - \mathit{add}}{\mathit{multiply}} + \mathit{location}$$
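A minimal Python sketch of unstandardization is shown below. It assumes that the earlier standardization computed result = add + mult × (original − location) / scale, and it models the statistics for one variable as a dictionary keyed by the _TYPE_ values; the function name and the dictionary layout are hypothetical. When the optional 'ADD' and 'MULT' entries are absent, the values passed as arguments are used, mirroring the fallback to the ADD= and MULT= options.

```python
def unstandardize(result, stats, add=0.0, mult=1.0):
    """Invert result = add + mult * (original - location) / scale."""
    location = stats["LOCATION"]  # required _TYPE_ observation
    scale = stats["SCALE"]        # required _TYPE_ observation
    # Optional observations override the option values when present.
    add = stats.get("ADD", add)
    mult = stats.get("MULT", mult)
    return scale * (result - add) / mult + location


stats = {"LOCATION": 4.0, "SCALE": 2.0, "ADD": 50.0, "MULT": 10.0}
# A value of 7 was standardized to 50 + 10 * (7 - 4) / 2 = 65.
original = unstandardize(65.0, stats)
print(original)
```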
VARDEF=
specifies the divisor to be used in the calculation of variances. By default, VARDEF=DF. The values and associated divisors are as follows.
Value | Divisor | Formula |
---|---|---|
DF | degrees of freedom | $n - 1$ |
N | number of observations | $n$ |
WDF | sum of weights minus 1 | $\left(\sum_i w_i\right) - 1$ |
WEIGHT \| WGT | sum of weights | $\sum_i w_i$ |
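The divisors in the table can be summarized in a short Python sketch. It is illustrative only: `variance_divisor` is a hypothetical helper, and n is taken to be the number of observations that carry a weight.

```python
def variance_divisor(vardef, weights):
    """Return the variance divisor for a VARDEF= value."""
    n = len(weights)      # number of observations
    total = sum(weights)  # sum of weights
    divisors = {
        "DF": n - 1,      # degrees of freedom
        "N": n,
        "WDF": total - 1,
        "WEIGHT": total,
        "WGT": total,     # alias of WEIGHT
    }
    return divisors[vardef.upper()]


weights = [1.0, 2.0, 3.0]
for name in ("DF", "N", "WDF", "WEIGHT"):
    print(name, variance_divisor(name, weights))
```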
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.