
Statements with the Same Function in Multiple Procedures

WEIGHT


Specifies weights for analysis variables in the statistical calculations.
Tip: You can use a WEIGHT statement and a FREQ statement in the same step of any procedure that supports both statements.

WEIGHT variable;

Required Arguments

variable

specifies a numeric variable whose values weight the values of the analysis variables. The values of the variable do not have to be integers. By default, the procedure handles a nonpositive or missing value of the weight variable as follows:

Weight value    Behavior of the procedure
0               counts the observation in the total number of observations
less than 0     converts the weight value to zero and counts the observation in the total number of observations
missing         excludes the observation from the analysis

Different behavior for nonpositive values is discussed in the WEIGHT statement syntax under the individual procedure.

Before Version 7 of SAS, no Base SAS procedure excluded the observations with missing weights from the analysis. Most SAS/STAT procedures, such as PROC GLM, have always excluded not only missing weights but also negative and zero weights from the analysis. You can achieve this same behavior in a Base SAS procedure that supports the WEIGHT statement by using the EXCLNPWGT option in the PROC statement.
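The following is a minimal sketch of the difference that EXCLNPWGT makes. The data set WTDEMO and the variables X and W are hypothetical and exist only for this illustration. Based on the rules in the table above, the first step is expected to count four observations (only the missing weight is excluded), and the second step is expected to count two.

```sas
/* Hypothetical data: WTDEMO, X, and W are illustrative names only */
data wtdemo;
   input X W;
   datalines;
10 2
20 1
30 0
40 -1
50 .
;
run;

/* Default handling: zero and negative weights add nothing to the   */
/* weighted sums but the observations still count toward N; the     */
/* observation with a missing weight is excluded                    */
proc means data=wtdemo n mean;
   weight w;
   var x;
   title1 'Default Handling of Nonpositive Weights';
run;

/* EXCLNPWGT: observations with zero, negative, or missing weights  */
/* are all excluded from the analysis                               */
proc means data=wtdemo exclnpwgt n mean;
   weight w;
   var x;
   title1 'Nonpositive Weights Excluded with EXCLNPWGT';
run;
```

The weighted mean should be the same in both steps, because a zero weight contributes nothing to either the numerator or the denominator. Only the observation count, and therefore any divisor that depends on it, changes.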

The procedure substitutes the value of the WEIGHT variable for $w_i$, which appears in Keywords and Formulas.


Procedures That Support the WEIGHT Statement

Note: In PROC FREQ, the value of the variable in the WEIGHT statement represents the frequency of occurrence for each observation. See the PROC FREQ documentation for more information.


Calculating Weighted Statistics

The procedures that support the WEIGHT statement also support the VARDEF= option, which lets you specify a divisor to use in the calculation of the variance and standard deviation.

By using a WEIGHT statement to compute moments, you assume that the ith observation has a variance that is equal to $\sigma^2 / w_i$. When you specify VARDEF=DF (the default), the computed variance is a weighted least squares estimate of $\sigma^2$. Similarly, the computed standard deviation is an estimate of $\sigma$. Note that the computed variance is not an estimate of the variance of the ith observation, because this variance involves the observation's weight, which varies from observation to observation.
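As a sketch of the calculation, assuming the definitions in Keywords and Formulas, the weighted mean and the weighted variance can be written as follows. Here $x_i$ is the ith value of the analysis variable, $w_i$ is its weight, $n$ is the number of observations, and $d$ is the divisor that VARDEF= selects.

```latex
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},
\qquad
s^2 = \frac{\sum_{i=1}^{n} w_i \,(x_i - \bar{x}_w)^2}{d},
\qquad
d =
\begin{cases}
n - 1            & \text{VARDEF=DF (default)} \\
n                & \text{VARDEF=N} \\
\sum_i w_i - 1   & \text{VARDEF=WDF} \\
\sum_i w_i       & \text{VARDEF=WEIGHT}
\end{cases}
```

With every weight equal to 1, the VARDEF=DF form reduces to the ordinary sample variance.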

If the values of your variable are counts that represent the number of occurrences of each observation, then use this variable in the FREQ statement rather than in the WEIGHT statement. In this case, because the values are counts, they should be integers. (The FREQ statement truncates any noninteger values.) The variance that is computed with a FREQ variable is an estimate of the common variance $\sigma^2$ of the observations.
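To illustrate the distinction, the following sketch runs the same hypothetical data through a FREQ statement and then through a WEIGHT statement. The data set COUNTS and the variables Value and Count are assumptions made for this illustration only.

```sas
/* Hypothetical data: each row stands for Count identical observations */
data counts;
   input Value Count;
   datalines;
10 3
20 1
30 2
;
run;

/* FREQ: each row is treated as Count observations, so N is 6 and   */
/* the default variance divisor is 6 - 1                            */
proc means data=counts n mean var;
   freq count;
   var value;
   title1 'Count Used as a FREQ Variable';
run;

/* WEIGHT: each row is a single observation, so N is 3 and the      */
/* default variance divisor is 3 - 1                                */
proc means data=counts n mean var;
   weight count;
   var value;
   title1 'Count Used as a WEIGHT Variable';
run;
```

The two steps should report the same mean, but the number of observations and the variance differ, which is why counts belong in the FREQ statement.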

Note: If your data comes from a stratified sample where the weights $w_i$ represent the strata weights, then neither the WEIGHT statement nor the FREQ statement provides appropriate stratified estimates of the mean, variance, or variance of the mean. To perform the appropriate analysis, consider using PROC SURVEYMEANS, which is a SAS/STAT procedure that is documented in the SAS/STAT User's Guide.


Weighted Statistics Example

As an example of the WEIGHT statement, suppose 20 people are asked to estimate the size of an object 30 cm wide. Each person is placed at a different distance from the object. As the distance from the object increases, the estimates should become less precise.

The SAS data set SIZE contains the estimate (ObjectSize) in centimeters at each distance (Distance) in meters and the precision (Precision) for each estimate. Notice that the largest deviation (an overestimate by 20 cm) came at the greatest distance (7.5 meters from the object). As a measure of precision, 1/Distance gives more weight to estimates that were made closer to the object and less weight to estimates that were made at greater distances.

The following statements create the data set SIZE:

options nodate pageno=1 linesize=64 pagesize=60;

data size;
   input Distance ObjectSize @@;
   Precision=1/distance;
   datalines;
1.5 30 1.5 20 1.5 30 1.5 25
3   43 3   33 3   25 3   30
4.5 25 4.5 36 4.5 48 4.5 33
6   43 6   36 6   23 6   48
7.5 30 7.5 25 7.5 50 7.5 38
;

The following PROC MEANS step computes the average estimate of the object size while ignoring the weights. Without a WEIGHT variable, PROC MEANS uses the default weight of 1 for every observation. Thus, the estimates of object size at all distances are given equal weight. The average estimate of the object size exceeds the actual size by 3.55 cm.

proc means data=size maxdec=3 n mean var stddev;
   var objectsize;
   title1 'Unweighted Analysis of the SIZE Data Set';
run;

            Unweighted Analysis of the SIZE Data Set           1

                      The MEANS Procedure

                 Analysis Variable : ObjectSize 
 
        N            Mean        Variance         Std Dev
       --------------------------------------------------
       20          33.550          80.892           8.994
       --------------------------------------------------

The next two PROC MEANS steps use the precision measure (Precision) in the WEIGHT statement and show the effect of using different values of the VARDEF= option. The first PROC step creates an output data set that contains the variance and standard deviation. If you reduce the weighting of the estimates that are made at greater distances, the weighted average estimate of the object size is closer to the actual size.

proc means data=size maxdec=3 n mean var stddev;
   weight precision;
   var objectsize;
   output out=wtstats var=Est_SigmaSq std=Est_Sigma;
   title1 'Weighted Analysis Using Default VARDEF=DF';
run;

proc means data=size maxdec=3 n mean var std
                     vardef=weight;
   weight precision;
   var objectsize;
   title1 'Weighted Analysis Using VARDEF=WEIGHT';
run;

The variance of the ith observation is assumed to be $\mathrm{var}(x_i) = \sigma^2 / w_i$, where $w_i$ is the weight for the ith observation. In the first PROC MEANS step, the computed variance is an estimate of $\sigma^2$. In the second PROC MEANS step, the computed variance is an estimate of $\frac{n-1}{n} \cdot \frac{\sigma^2}{\bar{w}}$, where $\bar{w}$ is the average weight. For large n, this value is an approximate estimate of the variance of an observation with average weight.

           Weighted Analysis Using Default VARDEF=DF           1

                      The MEANS Procedure

                 Analysis Variable : ObjectSize 
 
        N            Mean        Variance         Std Dev
       --------------------------------------------------
       20          31.088          20.678           4.547
       --------------------------------------------------
             Weighted Analysis Using VARDEF=WEIGHT             2

                      The MEANS Procedure

                 Analysis Variable : ObjectSize 
 
        N            Mean        Variance         Std Dev
       --------------------------------------------------
       20          31.088          64.525           8.033
       --------------------------------------------------
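As a rough hand check of the two weighted outputs above, the arithmetic can be worked from the SIZE data, where each distance contributes four estimates with weight $w_i = 1/\text{Distance}$. Intermediate values are rounded here, so the last digit can differ slightly from the listing.

```latex
\sum_i w_i = 4\left(\tfrac{1}{1.5} + \tfrac{1}{3} + \tfrac{1}{4.5} + \tfrac{1}{6} + \tfrac{1}{7.5}\right) \approx 6.0889,
\qquad
\bar{w} = \frac{6.0889}{20} \approx 0.3044

\sum_i w_i x_i = \frac{105}{1.5} + \frac{131}{3} + \frac{142}{4.5} + \frac{150}{6} + \frac{143}{7.5} \approx 189.289,
\qquad
\bar{x}_w = \frac{189.289}{6.0889} \approx 31.088

\frac{n-1}{n} \cdot \frac{\hat{\sigma}^2}{\bar{w}}
  = \frac{19}{20} \cdot \frac{20.678}{0.3044}
  = \frac{19 \times 20.678}{6.0889} \approx 64.52
```

The last line agrees, up to rounding, with the variance of 64.525 that is reported for VARDEF=WEIGHT, and the weighted mean matches the 31.088 that both steps report.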

The following statements create and print a data set with the weighted variance and weighted standard deviation of each observation. The DATA step combines the output data set that contains the variance and the standard deviation from the weighted analysis with the original data set. The variance of each observation is computed by dividing Est_SigmaSq (the estimate of $\sigma^2$ from the weighted analysis when VARDEF=DF) by each observation's weight (Precision). The standard deviation of each observation is computed by dividing Est_Sigma (the estimate of $\sigma$ from the weighted analysis when VARDEF=DF) by the square root of each observation's weight (Precision).

data wtsize(drop=_freq_ _type_);
   set size;
   /* read the single observation in WTSTATS once; its values are retained */
   if _n_=1 then set wtstats;
   Est_VarObs=est_sigmasq/precision;
   Est_StdObs=est_sigma/sqrt(precision);
run;

proc print data=wtsize noobs;
   title 'Weighted Statistics';
   by distance;
   format est_varobs est_stdobs
          est_sigmasq est_sigma precision 6.3;
run;

                      Weighted Statistics                      4

------------------------- Distance=1.5 -------------------------

  Object                  Est_        Est_     Est_      Est_
   Size     Precision    SigmaSq     Sigma    VarObs    StdObs

    30        0.667      20.678      4.547    31.017     5.569
    20        0.667      20.678      4.547    31.017     5.569
    30        0.667      20.678      4.547    31.017     5.569
    25        0.667      20.678      4.547    31.017     5.569


-------------------------- Distance=3 --------------------------

  Object                  Est_        Est_     Est_      Est_
   Size     Precision    SigmaSq     Sigma    VarObs    StdObs

    43        0.333      20.678      4.547    62.035     7.876
    33        0.333      20.678      4.547    62.035     7.876
    25        0.333      20.678      4.547    62.035     7.876
    30        0.333      20.678      4.547    62.035     7.876


------------------------- Distance=4.5 -------------------------

  Object                  Est_        Est_     Est_      Est_
   Size     Precision    SigmaSq     Sigma    VarObs    StdObs

    25        0.222      20.678      4.547    93.052     9.646
    36        0.222      20.678      4.547    93.052     9.646
    48        0.222      20.678      4.547    93.052     9.646
    33        0.222      20.678      4.547    93.052     9.646


-------------------------- Distance=6 --------------------------

  Object                  Est_        Est_     Est_      Est_
   Size     Precision    SigmaSq     Sigma    VarObs    StdObs

    43        0.167      20.678      4.547    124.07    11.139
    36        0.167      20.678      4.547    124.07    11.139
    23        0.167      20.678      4.547    124.07    11.139
    48        0.167      20.678      4.547    124.07    11.139


------------------------- Distance=7.5 -------------------------

  Object                  Est_        Est_     Est_      Est_
   Size     Precision    SigmaSq     Sigma    VarObs    StdObs

    30        0.133      20.678      4.547    155.09    12.453
    25        0.133      20.678      4.547    155.09    12.453
    50        0.133      20.678      4.547    155.09    12.453
    38        0.133      20.678      4.547    155.09    12.453
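As a quick check of the listing, the per-observation values can be reproduced by hand. Because Precision is 1/Distance, dividing by the weight is the same as multiplying by the distance. The closest and farthest groups give the following, with rounding in the last digit:

```latex
\text{Distance} = 1.5:\quad
\frac{\hat{\sigma}^2}{w_i} = 20.678 \times 1.5 \approx 31.017,
\qquad
\frac{\hat{\sigma}}{\sqrt{w_i}} = 4.547 \times \sqrt{1.5} \approx 5.569

\text{Distance} = 7.5:\quad
\frac{\hat{\sigma}^2}{w_i} = 20.678 \times 7.5 \approx 155.09,
\qquad
\frac{\hat{\sigma}}{\sqrt{w_i}} = 4.547 \times \sqrt{7.5} \approx 12.453
```

The estimated variance of an observation therefore grows in proportion to its distance from the object, which is the assumption that the choice of 1/Distance as the weight encodes.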
