Statements with the Same Function in Multiple Procedures
Tip: You can use a WEIGHT statement and a FREQ statement in the same step of any procedure that supports both statements.
WEIGHT variable;
Required Arguments
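variable
specifies a numeric variable whose values weight the values of the analysis variables. The values of the variable do not have to be integers. The behavior of the procedure when it encounters a nonpositive weight variable value is as follows:
    Weight value    Procedure behavior
    0               counts the observation in the total number of observations
    less than 0     converts the weight value to zero and counts the observation in the total number of observations
    missing         excludes the observation from the analysis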
Different behavior for nonpositive values is discussed in the WEIGHT statement syntax under the individual procedure.
Before Version 7 of SAS, no Base SAS procedure excluded the observations with missing weights from the analysis. Most SAS/STAT procedures, such as PROC GLM, have always excluded not only missing weights but also negative and zero weights from the analysis. You can achieve this same behavior in a Base SAS procedure that supports the WEIGHT statement by using the EXCLNPWGT option in the PROC statement.
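The effect of EXCLNPWGT can be illustrated with a small sketch. The data set WTDEMO and its variables X and W are hypothetical and exist only for this illustration; the OUTPUT data set names are likewise assumptions.

```sas
/* Hypothetical data: the weights include a zero, a negative, */
/* and a missing value.                                        */
data wtdemo;
   input x w;
   datalines;
10 1
20 2
30 0
40 -1
50 .
;
run;

/* Default handling: zero and negative weights still count    */
/* toward N; only the missing weight is excluded.             */
proc means data=wtdemo n mean;
   weight w;
   var x;
   output out=default_stats n=N mean=Mean;
run;

/* EXCLNPWGT: observations with nonpositive weights are       */
/* excluded from the analysis altogether.                     */
proc means data=wtdemo exclnpwgt n mean;
   weight w;
   var x;
   output out=excl_stats n=N mean=Mean;
run;
```

Under the rules described above, the first step should report N=4 and the second N=2. The weighted mean should be the same in both steps, because observations with a weight of zero contribute nothing to the weighted sums.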
The procedure substitutes the value of the WEIGHT variable for $w_i$, which appears in Keywords and Formulas.
Procedures That Support the WEIGHT Statement
CORR
FREQ
MEANS/SUMMARY
REPORT
STANDARD
TABULATE
UNIVARIATE
Note: In PROC FREQ, the value of the variable in the WEIGHT statement represents the frequency of occurrence for each observation. See the PROC FREQ documentation for more information.
Calculating Weighted Statistics
The procedures that support the WEIGHT statement also support the VARDEF= option, which lets you specify a divisor to use in the calculation of the variance and standard deviation.
By using a WEIGHT statement to compute moments, you assume that the ith observation has a variance that is equal to $\sigma^2/w_i$. When you specify VARDEF=DF (the default), the computed variance is a weighted least squares estimate of $\sigma^2$. Similarly, the computed standard deviation is an estimate of $\sigma$. Note that the computed variance is not an estimate of the variance of the ith observation, because this variance involves the observation's weight, which varies from observation to observation.
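For orientation, the weighted mean and weighted variance can be summarized as follows. This is a sketch of the standard definitions, not a substitute for Keywords and Formulas; here $x_i$ is the ith value of the analysis variable, $w_i$ is its weight, $n$ is the number of observations, and $d$ is the divisor selected by VARDEF=.

$$
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},
\qquad
s_w^2 = \frac{1}{d} \sum_{i=1}^{n} w_i \,(x_i - \bar{x}_w)^2
$$

$$
d =
\begin{cases}
n - 1 & \text{VARDEF=DF} \\
n & \text{VARDEF=N} \\
\sum_{i} w_i - 1 & \text{VARDEF=WDF} \\
\sum_{i} w_i & \text{VARDEF=WEIGHT}
\end{cases}
$$

If the variance of the ith observation is $\sigma^2/w_i$, then the expected value of the weighted sum of squares is $(n-1)\sigma^2$, which is why the divisor $n-1$ yields an estimate of $\sigma^2$.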
If the values of your variable are counts that represent the number of occurrences of each observation, then use this variable in the FREQ statement rather than in the WEIGHT statement. In this case, because the values are counts, they should be integers. (The FREQ statement truncates any noninteger values.) The variance that is computed with a FREQ variable is an estimate of the common variance of the observations.
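A minimal sketch of the FREQ statement follows. The data set COUNTS and its variables Score and Count are hypothetical; each row holds one distinct value and the number of times that value occurred.

```sas
/* Hypothetical data: one row per distinct value, with the    */
/* number of occurrences in Count.                            */
data counts;
   input Score Count;
   datalines;
1 3
2 5
3 2
;
run;

/* Count is a frequency, so it belongs in the FREQ statement, */
/* not in the WEIGHT statement.                               */
proc means data=counts n mean var;
   freq Count;
   var Score;
   output out=freq_stats n=N mean=Mean var=Var;
run;
```

Because the FREQ statement treats each row as Count identical observations, this step should behave as if the data set contained 10 observations (N=10) with a mean of 1.9, and the variance should use 9 degrees of freedom.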
Note: If your data comes from a stratified sample where the weights represent the strata weights, then neither the WEIGHT statement nor the FREQ statement provides appropriate stratified estimates of the mean, variance, or variance of the mean. To perform the appropriate analysis, consider using PROC SURVEYMEANS, which is a SAS/STAT procedure that is documented in the SAS/STAT User's Guide.
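As a rough sketch of that alternative, a stratified analysis with PROC SURVEYMEANS might look like the following. The data set SURVEY, its variables Stratum, SampWt, and Y, and the output data set SVY_STATS are all hypothetical. The step omits population totals, so no finite population correction is applied; see the SAS/STAT User's Guide for the full syntax.

```sas
/* Hypothetical stratified sample: a stratum identifier, a    */
/* sampling weight, and a response variable.                  */
data survey;
   input Stratum $ SampWt Y;
   datalines;
A 10 12
A 10 15
A 10 11
B 25 20
B 25 24
B 25 22
;
run;

/* Capture the Statistics table so the estimates can be       */
/* inspected in a data set as well as in the listing.         */
ods output Statistics=svy_stats;

proc surveymeans data=survey mean stderr;
   strata Stratum;   /* identifies the strata                 */
   weight SampWt;    /* sampling weights                      */
   var Y;            /* analysis variable                     */
run;
```

Unlike PROC MEANS with a WEIGHT statement, this step bases the standard error of the mean on the stratified design rather than treating the weights as precision measures.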
Weighted Statistics Example
As an example of the WEIGHT statement, suppose 20 people are asked to estimate the size of an object 30 cm wide. Each person is placed at a different distance from the object. As the distance from the object increases, the estimates should become less precise.
The SAS data set SIZE contains the estimate (ObjectSize) in centimeters at each distance (Distance) in meters and the precision (Precision) for each estimate. Notice that the largest deviation (an overestimate by 20 cm) came at the greatest distance (7.5 meters from the object). As a measure of precision, 1/Distance gives more weight to estimates that were made closer to the object and less weight to estimates that were made at greater distances.
The following statements create the data set SIZE:
    options nodate pageno=1 linesize=64 pagesize=60;

    data size;
       input Distance ObjectSize @@;
       Precision=1/distance;
       datalines;
    1.5 30 1.5 20 1.5 30 1.5 25
    3 43 3 33 3 25 3 30
    4.5 25 4.5 36 4.5 48 4.5 33
    6 43 6 36 6 23 6 48
    7.5 30 7.5 25 7.5 50 7.5 38
    ;
The following PROC MEANS step computes the average estimate of the object size while ignoring the weights. Without a WEIGHT variable, PROC MEANS uses the default weight of 1 for every observation. Thus, the estimates of object size at all distances are given equal weight. The average estimate of the object size exceeds the actual size by 3.55 cm.
    proc means data=size maxdec=3 n mean var stddev;
       var objectsize;
       title1 'Unweighted Analysis of the SIZE Data Set';
    run;
            Unweighted Analysis of the SIZE Data Set           1

                      The MEANS Procedure

                 Analysis Variable : ObjectSize

          N            Mean        Variance         Std Dev
       --------------------------------------------------
         20          33.550          80.892           8.994
       --------------------------------------------------
The next two PROC MEANS steps use the precision measure (Precision) in the WEIGHT statement and show the effect of using different values of the VARDEF= option. The first PROC step creates an output data set that contains the variance and standard deviation. If you reduce the weighting of the estimates that are made at greater distances, the weighted average estimate of the object size is closer to the actual size.
    proc means data=size maxdec=3 n mean var stddev;
       weight precision;
       var objectsize;
       output out=wtstats var=Est_SigmaSq std=Est_Sigma;
       title1 'Weighted Analysis Using Default VARDEF=DF';
    run;

    proc means data=size maxdec=3 n mean var std vardef=weight;
       weight precision;
       var objectsize;
       title1 'Weighted Analysis Using VARDEF=WEIGHT';
    run;
The variance of the ith observation is assumed to be $\mathrm{var}(x_i) = \sigma^2/w_i$, where $w_i$ is the weight for the ith observation. In the first PROC MEANS step, the computed variance is an estimate of $\sigma^2$. In the second PROC MEANS step, the computed variance is an estimate of $\frac{n-1}{n} \cdot \frac{\sigma^2}{\bar{w}}$, where $\bar{w}$ is the average weight. For large n, this value is an approximate estimate of the variance of an observation with average weight.
           Weighted Analysis Using Default VARDEF=DF           1

                      The MEANS Procedure

                 Analysis Variable : ObjectSize

          N            Mean        Variance         Std Dev
       --------------------------------------------------
         20          31.088          20.678           4.547
       --------------------------------------------------

             Weighted Analysis Using VARDEF=WEIGHT             2

                      The MEANS Procedure

                 Analysis Variable : ObjectSize

          N            Mean        Variance         Std Dev
       --------------------------------------------------
         20          31.088          64.525           8.033
       --------------------------------------------------
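As a quick arithmetic check, the two reported variances can be related by hand. This sketch uses the rounded values printed in the output, so the last digit can differ slightly. Both steps share the same weighted sum of squares and differ only in the divisor: $n - 1 = 19$ for VARDEF=DF and $\sum_i w_i$ for VARDEF=WEIGHT, where $w_i$ is the Precision value of the ith observation.

$$
\sum_{i=1}^{20} w_i = 4\left(\frac{1}{1.5} + \frac{1}{3} + \frac{1}{4.5} + \frac{1}{6} + \frac{1}{7.5}\right) \approx 6.0889,
\qquad
\bar{w} = \frac{6.0889}{20} \approx 0.3044
$$

$$
s^2_{\text{WEIGHT}} = s^2_{\text{DF}} \cdot \frac{n-1}{\sum_i w_i}
\approx 20.678 \times \frac{19}{6.0889} \approx 64.52
$$

This agrees with the reported value of 64.525 up to rounding of the inputs. Equivalently, $s^2_{\text{WEIGHT}} = \frac{n-1}{n} \cdot \frac{s^2_{\text{DF}}}{\bar{w}}$.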
The following statements create and print a data set with the weighted variance and weighted standard deviation of each observation. The DATA step combines the output data set that contains the variance and the standard deviation from the weighted analysis with the original data set. The variance of each observation is computed by dividing Est_SigmaSq (the estimate of $\sigma^2$ from the weighted analysis when VARDEF=DF) by each observation's weight (Precision). The standard deviation of each observation is computed by dividing Est_Sigma (the estimate of $\sigma$ from the weighted analysis when VARDEF=DF) by the square root of each observation's weight (Precision).
    data wtsize(drop=_freq_ _type_);
       set size;
       if _n_=1 then set wtstats;
       Est_VarObs=est_sigmasq/precision;
       Est_StdObs=est_sigma/sqrt(precision);

    proc print data=wtsize noobs;
       title 'Weighted Statistics';
       by distance;
       format est_varobs est_stdobs est_sigmasq est_sigma precision 6.3;
    run;
                          Weighted Statistics                      4

    ------------------------- Distance=1.5 -------------------------

    Object                  Est_      Est_      Est_      Est_
      Size    Precision    SigmaSq    Sigma    VarObs    StdObs

        30      0.667      20.678     4.547    31.017     5.569
        20      0.667      20.678     4.547    31.017     5.569
        30      0.667      20.678     4.547    31.017     5.569
        25      0.667      20.678     4.547    31.017     5.569

    -------------------------- Distance=3 --------------------------

    Object                  Est_      Est_      Est_      Est_
      Size    Precision    SigmaSq    Sigma    VarObs    StdObs

        43      0.333      20.678     4.547    62.035     7.876
        33      0.333      20.678     4.547    62.035     7.876
        25      0.333      20.678     4.547    62.035     7.876
        30      0.333      20.678     4.547    62.035     7.876

    ------------------------- Distance=4.5 -------------------------

    Object                  Est_      Est_      Est_      Est_
      Size    Precision    SigmaSq    Sigma    VarObs    StdObs

        25      0.222      20.678     4.547    93.052     9.646
        36      0.222      20.678     4.547    93.052     9.646
        48      0.222      20.678     4.547    93.052     9.646
        33      0.222      20.678     4.547    93.052     9.646

    -------------------------- Distance=6 --------------------------

    Object                  Est_      Est_      Est_      Est_
      Size    Precision    SigmaSq    Sigma    VarObs    StdObs

        43      0.167      20.678     4.547    124.07    11.139
        36      0.167      20.678     4.547    124.07    11.139
        23      0.167      20.678     4.547    124.07    11.139
        48      0.167      20.678     4.547    124.07    11.139

    ------------------------- Distance=7.5 -------------------------

    Object                  Est_      Est_      Est_      Est_
      Size    Precision    SigmaSq    Sigma    VarObs    StdObs

        30      0.133      20.678     4.547    155.09    12.453
        25      0.133      20.678     4.547    155.09    12.453
        50      0.133      20.678     4.547    155.09    12.453
        38      0.133      20.678     4.547    155.09    12.453
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.