The UNIVARIATE Procedure

WEIGHT Statement

WEIGHT variable ;

The WEIGHT statement specifies numeric weights for analysis variables in the statistical calculations. The UNIVARIATE procedure uses the values $w_{i}$ of the WEIGHT variable to modify the computation of a number of summary statistics by assuming that the variance of the $i$th value $x_{i}$ of the analysis variable is equal to $\sigma^{2}/w_{i}$, where $\sigma$ is an unknown parameter. The values of the WEIGHT variable do not have to be integers and are typically positive. By default, observations with nonpositive or missing values of the WEIGHT variable are handled as follows:[1]

  • If the value is zero, the observation is counted in the total number of observations.

  • If the value is negative, it is converted to zero, and the observation is counted in the total number of observations.

  • If the value is missing, the observation is excluded from the analysis.

To exclude observations that have negative or zero weights from the analysis, specify the EXCLNPWGT option in the PROC UNIVARIATE statement. Note that most SAS/STAT procedures, such as PROC GLM, exclude negative and zero weights by default. The WEIGHT variable does not change how the procedure determines the range, mode, extreme values, extreme observations, or number of missing values. When you specify a WEIGHT statement, the procedure also computes a weighted standard error and a weighted version of Student’s $t$ test. Student’s $t$ test is the only test of location that PROC UNIVARIATE computes when you weight the analysis variables.
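As a minimal sketch of the default handling and of EXCLNPWGT, the following steps run the same weighted analysis twice. The data set `scores` and the variables `score` and `w` are hypothetical and chosen only for illustration. By the rules above, the first run should count four observations (only the missing weight is excluded), and the second should count two.

```sas
/* Hypothetical data: two positive weights, one zero weight,
   one negative weight, and one missing weight */
data scores;
   input score w;
   datalines;
10 2
12 1
15 0
18 -1
20 .
;
run;

/* Default handling: the zero weight and the negative weight
   (converted to zero) are counted; the missing weight is excluded */
proc univariate data=scores;
   var score;
   weight w;
run;

/* EXCLNPWGT also excludes the observations whose weight is
   zero or negative */
proc univariate data=scores exclnpwgt;
   var score;
   weight w;
run;
```

Comparing the number of observations reported in the two Moments tables is a quick way to confirm which rows entered each analysis.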

When you specify a WEIGHT variable, the procedure uses its values, $w_{i}$, to compute weighted versions of the statistics[2] provided in the Moments table. For example, the procedure computes a weighted mean $\overline{x}_{w}$ and a weighted variance $s_{w}^{2}$ as

\[  \overline{x}_{w} = \frac{ \sum _{i} w_{i}x_{i} }{ \sum _{i} w_{i} }  \]

and

\[  s_{w}^{2} = \frac{ 1 }{ d } \sum _{i} w_{i} ( x_{i} - \overline{x}_{w} )^{2}  \]

where $x_{i}$ is the $i$th value of the analysis variable. The divisor $d$ is controlled by the VARDEF= option in the PROC UNIVARIATE statement.
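As a small worked illustration (the numbers are made up for this example), take three values $x = (1, 2, 4)$ with weights $w = (1, 2, 1)$. The weighted mean is

\[  \overline{x}_{w} = \frac{ (1)(1) + (2)(2) + (1)(4) }{ 1 + 2 + 1 } = \frac{ 9 }{ 4 } = 2.25  \]

and the weighted sum of squared deviations is

\[  \sum _{i} w_{i} ( x_{i} - \overline{x}_{w} )^{2} = (1)(-1.25)^{2} + (2)(-0.25)^{2} + (1)(1.75)^{2} = 4.75  \]

so that $s_{w}^{2} = 4.75 / d$. Assuming the usual VARDEF= divisors, $d = n - 1 = 2$ under VARDEF=DF gives $s_{w}^{2} = 2.375$, whereas $d = \sum _{i} w_{i} = 4$ under VARDEF=WEIGHT gives $s_{w}^{2} = 1.1875$.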

The WEIGHT statement does not affect the determination of the mode, extreme values, extreme observations, or the number of missing values of the analysis variables. However, the weights $w_{i}$ are used to compute weighted percentiles.[3] The WEIGHT variable has no effect on graphical displays produced with the plot statements.

The CIPCTLDF, CIPCTLNORMAL, LOCCOUNT, NORMAL, ROBUSTSCALE, TRIMMED=, and WINSORIZED= options are not available with the WEIGHT statement.

To compute weighted skewness or kurtosis, use VARDEF=DF or VARDEF=N in the PROC statement.

You cannot specify the HISTOGRAM, PROBPLOT, or QQPLOT statements with the WEIGHT statement.

When you use the WEIGHT statement, consider which value of the VARDEF= option is appropriate. See the description of the VARDEF= option and the discussion of the calculation of weighted statistics for more information.
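One way to see the effect of VARDEF= is to run the same weighted analysis under two divisors. The following is a sketch rather than a recommendation: the data set `readings` and its variables `x` and `w` are hypothetical, and the appropriate divisor depends on what the weights represent. Given the restriction noted above, the second run is not expected to report weighted skewness or kurtosis.

```sas
/* Hypothetical data: five values with positive, noninteger weights */
data readings;
   input x w;
   datalines;
1 1
2 2
4 1
5 1.5
8 0.5
;
run;

/* VARDEF=DF divides by n - 1; weighted skewness and kurtosis
   are computed with this setting or with VARDEF=N */
proc univariate data=readings vardef=df;
   var x;
   weight w;
run;

/* VARDEF=WEIGHT divides by the sum of the weights */
proc univariate data=readings vardef=weight;
   var x;
   weight w;
run;
```

The weighted mean is the same in both runs because it does not involve $d$; only the variance and the statistics derived from it change.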



[1] In SAS 6.12 and earlier releases, observations were used in the analysis if and only if the WEIGHT variable value was greater than zero.

[2] In SAS 6.12 and earlier releases, weighted skewness and kurtosis were not computed.

[3] In SAS 6.12 and earlier releases, the weights did not affect the computation of percentiles and the procedure did not exclude the observations with missing weights from the count of observations.