Distribution Analyses

Method

Observations with missing values for a Y variable are not used in the analysis for that variable. Observations with Weight or Freq values that are missing or that are less than or equal to zero are not used. Only the integer part of Freq values is used.
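The exclusion rules above can be sketched in Python. This is an illustration only, not SAS/INSIGHT code: the function name `usable_observations` and the use of `None` or NaN to stand for a missing value are assumptions made for the example.

```python
import math

def usable_observations(y, weight=None, freq=None):
    """Return the (y, weight, freq) triples kept for one Y variable.

    Hypothetical helper: None or NaN stands in for a missing value.
    """
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    n = len(y)
    weight = [1.0] * n if weight is None else weight
    freq = [1] * n if freq is None else freq

    kept = []
    for yi, wi, fi in zip(y, weight, freq):
        if missing(yi) or missing(wi) or missing(fi):
            continue                      # missing Y, Weight, or Freq
        if wi <= 0 or fi <= 0:
            continue                      # nonpositive Weight or Freq
        # Only the integer part of Freq is used. A Freq between 0 and 1
        # truncates to 0; the text does not say how that case is treated,
        # so this sketch simply keeps the truncated value.
        kept.append((yi, wi, int(fi)))
    return kept
```

For example, a row with Freq 2.7 is kept with frequency 2, while rows with a missing Y value, a zero weight, or a negative frequency are dropped.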

The following notation is used in the rest of this chapter:

n is the number of nonmissing values.
y_i is the ith observed nonmissing value.
y_(i) is the ith ordered nonmissing value, ${ y_{(1)} \le y_{(2)} \le \ldots \le y_{(n)}}$ .
${\overline y}$ is the sample mean, ${\sum_{i}^{}{y_{i}/n}}$ .
d is the variance divisor.
s² is the sample variance, ${\sum_{i}^{}{( y_{i}- {\overline y})^2/d}}$ .
z_i is the standardized value, ${( y_{i}-{\overline y})/s}$ .

The summation ${\sum_{i}^{}}$ is shorthand for ${\sum_{i=1}^n}$ .

Based on the variance definition, vardef, the variance divisor d is computed as

$\bullet {d = n-1}$		for vardef=DF, degrees of freedom
$\bullet {d = n}$		for vardef=N, number of observations
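As a small illustration of how the divisor affects the result, the following Python sketch computes the unweighted sample variance for either definition. The function name `sample_variance` and the string values passed as `vardef` are choices made for this example, not part of SAS/INSIGHT software.

```python
def sample_variance(y, vardef="DF"):
    """Sample variance of nonmissing values y with divisor d."""
    n = len(y)
    mean = sum(y) / n
    d = {"DF": n - 1, "N": n}[vardef]    # variance divisor
    return sum((yi - mean) ** 2 for yi in y) / d
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the sum of squared deviations is 32, so the sketch returns 32/7 for vardef=DF and 4 for vardef=N.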

The skewness is a measure of the tendency of the deviations from the mean to be larger in one direction than in the other. The sample skewness is calculated as

$\bullet { g_{1} = c_{3n} \sum_{i}^{}{z^3_{i}} }$		for vardef=DF
$\bullet { g_{1} = \frac{1}n \sum_{i}^{}{z^3_{i}} }$		for vardef=N

where $c_{3n} = \frac{n}{(n-1)(n-2)}$ .
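The skewness formulas can be checked numerically with a short Python sketch. The name `sample_skewness` is hypothetical, and the vardef=DF branch assumes at least three observations so that the factor c_3n is defined.

```python
import math

def sample_skewness(y, vardef="DF"):
    """Sample skewness g1 for vardef DF or N (unweighted)."""
    n = len(y)
    mean = sum(y) / n
    d = n - 1 if vardef == "DF" else n
    s = math.sqrt(sum((v - mean) ** 2 for v in y) / d)
    z3 = sum(((v - mean) / s) ** 3 for v in y)   # sum of cubed standardized values
    if vardef == "DF":
        c3n = n / ((n - 1) * (n - 2))
        return c3n * z3
    return z3 / n
```

Symmetric data such as 1, 2, 3, 4, 5 give a skewness of zero up to rounding, while a long right tail, as in 1, 2, 3, 4, 10, gives a positive value.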

The kurtosis is primarily a measure of the heaviness of the tails of a distribution. The sample kurtosis is calculated as

$\bullet { g_{2} = c_{4n} \sum_{i}^{}{z^4_{i}} - 3 c_{n} }$		for vardef=DF
$\bullet { g_{2} = \frac{1}n \sum_{i}^{}{z^4_{i}} - 3}$		for vardef=N

where $c_{4n} = \frac{n(n+1)}{(n-1)(n-2)(n-3)}$ and $c_{n} = \frac{(n-1)^2}{(n-2)(n-3)}$ .
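A matching Python sketch for the kurtosis follows. As before, the function name `sample_kurtosis` is an assumption of the example, and the vardef=DF branch needs at least four observations for c_4n and c_n to be defined.

```python
def sample_kurtosis(y, vardef="DF"):
    """Sample kurtosis g2 for vardef DF or N (unweighted)."""
    n = len(y)
    mean = sum(y) / n
    d = n - 1 if vardef == "DF" else n
    s2 = sum((v - mean) ** 2 for v in y) / d
    z4 = sum((v - mean) ** 4 for v in y) / s2 ** 2   # sum of z_i to the fourth power
    if vardef == "DF":
        c4n = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
        cn = (n - 1) ** 2 / ((n - 2) * (n - 3))
        return c4n * z4 - 3 * cn
    return z4 / n - 3
```

For the values 1, 2, 3, 4, 5 the sketch returns -1.2 with vardef=DF and -1.3 with vardef=N, both negative because the data have light tails.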

When the observations are independently distributed with a common mean and unequal variances, ${ \sigma_{i}^2 = \sigma^2/w_{i}}$ , where w_i are individual weights, weighted analyses may be appropriate. You select a Weight variable to specify relative weights for each observation in the analysis.

The following notation is used in weighted analyses:

w_i is the weight associated with y_i.
w_(i) is the weight associated with y_(i).
${{\overline w}}$ is the average observation weight, ${\sum_{i}^{}{w_{i}}/n}$ .
${{\overline y_{w}}}$ is the weighted sample mean, ${\sum_{i}^{}{w_{i} y_{i}} / \sum_{i}^{}{w_{i}} }$ .
s²_w is the weighted sample variance, $\sum_{i}^{} w_{i} ( y_{i} - {\overline y_{w}})^2/ d$ .
z_wi is the standardized value, ${( y_{i} - {\overline y_{w}})/ ( s_{w}/\sqrt{ w_{i}}) }$ .

In addition to the vardef=DF and vardef=N definitions, the variance divisor in weighted analyses can also be computed as

$\bullet {d = \sum_{i}^{}{w_{i}}-1}$		for vardef=WDF, sum of weights minus 1
$\bullet {d = \sum_{i}^{}{w_{i}}}$		for vardef=WGT, sum of weights
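The weighted mean and the weighted variance under all four divisors can be sketched in Python as follows. The function name `weighted_mean_and_variance` is hypothetical, and the weights are assumed to have already been screened so that all are positive.

```python
def weighted_mean_and_variance(y, w, vardef="DF"):
    """Weighted sample mean and weighted sample variance s_w^2."""
    n = len(y)
    sum_w = sum(w)
    mean_w = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    d = {
        "DF": n - 1,        # degrees of freedom
        "N": n,             # number of observations
        "WDF": sum_w - 1,   # sum of weights minus 1
        "WGT": sum_w,       # sum of weights
    }[vardef]
    ss = sum(wi * (yi - mean_w) ** 2 for wi, yi in zip(w, y))
    return mean_w, ss / d
```

With y = 1, 2, 3 and weights 1, 2, 1 the weighted mean is 2 and the weighted sum of squares is 2, so the variance is 1 for DF, 2/3 for N and WDF, and 0.5 for WGT.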

With a common mean $\mu$ and ${Var(y_{i}) = \sigma^2_{i}= \sigma^2/w_{i}}$ , the weighted mean has variance $Var({\overline y_{w}})= \sigma^2 / \sum_{i}{w_{i}}$ , and the expected value of the weighted sum of squares is

$E( \sum_{i}^{}{ w_{i} ( y_{i} - {\overline y_{w}})^2} ) = E( \sum_{i}^{}{w_{i} ( y_{i} - \mu)^2} - \sum_{i}^{}{w_{i} ({\overline y_{w}} - \mu)^2} ) = (n-1) \sigma^2$
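
The value $(n-1) \sigma^2$ can be justified by a short derivation. This is a sketch that assumes only what is stated in this section: the observations are independent, they share a common mean (written $\mu$ here), and $Var(y_{i}) = \sigma^2/w_{i}$ . Because $\sum_{i}^{}{w_{i} ( y_{i} - {\overline y_{w}})} = 0$ , the cross term vanishes in

$\sum_{i}^{}{w_{i} ( y_{i} - \mu)^2} = \sum_{i}^{}{w_{i} ( y_{i} - {\overline y_{w}})^2} + \sum_{i}^{}{w_{i} ({\overline y_{w}} - \mu)^2}$

Taking expectations of the two sums that involve $\mu$ gives

$E( \sum_{i}^{}{w_{i} ( y_{i} - \mu)^2} ) = \sum_{i}^{}{w_{i} \sigma^2 / w_{i}} = n \sigma^2$
$E( \sum_{i}^{}{w_{i} ({\overline y_{w}} - \mu)^2} ) = ( \sum_{i}^{}{w_{i}} ) Var({\overline y_{w}}) = \sigma^2$

and subtracting the second from the first leaves $(n-1) \sigma^2$ .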

Note: The use of vardef=WDF or vardef=WGT may not be appropriate, since the resulting variance is a weighted average of the individual variances, $\sigma^2_{i}$ , which have unequal expected values.

For vardef=DF/N, s²_w is the variance of observations with unit weight and may not be informative in the weighted plots of parametric normal distributions. SAS/INSIGHT software uses the weighted sample variance for an observation with average weight, ${ s^2_{a} = s^2_{w} / {\overline w}}$ , to replace s²_w in the plots.
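The rescaling to an observation with average weight is a one-line calculation. The Python sketch below uses a hypothetical function name, `average_weight_variance`, and handles only vardef=DF and vardef=N, the two cases the paragraph above addresses.

```python
def average_weight_variance(y, w, vardef="DF"):
    """Variance for an observation with average weight, s_a^2 = s_w^2 / w_bar."""
    if vardef not in ("DF", "N"):
        raise ValueError("this sketch covers only vardef DF and N")
    n = len(y)
    sum_w = sum(w)
    mean_w = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    d = n - 1 if vardef == "DF" else n
    s2_w = sum(wi * (yi - mean_w) ** 2 for wi, yi in zip(w, y)) / d
    w_bar = sum_w / n                     # average observation weight
    return s2_w / w_bar
```

One consequence worth noting: multiplying every weight by the same positive constant multiplies both s²_w and the average weight by that constant, so s²_a is unchanged.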

The weighted skewness is computed as

$\bullet { g_{w1} = c_{3n} \sum_{i}^{}{z_{wi}^{3}} = c_{3n} \sum_{i}^{}{w^{\frac{3}2}_{i} ( \frac{y_{i}-{\overline y_{w}}}{s_{w}} )^3 } }$		for vardef=DF
$\bullet { g_{w1} = \frac{1}n \sum_{i}^{}{z_{wi}^{3}} = \frac{1}n \sum_{i}^{}{w^{\frac{3}2}_{i} ( \frac{y_{i}-{\overline y_{w}}}{s_{w}} )^3 } }$		for vardef=N

The weighted kurtosis is computed as

$\bullet { g_{w2} = c_{4n} \sum_{i}^{}{z_{wi}^{4}} - 3 c_{n} = c_{4n} \sum_{i}^{}{w^2_{i} ( \frac{y_{i}-{\overline y_{w}}}{s_{w}} )^4 } - 3 c_{n} }$		for vardef=DF
$\bullet { g_{w2} = \frac{1}n \sum_{i}^{}{z_{wi}^{4}} -3 = \frac{1}n \sum_{i}^{}{w^2_{i} ( \frac{y_{i}-{\overline y_{w}}}{s_{w}} )^4 } -3 }$		for vardef=N

The formulations are invariant under the transformation w^*_i = c w_i, c > 0. The sample skewness and kurtosis are set to missing if vardef=WDF or vardef=WGT.
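The weighted skewness and kurtosis, and their invariance to rescaling the weights, can be checked with a Python sketch. The function name `weighted_skew_kurt` and the use of `None` for a missing statistic are assumptions of the example. The standardized value is computed as the square root of w_i times (y_i minus the weighted mean) divided by s_w, which is the same quantity as z_wi defined earlier.

```python
import math

def weighted_skew_kurt(y, w, vardef="DF"):
    """Weighted skewness g_w1 and kurtosis g_w2; (None, None) for WDF or WGT."""
    if vardef not in ("DF", "N"):
        return None, None                 # set to missing for WDF and WGT
    n = len(y)
    sum_w = sum(w)
    mean_w = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    d = n - 1 if vardef == "DF" else n
    s_w = math.sqrt(sum(wi * (yi - mean_w) ** 2 for wi, yi in zip(w, y)) / d)
    z = [math.sqrt(wi) * (yi - mean_w) / s_w for wi, yi in zip(w, y)]
    z3 = sum(v ** 3 for v in z)
    z4 = sum(v ** 4 for v in z)
    if vardef == "DF":
        c3n = n / ((n - 1) * (n - 2))
        c4n = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
        cn = (n - 1) ** 2 / ((n - 2) * (n - 3))
        return c3n * z3, c4n * z4 - 3 * cn
    return z3 / n, z4 / n - 3
```

Calling the function with weights w and then with 10 times w returns the same pair of values up to rounding, and with unit weights it reduces to the unweighted statistics.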

To view or change the divisor d used in the calculation of variances, or to view or change the use of observations with missing values, click the Method button in the variables dialog to display the method options dialog.

Figure 38.3: Distribution Method Options Dialog

By default, SAS/INSIGHT software uses vardef=DF (degrees of freedom) to compute the variance divisor.

When multiple Y variables are analyzed, and some Y variables have missing values, the Use Obs with Missing Values option uses all observations with nonmissing values for the Y variable being analyzed. If the option is turned off, observations with missing values for any Y variable are not used for any analysis.
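The difference between the two settings can be illustrated with a small Python sketch. The function name `rows_for_variable`, the dictionary layout, and the use of `None` for a missing value are assumptions made for the example.

```python
def rows_for_variable(data, var, use_obs_with_missing=True):
    """Values of `var` used in its analysis.

    data maps each Y variable name to a list of values (None = missing).
    """
    n_rows = len(data[var])
    if use_obs_with_missing:
        # Keep every observation that is nonmissing for this variable.
        keep = [i for i in range(n_rows) if data[var][i] is not None]
    else:
        # Keep only observations that are nonmissing for all Y variables.
        keep = [i for i in range(n_rows)
                if all(col[i] is not None for col in data.values())]
    return [data[var][i] for i in keep]
```

With `data = {"y1": [1, None, 3], "y2": [4, 5, None]}`, the analysis of y2 uses the values 4 and 5 when the option is on, and only the value 4 when it is off.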
