Distribution Analyses


Observations with missing values for a Y variable are not used in the analysis for that variable. Observations with Weight or Freq values that are missing or that are less than or equal to zero are not used. Only the integer part of Freq values is used.
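As a rough illustration of these screening rules, the Python sketch below filters the observations for one Y variable. The function name `usable_observations`, the use of `None` or NaN to mark missing values, and the choice to drop rows whose truncated Freq is 0 are assumptions made for this example; they are not documented SAS/INSIGHT behavior.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def usable_observations(y, weight=None, freq=None):
    """Return (y, weight, freq) triples that would enter the analysis of y."""
    n = len(y)
    weight = [1.0] * n if weight is None else weight
    freq = [1] * n if freq is None else freq

    kept = []
    for yi, wi, fi in zip(y, weight, freq):
        # Missing Y, Weight, or Freq: the observation is not used.
        if _is_missing(yi) or _is_missing(wi) or _is_missing(fi):
            continue
        # Weight or Freq less than or equal to zero: not used.
        if wi <= 0 or fi <= 0:
            continue
        # Only the integer part of Freq is used.
        fi = int(fi)
        # Assumption: a Freq such as 0.5 truncates to 0 and is dropped.
        if fi == 0:
            continue
        kept.append((yi, wi, fi))
    return kept
```

For example, a row with Freq 2.9 is kept with a frequency of 2, while rows with a missing Y, a zero Weight, or a negative Freq are skipped.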

The following notation is used in the rest of this chapter:

The summation {\sum_{i}} is shorthand for {\sum_{i=1}^n}.

Based on the variance definition, vardef, the variance divisor d is computed as

 \bullet {d = n-1}   for vardef=DF, degrees of freedom
 \bullet {d = n}   for vardef=N, number of observations

The skewness is a measure of the tendency of the deviations from the mean to be larger in one direction than in the other. The sample skewness is calculated as

 \bullet { g_{1} = c_{3n} \sum_{i}^{}{z^3_{i}} }   for vardef=DF
 \bullet { g_{1} = \frac{1}n \sum_{i}^{}{z^3_{i}} }   for vardef=N

where {c_{3n} = \frac{n}{(n-1)(n-2)}}.
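A minimal Python sketch of the two skewness formulas follows. It assumes the standardized values are {z_{i} = (y_{i} - \overline y)/s} with {s^2 = \sum_{i}(y_{i} - \overline y)^2 / d}; that definition is not spelled out in this section, so treat it as an assumption. The function name `sample_skewness` is hypothetical.

```python
import math


def sample_skewness(y, vardef="DF"):
    """Sample skewness g1 for vardef='DF' or vardef='N'."""
    n = len(y)
    if vardef not in ("DF", "N"):
        raise ValueError("this sketch handles only vardef='DF' and vardef='N'")
    if vardef == "DF" and n < 3:
        raise ValueError("vardef='DF' needs at least 3 observations")

    mean = sum(y) / n
    d = n - 1 if vardef == "DF" else n          # variance divisor
    s = math.sqrt(sum((v - mean) ** 2 for v in y) / d)

    # Sum of cubed standardized values (z_i assumed to be (y_i - mean) / s).
    z3 = sum(((v - mean) / s) ** 3 for v in y)

    if vardef == "DF":
        c3n = n / ((n - 1) * (n - 2))
        return c3n * z3
    return z3 / n
```

Symmetric data such as 1, 2, 3, 4, 5 gives a skewness of zero under either divisor, and a long right tail gives a positive value.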

The kurtosis is primarily a measure of the heaviness of the tails of a distribution. The sample kurtosis is calculated as

 \bullet { g_{2} = c_{4n} \sum_{i}^{}{z^4_{i}} - 3 c_{n} }   for vardef=DF
 \bullet { g_{2} = \frac{1}n \sum_{i}^{}{z^4_{i}} - 3}   for vardef=N

where {c_{4n} = \frac{n(n+1)}{(n-1)(n-2)(n-3)}} and {c_{n} = \frac{(n-1)^2}{(n-2)(n-3)}}.
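The kurtosis formulas can be sketched in Python in the same way. As before, the standardized values are assumed to be {z_{i} = (y_{i} - \overline y)/s} with {s^2} computed using the divisor d; the function name `sample_kurtosis` is hypothetical.

```python
import math


def sample_kurtosis(y, vardef="DF"):
    """Sample kurtosis g2 for vardef='DF' or vardef='N'."""
    n = len(y)
    if vardef not in ("DF", "N"):
        raise ValueError("this sketch handles only vardef='DF' and vardef='N'")
    if vardef == "DF" and n < 4:
        raise ValueError("vardef='DF' needs at least 4 observations")

    mean = sum(y) / n
    d = n - 1 if vardef == "DF" else n          # variance divisor
    s = math.sqrt(sum((v - mean) ** 2 for v in y) / d)

    # Sum of fourth powers of the standardized values.
    z4 = sum(((v - mean) / s) ** 4 for v in y)

    if vardef == "DF":
        c4n = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
        cn = (n - 1) ** 2 / ((n - 2) * (n - 3))
        return c4n * z4 - 3 * cn
    return z4 / n - 3
```

For the values 1, 2, 3, 4, 5 this gives -1.2 with vardef=DF and -1.3 with vardef=N, both negative because the data have lighter tails than a normal distribution.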

When the observations are independently distributed with a common mean and unequal variances, {\sigma_{i}^2 = \sigma^2/w_{i}}, where {w_{i}} are individual weights, weighted analyses may be appropriate. You select a Weight variable to specify relative weights for each observation in the analysis.

The following notation is used in weighted analyses:

In addition to vardef=DF and vardef=N, the variance divisor is also computed as

 \bullet {d = \sum_{i}^{}{w_{i}}-1}   for vardef=WDF, sum of weights minus 1
 \bullet {d = \sum_{i}^{}{w_{i}}}   for vardef=WGT, sum of weights

With {Var(y_{i}) = \sigma^2_{i} = \sigma^2/w_{i}}, {Var({\overline y_{w}}) = \sigma^2 / \sum_{i}{w_{i}}} and the expected value is

{ E( \sum_{i}{ w_{i} ( y_{i} - {\overline y_{w}})^2} ) = E( \sum_{i}{w_{i} ( y_{i} - \mu)^2} - \sum_{i}{w_{i} ({\overline y_{w}} - \mu)^2} ) = (n-1) \sigma^2 }
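The value {(n-1) \sigma^2} can be checked term by term. The steps below assume that {E(y_{i}) = \mu} for every observation and that {\overline y_{w}} is the weighted mean {\sum_{i}{w_{i} y_{i}} / \sum_{i}{w_{i}}}, so that {E({\overline y_{w}}) = \mu}; neither definition is written out in this section.

```latex
\begin{align*}
E\Big(\sum_{i} w_{i}(y_{i}-\mu)^2\Big)
  &= \sum_{i} w_{i}\,\frac{\sigma^2}{w_{i}} = n\sigma^2 \\
E\Big(\sum_{i} w_{i}(\overline y_{w}-\mu)^2\Big)
  &= \Big(\sum_{i} w_{i}\Big)\,Var(\overline y_{w}) = \sigma^2 \\
E\Big(\sum_{i} w_{i}(y_{i}-\overline y_{w})^2\Big)
  &= n\sigma^2 - \sigma^2 = (n-1)\sigma^2
\end{align*}
```

Dividing the weighted sum of squares by n-1 therefore gives an estimate whose expected value is {\sigma^2}.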

Using vardef=WDF or vardef=WGT may not be appropriate, because the resulting estimate is a weighted average of the individual variances {\sigma^2_{i}}, which have unequal expected values.

For vardef=DF/N, {s^2_{w}} is the variance of observations with unit weight and may not be informative in the weighted plots of parametric normal distributions. SAS/INSIGHT software uses the weighted sample variance for an observation with average weight, { s^2_{a} = s^2_{w} / {\overline w}}, to replace {s^2_{w}} in the plots.
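The Python sketch below shows how the rescaled variance could be computed. It assumes {s^2_{w} = \sum_{i}{w_{i}(y_{i} - {\overline y_{w}})^2} / d}, with {\overline y_{w}} the weighted mean and {\overline w} the mean of the weights; these definitions and both function names are assumptions, since the weighted notation is not listed in this section.

```python
def weighted_variance(y, w, vardef="DF"):
    """Weighted sample variance s_w^2 for a chosen variance divisor."""
    n = len(y)
    sum_w = sum(w)
    mean_w = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    ss = sum(wi * (yi - mean_w) ** 2 for wi, yi in zip(w, y))

    divisors = {
        "DF": n - 1,        # degrees of freedom
        "N": n,             # number of observations
        "WDF": sum_w - 1,   # sum of weights minus 1
        "WGT": sum_w,       # sum of weights
    }
    return ss / divisors[vardef]


def average_weight_variance(y, w, vardef="DF"):
    """s_a^2 = s_w^2 / w_bar, the variance at the average weight."""
    w_bar = sum(w) / len(w)
    return weighted_variance(y, w, vardef) / w_bar
```

With vardef=DF or vardef=N, multiplying every weight by the same constant multiplies {s^2_{w}} by that constant but leaves {s^2_{a}} unchanged, which is one way to see why it suits the plots better.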

The weighted skewness is computed as

 \bullet { g_{w1} = c_{3n} \sum_{i}{z^3_{wi}} = c_{3n} \sum_{i}{w^{\frac{3}{2}}_{i} ( \frac{y_{i}-{\overline y}}{s_{w}} )^3 } }   for vardef=DF
 \bullet { g_{w1} = \frac{1}{n} \sum_{i}{z^3_{wi}} = \frac{1}{n} \sum_{i}{w^{\frac{3}{2}}_{i} ( \frac{y_{i}-{\overline y}}{s_{w}} )^3 } }   for vardef=N

The weighted kurtosis is computed as

 \bullet { g_{w2} = c_{4n} \sum_{i}{z^4_{wi}} - 3 c_{n} = c_{4n} \sum_{i}{w^2_{i} ( \frac{y_{i}-{\overline y}}{s_{w}} )^4 } - 3 c_{n} }   for vardef=DF
 \bullet { g_{w2} = \frac{1}{n} \sum_{i}{z^4_{wi}} - 3 = \frac{1}{n} \sum_{i}{w^2_{i} ( \frac{y_{i}-{\overline y}}{s_{w}} )^4 } - 3 }   for vardef=N

The formulations are invariant under the transformation {w^*_{i} = c w_{i}}, c > 0. The sample skewness and kurtosis are set to missing if vardef=WDF or vardef=WGT.
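The weighted formulas and the invariance claim can be tried out with a short Python sketch. Several choices here are assumptions: the mean in the formulas is taken to be the weighted mean, {s_{w}} is the square root of the weighted sum of squares divided by d, and `None` stands in for a missing value. The function name `weighted_skew_kurt` is hypothetical.

```python
import math


def weighted_skew_kurt(y, w, vardef="DF"):
    """Return (g_w1, g_w2); both are None for vardef='WDF' or 'WGT'."""
    if vardef in ("WDF", "WGT"):
        return None, None               # set to missing
    if vardef not in ("DF", "N"):
        raise ValueError("unknown vardef")

    n = len(y)
    if vardef == "DF" and n < 4:
        raise ValueError("vardef='DF' needs at least 4 observations")

    sum_w = sum(w)
    mean_w = sum(wi * yi for wi, yi in zip(w, y)) / sum_w   # assumed weighted mean
    d = n - 1 if vardef == "DF" else n
    s_w = math.sqrt(sum(wi * (yi - mean_w) ** 2 for wi, yi in zip(w, y)) / d)

    # z_wi = sqrt(w_i) * (y_i - mean) / s_w, so z_wi^3 carries w_i^(3/2)
    # and z_wi^4 carries w_i^2, matching the formulas above.
    z = [math.sqrt(wi) * (yi - mean_w) / s_w for wi, yi in zip(w, y)]
    z3 = sum(v ** 3 for v in z)
    z4 = sum(v ** 4 for v in z)

    if vardef == "DF":
        c3n = n / ((n - 1) * (n - 2))
        c4n = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
        cn = (n - 1) ** 2 / ((n - 2) * (n - 3))
        return c3n * z3, c4n * z4 - 3 * cn
    return z3 / n, z4 / n - 3
```

Multiplying every weight by a constant c > 0 scales {s_{w}} by the square root of c, which cancels the square root of c picked up by each weight, so both statistics are unchanged. With unit weights the results reduce to the unweighted skewness and kurtosis.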

To view or change the divisor d used in the calculation of variances, or to view or change the use of observations with missing values, click the Method button in the variables dialog to display the method options dialog.

Figure 38.3: Distribution Method Options Dialog

By default, SAS/INSIGHT software uses vardef=DF (degrees of freedom) to compute the variance divisor.

When multiple Y variables are analyzed, and some Y variables have missing values, the Use Obs with Missing Values option uses all observations with nonmissing values for the Y variable being analyzed. If the option is turned off, observations with missing values for any Y variable are not used for any analysis.
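The difference between the two settings can be sketched in Python as a choice of which row indices enter the analysis of a given Y variable. The data layout (a dict of columns with `None` for missing values) and the function name `rows_used` are assumptions made for this example.

```python
def rows_used(columns, target, use_obs_with_missing=True):
    """Return the row indices used when analyzing the column named target.

    columns maps each Y variable name to a list of values; None is missing.
    """
    n_rows = len(columns[target])
    if use_obs_with_missing:
        # Option on: keep every row where the analyzed variable is present.
        return [i for i in range(n_rows) if columns[target][i] is not None]
    # Option off: keep only rows where every Y variable is present.
    return [
        i for i in range(n_rows)
        if all(col[i] is not None for col in columns.values())
    ]
```

With the option on, each variable is analyzed on its own set of nonmissing rows; with it off, all variables share the same, possibly smaller, set of complete rows.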


Copyright © 2007 by SAS Institute Inc., Cary, NC, USA. All rights reserved.