Distribution Analysis Task

About the Distribution Analysis Task

Distribution analysis provides information about the distribution of numeric variables. A variety of plots such as histograms, probability plots, and quantile-quantile plots can be used in this analysis.

Example: Distribution Analysis of Sales for Each Region

In this example, you want to analyze the sales for each region. Because the data contains three regions, you get three sets of results.
To create this example:
  1. In the Tasks section, expand the Statistics folder and double-click Distribution Analysis. The user interface for the Distribution Analysis task opens.
  2. On the Data tab, select the SASHELP.PRICEDATA data set.
  3. Assign columns to these roles:
     Role                      Column Name
     Analysis variables        sale
     Classification variables  regionName
  4. Click the Options tab. In the Checking for Normality group, select the Goodness-of-fit tests, Histogram with normal curve, and Normal quantile-quantile plot options. For the quantile-quantile plot, also select the Add a reference line check box.
  5. To run the task, click Submit SAS code.
Here is a subset of the results:
  [Output: Tests for Normality for Regions 1, 2, and 3]
  [Output: Histograms of the Distributions of Sale in Regions 1 and 2]
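To illustrate why three regions produce three sets of results, here is a minimal Python sketch that groups an analysis variable by a classification variable. It is not the SAS code that the task generates. The function name summarize_by_class and the sample rows are hypothetical; the values are made up for illustration and are not taken from SASHELP.PRICEDATA.

```python
from collections import defaultdict
from statistics import fmean, stdev


def summarize_by_class(rows, class_key, analysis_key):
    """Group rows by a classification column and summarize the analysis column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_key]].append(row[analysis_key])
    # One summary per classification level, so three levels give three result sets.
    return {
        level: {"n": len(vals), "mean": fmean(vals), "std": stdev(vals)}
        for level, vals in groups.items()
    }


# Hypothetical rows for illustration only; not values from SASHELP.PRICEDATA.
rows = [
    {"regionName": "Region1", "sale": 400.0},
    {"regionName": "Region1", "sale": 420.0},
    {"regionName": "Region2", "sale": 380.0},
    {"regionName": "Region2", "sale": 360.0},
    {"regionName": "Region3", "sale": 450.0},
    {"regionName": "Region3", "sale": 470.0},
]
summaries = summarize_by_class(rows, "regionName", "sale")
```

Each key of summaries corresponds to one classification level, which mirrors how the task reports results separately for each region.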

Assigning Data to Roles

To run the Distribution Analysis task, you must assign a column to the Analysis variables role and select a plot or test on the Options tab.
Role
Description
Roles
  Analysis variables
    specifies the analysis variables and their order in the results.
  Classification variables
    specifies the variables that are used to group the analysis variables into classification levels. You can assign at most two columns to this role.
Additional Roles
  Frequency count
    specifies a numeric variable whose value represents the frequency of the observation. The Distribution Analysis task assumes that each observation represents n observations, where n is the value of the variable.
  Group analysis by
    specifies the variables that the Distribution Analysis task uses to form groups.
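The Frequency count role can be pictured as repeating each observation n times. The following Python sketch shows that a frequency-weighted mean equals the mean of the expanded data. The function names are hypothetical, and the sketch assumes positive integer frequencies; how the task treats non-integer or non-positive values is not covered here.

```python
from statistics import fmean


def expand_by_frequency(values, freqs):
    """Repeat each value as many times as its frequency count."""
    expanded = []
    for value, freq in zip(values, freqs):
        expanded.extend([value] * freq)
    return expanded


def weighted_mean(values, freqs):
    """Mean of the data when each value stands for `freq` observations."""
    total = sum(freqs)
    return sum(value * freq for value, freq in zip(values, freqs)) / total


# Hypothetical data: the value 10.0 represents three observations, 20.0 represents one.
values = [10.0, 20.0]
freqs = [3, 1]
expanded = expand_by_frequency(values, freqs)
```

Here fmean(expanded) and weighted_mean(values, freqs) give the same result, which is the sense in which one row stands for n observations.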

Setting Options

Option Name
Description
Exploring Data
  Select the Histogram check box to create a histogram of the data. You can also specify whether to superimpose a kernel density estimate and the normal density curve on the histogram. Finally, you can specify whether to include an inset box of selected statistics in the graph.
Checking for Normality
  Goodness-of-fit tests
    requests tests for normality that include a series of goodness-of-fit tests based on the empirical distribution function. The resulting table provides test statistics and p-values for the Shapiro-Wilk test (provided the sample size is less than or equal to 2,000), the Kolmogorov-Smirnov test, the Anderson-Darling test, and the Cramér-von Mises test.
  Histogram with normal curve
    displays a fitted normal density curve on the histogram. The normal distribution has a mean of μ and a standard deviation of σ.
    You can also specify whether to include an inset box of selected statistics in the graph.
  Normal probability plot
    creates a probability plot, which compares ordered variable values with the percentiles of the normal distribution. If the data distribution matches the normal distribution, the points on the plot form a linear pattern. Probability plots are preferable for graphical estimation of percentiles.
    The distribution reference line on the plot is created from the maximum likelihood estimate for the parameter.
    You can also specify whether to include an inset box of selected statistics in the graph.
  Normal quantile-quantile plot
    creates a quantile-quantile plot (Q-Q plot), which compares ordered variable values with quantiles of the normal distribution. If the data distribution matches the normal distribution, the points on the plot form a linear pattern. Q-Q plots are preferable for graphical estimation of distribution parameters.
    The distribution reference line on the plot is created from the maximum likelihood estimate for the parameter.
    You can also specify whether to include an inset box of selected statistics in the graph.
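The idea behind a normal Q-Q plot can be sketched in a few lines of Python: sort the data, then pair each ordered value with a quantile of the standard normal distribution. This is an illustration only, not the task's implementation. The plotting position (i - 0.375) / (n + 0.25) is one common convention and is an assumption here, as are the function names. The reference line uses the sample mean as intercept and the divisor-n standard deviation as slope, which are the maximum likelihood estimates for a normal distribution; the exact estimator that the task uses is not stated in this section.

```python
from statistics import NormalDist, fmean, pstdev


def normal_qq_points(values):
    """Return (theoretical quantile, ordered value) pairs for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, x in enumerate(ordered, start=1):
        # Assumed plotting position; other conventions exist.
        p = (i - 0.375) / (n + 0.25)
        points.append((std_normal.inv_cdf(p), x))
    return points


def reference_line(values):
    """Return (intercept, slope) of the line y = mu + sigma * z."""
    # Maximum likelihood estimates for a normal distribution (divisor n).
    return fmean(values), pstdev(values)
```

If the data are close to normal, the points returned by normal_qq_points fall near the line given by reference_line, which is the linear pattern described above.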
Fitting Distributions
  Beta
    Histogram
      fits a beta distribution with threshold parameter θ, scale parameter σ, and shape parameters α and β.
    Probability plot
      specifies a beta probability plot for shape parameters α and β.
    Quantile-quantile plot
      specifies a beta Q-Q plot for shape parameters α and β.
  Exponential
    Histogram
      fits an exponential distribution with threshold parameter θ and scale parameter σ.
    Probability plot
      specifies an exponential probability plot.
    Quantile-quantile plot
      specifies an exponential Q-Q plot.
  Gamma
    Histogram
      fits a gamma distribution with threshold parameter θ, scale parameter σ, and shape parameter α.
    Probability plot
      specifies a gamma probability plot for shape parameter α.
    Quantile-quantile plot
      specifies a gamma Q-Q plot for shape parameter α.
  Lognormal
    Histogram
      fits a lognormal distribution with threshold parameter θ, scale parameter ζ, and shape parameter σ.
    Probability plot
      specifies a lognormal probability plot for shape parameter σ.
    Quantile-quantile plot
      specifies a lognormal Q-Q plot for shape parameter σ.
  Weibull
    Histogram
      fits a Weibull distribution with threshold parameter θ, scale parameter σ, and shape parameter c.
    Probability plot
      specifies a two-parameter Weibull probability plot.
    Quantile-quantile plot
      specifies a two-parameter Weibull Q-Q plot.
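As a small illustration of what fitting a distribution with a threshold and a scale parameter involves, the following Python sketch works with the exponential case. It assumes the common parameterization with density (1/sigma) * exp(-(x - theta) / sigma) for x >= theta, treats the threshold theta as known, and estimates the scale sigma by maximum likelihood. The function names are hypothetical, and this is not the estimation code that the task runs.

```python
from math import exp, log


def fit_exponential_scale(values, theta=0.0):
    """Maximum likelihood estimate of the scale when the threshold is known."""
    if any(v < theta for v in values):
        raise ValueError("all values must be at or above the threshold")
    # For a known threshold, the MLE of sigma is the mean excess over theta.
    return sum(v - theta for v in values) / len(values)


def exponential_pdf(x, theta, sigma):
    """Density of the exponential distribution with threshold and scale."""
    if x < theta:
        return 0.0
    return exp(-(x - theta) / sigma) / sigma


def exponential_quantile(p, theta, sigma):
    """Quantile function, as used for the theoretical axis of a Q-Q plot."""
    return theta - sigma * log(1.0 - p)
```

The fitted density from exponential_pdf is the kind of curve that is superimposed on a histogram, and exponential_quantile supplies the theoretical quantiles against which ordered data values would be plotted.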