Distribution Analysis: Distributional Modeling
In this example, you fit a normal distribution to the pressure_outer_isobar variable of the Hurricanes data set. The Hurricanes data set contains 6188 observations of tropical cyclones in the Atlantic basin. The pressure_outer_isobar variable gives the sea-level atmospheric pressure for the outermost closed isobar of a cyclone. This is a measure of the atmospheric pressure at the outermost edge of the storm.
The plots and statistics in the Distributional Modeling analysis can help you answer questions such as the following:
Open the Hurricanes data set.
Create a histogram of the pressure_outer_isobar variable.
A histogram appears, as shown in Figure 15.1.
Figure 15.1: A Histogram
From the shape of the histogram, you might wonder whether the data distribution can be modeled by a normal distribution. If not, how do these data deviate from normality? The following steps add a normal curve to the histogram and create other plots and statistics.
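A density curve and a histogram of counts are on different scales, so the curve has to be rescaled before it can be overlaid on the bars. One common approach, shown here as a minimal Python sketch, multiplies the density at each bin midpoint by n times the bin width. The sketch assumes equal-width bins and a count-scaled histogram; the function name `expected_bin_counts` and the parameter values are hypothetical illustrations, not estimates from the Hurricanes data, and Stat Studio's own scaling is not documented here.

```python
from statistics import NormalDist

def expected_bin_counts(mu, sigma, n, edges):
    """Expected counts per bin for n draws from Normal(mu, sigma).

    For each bin, returns the exact count n * (F(hi) - F(lo)) and the
    midpoint approximation n * width * pdf(midpoint), which is how a
    density curve is typically scaled to match a count histogram.
    """
    dist = NormalDist(mu, sigma)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        exact = n * (dist.cdf(hi) - dist.cdf(lo))
        curve = n * (hi - lo) * dist.pdf((lo + hi) / 2)
        rows.append((lo, hi, exact, curve))
    return rows

# Hypothetical parameters and 2-hPa bins, for illustration only.
edges = [1004, 1006, 1008, 1010, 1012, 1014, 1016]
for lo, hi, exact, curve in expected_bin_counts(1010.0, 3.0, 1000, edges):
    print(f"[{lo}, {hi}): exact={exact:.1f}  scaled curve={curve:.1f}")
```

When the bin width is small relative to sigma, the scaled curve and the exact bin counts agree closely, which is why the midpoint scaling is adequate for drawing.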
Select Analysis ► Distribution Analysis ► Distributional Modeling from the main menu, as shown in Figure 15.2.
Figure 15.2: Selecting the Distributional Modeling Analysis
A dialog box appears, as shown in Figure 15.3.
You can select a variable for the univariate analysis by using the Variables tab.
Select the variable pressure_outer_isobar and click Set Y.
Figure 15.3: Selecting a Variable
Click the Estimators tab.
The Estimators tab is shown in Figure 15.4.
Figure 15.4: Selecting a Distribution Family
The Estimators tab enables you to select distributions to fit to the data. For each distribution, you can enter known parameters or indicate that the parameters should be estimated by maximum likelihood.
The section "Specifying Multiple Density Curves" describes how to create a histogram overlaid with more than one density curve. For this example, you select a single distribution to fit to the data.
The normal distribution appears in the Estimators list by default. Also by default, the Automatic radio button is selected. This specifies that the location and scale parameters for the normal distribution be determined by using maximum likelihood estimation.
Accept these defaults and proceed to the next tab.
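For a normal distribution, maximum likelihood estimation has a closed form: the location estimate is the sample mean, and the scale estimate is the root mean squared deviation with divisor n. The following Python sketch computes these textbook estimates and contrasts them with parameters entered by hand. The helper `normal_mle`, the sample values, and the "known" parameters are hypothetical; the estimates are not guaranteed to match what the UNIVARIATE procedure reports digit for digit (for example, in the divisor used for sigma).

```python
import math
from statistics import NormalDist

def normal_mle(values):
    """Return the maximum likelihood estimates (mu_hat, sigma_hat).

    Missing values (None) are dropped first. The MLE of sigma divides by n,
    not n - 1, so it is slightly smaller than the usual sample standard deviation.
    """
    xs = [v for v in values if v is not None]
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one nonmissing value")
    mu_hat = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)
    return mu_hat, sigma_hat

# Made-up pressures in hPa (None marks a missing value); not the Hurricanes data.
sample = [1008, 1010, None, 1012, 1010, 1009, 1013, None, 1010]

estimated = NormalDist(*normal_mle(sample))  # analogous to estimating the parameters
known = NormalDist(mu=1010, sigma=2)         # hypothetical parameters entered by hand
print(estimated.mean, estimated.stdev)
print(known.mean, known.stdev)
```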
Click the Plots tab.
Select all plots, as shown in Figure 15.5.
Figure 15.5: Selecting Plots
Click OK.
The analysis calls the UNIVARIATE procedure, which uses the options specified in the dialog box. The procedure displays tables in the output document, as shown in Figure 15.6.
Figure 15.6: Output from a Distributional Modeling Analysis
Several plots are created. These plots can help answer the questions posed earlier.
The goodness-of-fit table in the output document shows that the p-values for the goodness-of-fit tests are very small. The null hypothesis for the goodness-of-fit tests is that the data are from a specified theoretical distribution. The smaller the p-value, the stronger the evidence against the null hypothesis. The small p-values in this example indicate that the normal distribution is not an adequate model to describe these data.
Note: The pressure_outer_isobar variable contains 4669 nonmissing values. For a sample of this size, the goodness-of-fit tests can detect small departures from normality, so it is not surprising that these tests reject the null hypothesis.
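One widely used goodness-of-fit statistic of this kind is the Kolmogorov-Smirnov D, the largest vertical gap between the empirical CDF and the fitted CDF. The Python sketch below computes D and also shows why large samples are so sensitive: the approximate 5% critical value shrinks like 1/sqrt(n). The constants are classical large-sample approximations (sqrt(-0.5 ln(alpha/2)) for a fully specified distribution, and Lilliefors' 0.886 when the normal mean and standard deviation are estimated from the data). They are not the method the UNIVARIATE procedure uses to compute its p-values, and the function names and sample values are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev

def ks_statistic(values, dist):
    """Kolmogorov-Smirnov statistic D = max |F_n(x) - F(x)|.

    F_n is the empirical CDF of `values` and F is dist.cdf. The maximum is
    reached just before or just after an ECDF jump, so checking both sides
    of each sorted value is enough (tied values included).
    """
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def approx_critical_d(n, alpha=0.05, params_estimated=True):
    """Approximate large-sample critical value of D for sample size n.

    Fully specified F: sqrt(-0.5 * ln(alpha / 2)) / sqrt(n), about 1.358/sqrt(n)
    at alpha = 0.05. Normal with estimated mean and sd: Lilliefors'
    approximation 0.886/sqrt(n), included here only for alpha = 0.05.
    """
    if params_estimated:
        if alpha != 0.05:
            raise ValueError("only alpha = 0.05 is included in this sketch")
        return 0.886 / math.sqrt(n)
    return math.sqrt(-0.5 * math.log(alpha / 2)) / math.sqrt(n)

# The threshold shrinks like 1/sqrt(n); at n = 4669 both values are below 0.02.
for n in (50, 500, 4669):
    print(n, round(approx_critical_d(n), 4),
          round(approx_critical_d(n, params_estimated=False), 4))

# Hypothetical sample (not the Hurricanes data) compared with a fitted normal.
sample = [1008, 1010, 1010, 1012, 1012, 1012, 1009, 1013, 1010, 1008]
fitted = NormalDist(fmean(sample), pstdev(sample))
print(round(ks_statistic(sample, fitted), 4))
```

With 4669 values, a maximum gap of about two percentage points between the empirical and fitted CDFs is already enough to reject normality at the 5% level, which is consistent with the note above.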
The plots can also answer questions about where the data lie. For example, the CDF plot in Figure 15.7 graphically answers the question "What observations are contained in the upper quintile (20%) of the data?" The selected observations answer the question: data values greater than or equal to 1013 hPa. Similarly, you can ask a converse question: What percentage of the data has values less than or equal to 1000 hPa? The answer (0.4%) can also be obtained by interacting with the CDF plot.
Figure 15.7: A CDF Plot
The CDF plot also shows how the data are distributed. For example, the long vertical jumps in the CDF at even values (1008, 1010, and 1012 hPa) indicate that many observations have these values. In contrast, the short vertical jumps at odd values (for example, 1009, 1011, and 1013 hPa) indicate that few observations have these values. This fact is not apparent from the histogram because the default bin width is 2 hPa, so each bin combines the counts for an even value and an adjacent odd value.
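The size of each CDF jump is the number of observations at that value divided by n, whereas a 2-hPa bin adds two neighboring values together. The Python sketch below makes the contrast concrete on a hypothetical integer-valued sample (not the Hurricanes data) in which even values are common and odd values are rare. The bin alignment (bins starting at an even value) and the helper names are assumptions for illustration.

```python
from collections import Counter

def ecdf_jumps(values):
    """Map each distinct value to the size of the ECDF jump there (count / n)."""
    n = len(values)
    return {v: c / n for v, c in sorted(Counter(values).items())}

def bin_counts(values, start, width):
    """Count values in half-open bins [start + k*width, start + (k+1)*width)."""
    counts = Counter((v - start) // width for v in values)
    return {start + k * width: c for k, c in sorted(counts.items())}

# Hypothetical integer pressures (hPa): even values common, odd values rare.
sample = [1008] * 6 + [1009] * 1 + [1010] * 8 + [1011] * 2 + [1012] * 7 + [1013] * 1

print(ecdf_jumps(sample))           # tall jumps at even values, short ones at odd values
print(bin_counts(sample, 1008, 2))  # each 2-hPa bin merges one even and one odd value
```

In the binned counts, the difference between even and odd values disappears, which is why the CDF plot reveals a pattern that the default histogram hides.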
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.