In addition to summarizing a data distribution as in the preceding example, you can use PROC UNIVARIATE to statistically model a distribution based on a random sample of data. The following statements create a data set named Aircraft that contains the measurements of a position deviation for a sample of 30 aircraft components.
```sas
data Aircraft;
   input Deviation @@;                      /* @@ reads several values from each data line */
   label Deviation = 'Position Deviation';  /* label is reused as the plot's axis label */
   datalines;
-.00653 0.00141 -.00702 -.00734 -.00649 -.00601
-.00631 -.00148 -.00731 -.00764 -.00275 -.00497
-.00741 -.00673 -.00573 -.00629 -.00671 -.00246
-.00222 -.00807 -.00621 -.00785 -.00544 -.00511
-.00138 -.00609 0.00038 -.00758 -.00731 -.00455
;
```
An initial question in the analysis is whether the measurement distribution is normal. The following statements request a table of moments, the tests for normality, and a normal probability plot, which are shown in Figure 4.5 and Figure 4.6:
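```sas
title 'Position Deviation Analysis';
ods graphics on;                                 /* enable ODS Graphics for the plot */
ods select Moments TestsForNormality ProbPlot;   /* display only these three outputs */
proc univariate data=Aircraft normaltest;        /* NORMALTEST requests tests for normality */
   var Deviation;
   probplot Deviation / normal(mu=est sigma=est) /* reference line from sample mean and std dev */
                        square                   /* draw the plot in a square frame */
                        odstitle = title;        /* use the TITLE statement text as the graph title */
   label Deviation = 'Position Deviation';
   inset mean std / format=6.4;                  /* show mean and std dev on the plot */
run;
```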
PROC UNIVARIATE uses the label associated with the variable Deviation as the vertical axis label in the probability plot. The INSET statement displays the sample mean and standard deviation on the probability plot.
Figure 4.5: Moments and Tests for Normality

Position Deviation Analysis

Moments

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| N | 30 | Sum Weights | 30 |
| Mean | -0.0053067 | Sum Observations | -0.1592 |
| Std Deviation | 0.00254362 | Variance | 6.47002E-6 |
| Skewness | 1.2562507 | Kurtosis | 0.69790426 |
| Uncorrected SS | 0.00103245 | Corrected SS | 0.00018763 |
| Coeff Variation | -47.932613 | Std Error Mean | 0.0004644 |
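For readers who want to check the Moments table by hand, its entries are linked by the identities below. This is a sketch that assumes unit weights and the default divisor n - 1 (VARDEF=DF). The numeric substitutions use only values printed in Figure 4.5 and are rounded. The positive skewness (about 1.26) indicates a longer right tail, which is consistent with the curved probability plot discussed next.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{-0.1592}{30} \approx -0.0053067 \\
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{\mathrm{CSS}}{n-1}
      = \frac{0.00018763}{29} \approx 6.470\times 10^{-6} \\
\mathrm{USS} &= \sum_{i=1}^{n} x_i^2 = \mathrm{CSS} + n\bar{x}^2
      \approx 0.00018763 + 30\,(0.0053067)^2 \approx 0.0010325 \\
\mathrm{CV} &= 100\,\frac{s}{\bar{x}} = 100\cdot\frac{0.00254362}{-0.0053067} \approx -47.93 \\
\mathrm{SE}(\bar{x}) &= \frac{s}{\sqrt{n}} = \frac{0.00254362}{\sqrt{30}} \approx 0.0004644 \\
\text{Skewness} &= \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^{3} \\
\text{Kurtosis} &= \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^{4}
      - \frac{3(n-1)^2}{(n-2)(n-3)}
\end{aligned}
```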
All four goodness-of-fit tests in Figure 4.5 (Shapiro-Wilk, Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling) reject the hypothesis that the measurements are normally distributed.
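As an illustration of how one of these tests works, the Shapiro-Wilk statistic in its standard textbook form compares a weighted combination of the ordered sample with the corrected sum of squares. Here x_(1) ≤ ... ≤ x_(n) are the order statistics, and the coefficients a_i are derived from the expected values and covariances of normal order statistics. W lies between 0 and 1, and values well below 1 lead to rejection of normality. This is the general definition, not necessarily the exact computational approximation that PROC UNIVARIATE uses to obtain W and its p-value.

```latex
W = \frac{\left(\sum_{i=1}^{n} a_i\, x_{(i)}\right)^{2}}{\sum_{i=1}^{n} (x_i - \bar{x})^{2}},
\qquad
\sum_{i=1}^{n} (x_i - \bar{x})^{2} = \mathrm{CSS} = 0.00018763
```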
Figure 4.6 shows a normal probability plot for the measurements. A linear pattern of points following the diagonal reference line would indicate that the measurements are normally distributed. Instead, the curved point pattern suggests that a skewed distribution, such as the lognormal, is more appropriate than the normal distribution.
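One way to read the plot is through its construction. As we understand the PROBPLOT defaults, the i-th ordered observation is plotted against a standard normal quantile computed from plotting positions of the form (i - 0.375)/(n + 0.25). Treat these constants as an assumption to verify against the PROBPLOT documentation. With NORMAL(MU=EST SIGMA=EST), the reference line uses the sample mean and standard deviation from Figure 4.5. Under normality the points scatter about this line. A right-skewed sample such as this one bends away from it, which produces the curved pattern described above.

```latex
\left(\Phi^{-1}\!\left(\frac{i-0.375}{n+0.25}\right),\; x_{(i)}\right), \quad i = 1,\dots,n,
\qquad
x = \hat{\mu} + \hat{\sigma}\,z = -0.0053067 + 0.00254362\,z
```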
A lognormal distribution for Deviation is fitted in Example 4.26.
A sample program for this example, univar2.sas, is available in the SAS Sample Library for Base SAS software.