# The UNIVARIATE Procedure

### Modeling a Data Distribution

In addition to summarizing a data distribution as in the preceding example, you can use PROC UNIVARIATE to statistically model a distribution based on a random sample of data. The following statements create a data set named `Aircraft` that contains position deviation measurements for a sample of 30 aircraft components.

```sas
/* Position deviations measured on 30 aircraft components */
data Aircraft;
   input Deviation @@;   /* @@ holds the data line so several values are read from each line */
   label Deviation = 'Position Deviation';
   datalines;
-.00653 0.00141 -.00702 -.00734 -.00649 -.00601
-.00631 -.00148 -.00731 -.00764 -.00275 -.00497
-.00741 -.00673 -.00573 -.00629 -.00671 -.00246
-.00222 -.00807 -.00621 -.00785 -.00544 -.00511
-.00138 -.00609 0.00038 -.00758 -.00731 -.00455
;
run;
```

An initial question in the analysis is whether the measurement distribution is normal. The following statements request a table of moments, the tests for normality, and a normal probability plot, which are shown in Figure 4.5 and Figure 4.6:

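```sas
title 'Position Deviation Analysis';
ods graphics on;
/* Display only the moments, the normality tests, and the probability plot */
ods select Moments TestsForNormality ProbPlot;
proc univariate data=Aircraft normaltest;   /* NORMALTEST requests the goodness-of-fit tests */
   var Deviation;
   /* Normal reference line whose mean and standard deviation are estimated from the sample */
   probplot Deviation / normal(mu=est sigma=est)
                        square
                        odstitle = title;
   label Deviation = 'Position Deviation';
   /* Display the sample mean and standard deviation on the plot */
   inset mean std / format=6.4;
run;
```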

PROC UNIVARIATE uses the label associated with the variable `Deviation` as the vertical axis label in the probability plot. The INSET statement displays the sample mean and standard deviation on the probability plot.

Figure 4.5: Moments and Tests for Normality

Position Deviation Analysis

The UNIVARIATE Procedure

Variable: Deviation (Position Deviation)

Moments

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| N | 30 | Sum Weights | 30 |
| Mean | -0.0053067 | Sum Observations | -0.1592 |
| Std Deviation | 0.00254362 | Variance | 6.47002E-6 |
| Skewness | 1.2562507 | Kurtosis | 0.69790426 |
| Uncorrected SS | 0.00103245 | Corrected SS | 0.00018763 |
| Coeff Variation | -47.932613 | Std Error Mean | 0.0004644 |

Tests for Normality

| Test | Statistic | | p Value | |
|---|---|---|---|---|
| Shapiro-Wilk | W | 0.845364 | Pr < W | 0.0005 |
| Kolmogorov-Smirnov | D | 0.208921 | Pr > D | <0.0100 |
| Cramer-von Mises | W-Sq | 0.329274 | Pr > W-Sq | <0.0050 |
| Anderson-Darling | A-Sq | 1.784881 | Pr > A-Sq | <0.0050 |
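As a consistency check, the derived entries in the Moments table can be recomputed from $n = 30$, the sample mean $\bar{x}$, and the sample standard deviation $s$ using the usual definitions. This sketch assumes the default divisor $n-1$ (VARDEF=DF); any difference in the last digit comes from rounding in the printed values.

$$
\begin{aligned}
\textstyle\sum_i x_i &= n\bar{x} = 30 \times (-0.0053067) \approx -0.1592 &&\text{(Sum Observations)}\\
s^2 &= (0.00254362)^2 \approx 6.470 \times 10^{-6} &&\text{(Variance)}\\
\mathrm{CSS} &= \textstyle\sum_i (x_i - \bar{x})^2 = (n-1)\,s^2 = 29 \times 6.47002 \times 10^{-6} \approx 1.8763 \times 10^{-4} &&\text{(Corrected SS)}\\
\mathrm{USS} &= \textstyle\sum_i x_i^2 = \mathrm{CSS} + n\bar{x}^2 \approx 1.8763 \times 10^{-4} + 30\,(0.0053067)^2 \approx 1.0325 \times 10^{-3} &&\text{(Uncorrected SS)}\\
\mathrm{SE}(\bar{x}) &= s/\sqrt{n} = 0.00254362/\sqrt{30} \approx 0.0004644 &&\text{(Std Error Mean)}\\
\mathrm{CV} &= 100\,s/\bar{x} = 100 \times 0.00254362/(-0.0053067) \approx -47.93 &&\text{(Coeff Variation)}
\end{aligned}
$$

The coefficient of variation is negative only because the mean deviation is negative.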

All four goodness-of-fit tests in Figure 4.5 reject the hypothesis that the measurements are normally distributed; every p-value is below 0.01.

Figure 4.6 shows a normal probability plot for the measurements. A linear pattern of points following the diagonal reference line would indicate that the measurements are normally distributed. Instead, the curved point pattern, together with the positive skewness (1.2562507) in the Moments table, suggests that a skewed distribution, such as the lognormal, is more appropriate than the normal distribution.
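The curvature is easier to interpret once you write out where the points fall. The sketch below assumes the plotting positions $(i-0.375)/(n+0.25)$, which we believe is the PROBPLOT default; check the PROBPLOT statement documentation before relying on the exact constants. With the ordered observations $x_{(1)} \le \dots \le x_{(n)}$ and $\Phi^{-1}$ the standard normal quantile function, the plotted points and the reference line requested by MU=EST SIGMA=EST are:

$$
\left(\Phi^{-1}\!\left(\frac{i-0.375}{n+0.25}\right),\; x_{(i)}\right),\quad i = 1,\dots,n,
\qquad
x = \hat{\mu} + \hat{\sigma}\,z = -0.0053067 + 0.00254362\,z
$$

For $n = 30$, the smallest value, $-0.00807$, is plotted at $z = \Phi^{-1}(0.625/30.25) \approx -2.04$, where the line is at about $-0.0105$. The largest value, $0.00141$, is plotted at $z \approx 2.04$, where the line is at about $-0.0001$. Both extremes lie above the line, which produces the bowed pattern expected from a right-skewed sample.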

A lognormal distribution for `Deviation` is fitted in Example 4.26.

A sample program for this example, univar2.sas, is available in the SAS Sample Library for Base SAS software.

Figure 4.6: Normal Probability Plot 