Contents: | Purpose / History / Requirements / Usage / Details / Limitations / Missing Values / References |
Version | Update Notes |
1.4 | SAS/IML is no longer required if SAS/ETS PROC MODEL is not found. SAS/STAT PROC PRINCOMP is required instead. Univariate plots, if requested, and tests are now presented first. High resolution plotting is done by Base SAS PROC SGPLOT if available, or by SAS/GRAPH PROC GPLOT if not. Checks that the specified data set and variables exist. |
1.3 | Added message showing whether MODEL or IML is selected. Added check for error status after MODEL or IML. Errors will terminate macro. Added automatic check for newer version. Documented difference between tests in PROC MODEL and PROC UNIVARIATE. |
1.2 | Use SAS/ETS PROC MODEL if available to get all tests, then SAS/IML, then univariate only. Use ODS SELECT to obtain only normal table from MODEL (requires SAS 8 or later). Provide univariate histograms with overlaid normal curves and tests controlled by expanded PLOT= parameter. |
1.1 | Use PVALUE format. Prefix notes from macro with MULTNORM: instead of NOTE:. |
1.0 | Initial coding. |
The high resolution multivariate plot requires SAS/STAT in SAS 8 or later. SAS/GRAPH Software in SAS 8 or later is required if Base SAS PROC SGPLOT is not found.
%inc "<location of your file containing the MULTNORM macro>";
Following this statement, you may call the %MULTNORM macro. See the Results tab for an example.
The options and allowable values are:
For multivariate normal data, Mardia (1974) shows that the expected value of the multivariate skewness statistic is
p(p+2)[(n+1)(p+1)-6] / [(n+1)(n+3)]
and the expected value of the multivariate kurtosis statistic is
p(p+2)(n-1)/(n+1) .
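As a rough illustration, the two expected values can be evaluated in a short DATA step. The values n=28 and p=4 are assumptions chosen to match the cork example on this page (28 observations, 4 variables), and the variable names exp_skew and exp_kurt are hypothetical.

```sas
/* Sketch only: evaluates Mardia's expected values for assumed n and p. */
data _null_;
   n = 28;   /* number of observations (assumed, cork example) */
   p = 4;    /* number of variables (assumed, cork example)    */
   exp_skew = p*(p+2)*((n+1)*(p+1) - 6) / ((n+1)*(n+3));
   exp_kurt = p*(p+2)*(n-1) / (n+1);
   put "Expected multivariate skewness: " exp_skew 8.4;
   put "Expected multivariate kurtosis: " exp_kurt 8.4;
run;
```

For these inputs the log should show values of about 3.71 for skewness and about 22.34 for kurtosis.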
As discussed in Testing for Normality in the PROC MODEL documentation, the Henze-Zirkler test of multivariate normality is based on a nonnegative function that measures the distance between two distribution functions. Here it assesses the distance between the distribution function of the data and the multivariate normal distribution function.
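The following is a minimal sketch of one way to request the PROC MODEL normality tests directly, not necessarily how the macro does it internally. It assumes SAS/ETS is available and that the cork data set from the example on this page has already been created. The parameter names mu_n, mu_e, mu_s, and mu_w are hypothetical. Each variable is modeled as a constant mean so that the residuals are the mean-centered data.

```sas
/* Sketch only: fit an intercept-only model to each variable and */
/* request normality tests on the residuals.                     */
proc model data=cork;
   parms mu_n mu_e mu_s mu_w;   /* hypothetical parameter names */
   n = mu_n;
   e = mu_e;
   s = mu_s;
   w = mu_w;
   fit n e s w / normal;        /* NORMAL requests the normality tests */
run;
quit;
```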
Univariate normality is tested using the Shapiro-Wilk W test or the Kolmogorov-Smirnov test. Additional tests are provided if univariate plots are requested. For details on the univariate tests of normality, see Goodness Of Fit Tests in the PROC UNIVARIATE documentation. There may be differences in the Shapiro-Wilk test statistics and p-values produced by PROC MODEL and PROC UNIVARIATE as documented in this SAS Note. The algorithm in PROC UNIVARIATE uses the updated method of Royston (1992). PROC MODEL uses the older method of Royston (1982). While the PROC MODEL results are correct, the Shapiro-Wilk tests provided by PROC UNIVARIATE are preferred.
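To see the PROC UNIVARIATE versions of the univariate tests on their own, a call along the following lines can be used. This is a sketch that assumes the cork data set from the example on this page has already been created. It is not taken from the macro source.

```sas
/* Sketch only: univariate normality tests and histograms for each variable. */
proc univariate data=cork normal;   /* NORMAL requests the Tests for Normality table */
   var n e s w;
   histogram n e s w / normal;      /* overlay a fitted normal curve */
run;
```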
If the p-value of any of the tests is small, then multivariate normality can be rejected. However, it is important to note that the univariate Shapiro-Wilk W test is a very powerful test and is capable of detecting small departures from univariate normality with relatively small sample sizes. This may cause you to reject univariate, and therefore multivariate, normality unnecessarily if the tests are being done to validate the use of methods that are robust to small departures from normality. For such situations, the plots provide a visual assessment of approximate normality.
When the covariance matrix for the data is singular, the macro quits and issues the following message:
ERROR: Covariance matrix is singular.
The PRINCOMP procedure in SAS/STAT Software is required for this check. If it is not found, a message is printed, nonsingularity is assumed, and the macro attempts to perform the multivariate test and plot.
For p variables and a large sample size, the squared Mahalanobis distances of the observations to the mean vector are distributed as chi-square with p degrees of freedom. However, unless p is very small, the sample size must be quite large for the chi-square approximation to hold. The plot is also sensitive to outliers. It should therefore be used cautiously, as a rough indicator of multivariate normality.
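A plot of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the macro's actual code. It assumes the cork data set from the example on this page, uses the fact that the sum of squares of all p standardized principal component scores equals the squared Mahalanobis distance, and uses the plotting position (i-0.5)/n, which is an assumed choice. The data set names pcscores, dist, and qq and the variables d2 and chisq are hypothetical.

```sas
/* Sketch only: chi-square Q-Q plot of squared Mahalanobis distances. */

/* Standardized principal component scores (Prin1-Prin4) */
proc princomp data=cork std out=pcscores noprint;
   var n e s w;
run;

/* Squared Mahalanobis distance = sum of squared standardized scores */
data dist;
   set pcscores;
   d2 = uss(of prin1-prin4);
run;

proc sort data=dist;
   by d2;
run;

/* Chi-square quantiles with p=4 degrees of freedom */
data qq;
   set dist nobs=nobs;
   chisq = cinv((_n_ - 0.5) / nobs, 4);
run;

proc sgplot data=qq;
   scatter x=chisq y=d2;
   lineparm x=0 y=0 slope=1;   /* reference line: slope one, intercept zero */
   xaxis label="Chi-square quantile";
   yaxis label="Squared Mahalanobis distance";
run;
```

Points that stay close to the reference line are consistent with approximate multivariate normality, subject to the cautions above.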
Mardia (1974), "Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies," Sankhya B, 36, 115-128.
Mardia (1975), "Assessment of Multinormality and the Robustness of Hotelling's T-squared Test," Applied Statistics, 24(2).
Mardia, K.V., Kent, J.T., and Bibby, J.M. (1979), "Multivariate Analysis," New York: Academic Press.
Mardia (1970), "Measures of Multivariate Skewness and Kurtosis with Applications," Biometrika, 57(3).
Royston, J.P. (1982), "An Extension of Shapiro and Wilk's W Test for Normality to Large Samples," Applied Statistics, 31.
Royston, J.P. (1992), "Approximating the Shapiro-Wilk W-Test for Non-normality," Statistics and Computing, 2, 117-119.
Shapiro, S.S. and Wilk, M.B. (1965), "An Analysis of Variance Test for Normality (complete samples)," Biometrika, 52.
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
data cork;
   input n e s w @@;
   datalines;
72 66 76 77 91 79 100 75 60 53 66 63 56 68 47 50
56 57 64 58 79 65 70 61 41 29 36 38 81 80 68 58
32 32 35 36 78 55 67 60 30 35 34 26 46 38 37 38
39 39 31 27 39 35 34 37 42 43 31 25 32 30 30 32
37 40 31 25 60 50 67 54 33 29 27 36 35 37 48 39
32 30 34 28 39 36 39 31 63 45 74 63 50 34 37 40
54 46 60 52 43 37 39 50 47 51 52 43 48 54 57 43
;

%inc "<location of your file containing the MULTNORM macro>";

%multnorm(data=cork, var=n e s w, plot=mult)
For p variables listed in the VAR= option, the first p rows of the Normality Test table contain univariate tests of normality, including the name of the test (Shapiro-Wilk or Kolmogorov-Smirnov), the values of the test statistic, and the corresponding p-values. The next two lines of the table contain Mardia's tests of multivariate normality, including the name of the test (skewness or kurtosis), the values of the multivariate skewness and kurtosis statistics (when the PRINCOMP procedure is used), the values of the test statistics, and their p-values. When the MODEL procedure is used, the Henze-Zirkler test of multivariate normality is also provided.
If PLOT=MULT or BOTH is specified, a chi-square quantile-quantile plot is produced which plots the squared Mahalanobis distances against corresponding quantiles of the limiting chi-square distribution. If the data are distributed as multivariate normal, then the points should fall on a straight line with slope one and intercept zero. See DETAILS for more information.
Following are the results from the above example. The Mardia tests do not reject multivariate normality, but the Henze-Zirkler multivariate test does. Also, the univariate tests reject univariate normality, and therefore multivariate normality. The multivariate plot seems to indicate approximate normality, but the sample is quite small.
Right-click on the link below and select Save to save the %MULTNORM macro definition to a file. It is recommended that you name the file multnorm.sas.
Download and save multnorm.sas
Type: | Sample |
Topic: | Analytics ==> Regression; SAS Reference ==> Procedures ==> PRINCOMP; Analytics ==> Exploratory Data Analysis; Analytics ==> Multivariate Analysis; Analytics ==> Econometrics; Analytics ==> Discriminant Analysis; SAS Reference ==> Procedures ==> MODEL; Analytics ==> Descriptive Statistics; SAS Reference ==> Procedures ==> UNIVARIATE; Analytics ==> Analysis of Variance |
Date Modified: | 2007-08-11 03:03:06 |
Date Created: | 2005-01-13 15:02:42 |
Product Family | Product | Host | SAS Release (Starting) | SAS Release (Ending) |
SAS System | SAS/ETS | All | n/a | n/a |
SAS System | SAS/STAT | All | n/a | n/a |
SAS System | SAS/IML | All | n/a | n/a |