The MultNorm macro always attempts to check for a later version of itself. If it is unable to do this (such as if there is no active internet connection available), the macro issues the following message:
The computations performed by the macro are not affected by the appearance of this message. However, you can avoid this check by specifying nochk as the first macro argument. This action can be useful if your machine has no connection to the internet.
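As a minimal sketch of skipping the version check, the call below passes nochk as the first argument, as described above. The data set name cork is only an assumption for illustration; substitute your own data set.

```sas
/* Sketch only: assumes the MultNorm macro is already defined in this  */
/* session and that a data set named CORK exists (illustrative name).  */
/* nochk comes first, before the keyword arguments.                    */
%multnorm(nochk, data=cork)
```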
Version | Update Notes |
2.0 | Added the Royston and Doornik-Hansen tests. SAS/ETS® is no longer required or used to compute any test. var= is now optional. plot=mult is now the default. nochk is added. All tests are saved in data set _STATS. |
1.4 | SAS/IML® is no longer required if SAS/ETS PROC MODEL is not found. SAS/STAT® PROC PRINCOMP is required instead. Univariate plots, if requested, and tests are now presented first. High-resolution plotting is done by Base SAS® PROC SGPLOT if available, or by SAS/GRAPH® PROC GPLOT if not. It checks that the specified data set and variables exist. |
1.3 | Added a message showing whether MODEL or IML is selected. Added a check for error status after MODEL or IML. Errors terminate the macro. Added an automatic check for a newer version. Documented the difference between tests in PROC MODEL and PROC UNIVARIATE. |
1.2 | Use SAS/ETS PROC MODEL if available to get all tests, then SAS/IML, and then univariate only. Use ODS SELECT to obtain only the normal table from MODEL (requires SAS® 8 or later). Provide univariate histograms with overlaid normal curves and tests controlled by the expanded PLOT= parameter. |
1.1 | Use the PVALUE format. Prefix notes from macro with MULTNORM: instead of NOTE:. |
1.0 | Initial coding. |
Following this statement, you can call the MultNorm macro. See the Results tab for an example.
The options and allowable values are as follows:
Univariate tests and plots
Univariate normality for each of the analysis variables specified in var= is assessed using the Shapiro-Wilk W test (for sample size 2000 or less) or the Kolmogorov-Smirnov test, depending on the sample size, as done in the UNIVARIATE procedure. Additional tests are provided if univariate plots are requested with plot=uni or plot=both. For details about the univariate tests of normality, see Goodness Of Fit Tests in the Details section of the PROC UNIVARIATE documentation.
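For comparison, the same kind of univariate tests can be requested directly from PROC UNIVARIATE with its NORMAL option. The sketch below assumes a data set named cork with numeric variables n, e, s, and w (illustrative names); it is not the macro's internal code.

```sas
/* Sketch only: CORK and the variables n, e, s, w are assumed names.   */
/* The NORMAL option requests the tests for normality; ODS SELECT      */
/* limits the output to that one table.                                */
proc univariate data=cork normal;
   var n e s w;
   ods select TestsForNormality;
run;

/* Requesting the univariate plots and additional tests from the macro */
%multnorm(data=cork, var=n e s w, plot=both)
```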
If the p-value of any of the tests is small, then multivariate, as well as univariate, normality can be rejected. However, it is important to note that the univariate Shapiro-Wilk W test is very powerful and is capable of detecting trivially small departures from univariate normality as the sample size becomes large. This might cause you to reject univariate, and therefore multivariate, normality unnecessarily if the tests are being done to validate the use of methods that are robust to small departures from normality. For such situations, the plots are useful by providing a visual assessment of approximate normality.
Multivariate tests and plot
Four tests of multivariate normality are available in the MultNorm macro. Mardia (1974) proposed tests of multivariate normality based on sample measures of multivariate skewness and kurtosis. The Henze-Zirkler test of multivariate normality is based on a nonnegative function that measures the distance between two distribution functions and is used to assess the distance between the distribution function of the data and the multivariate normal distribution function. Royston (1983, 1992, 1993) introduced a multivariate test based on and extending the Shapiro-Wilk W test for univariate normality. The Doornik-Hansen test uses skewness and kurtosis to create an omnibus test of multivariate normality.
For all of the tests provided, a small p-value rejects the null hypothesis of multivariate normality.
All tests, except the Royston test, and the chi-square plot require the PRINCOMP procedure in SAS/STAT to compute principal component scores or the eigenvalues and eigenvectors of the correlation matrix of the original variables. When the correlation matrix for the data is singular or if PROC PRINCOMP is not available, then a message is printed in the log and only the Royston multivariate normality test is done.
Under the normal distribution, the expected multivariate skewness is p(p+2)[(n+1)(p+1)-6]/[(n+1)(n+3)] and the expected multivariate kurtosis is p(p+2)(n-1)/(n+1), where n is the sample size and p is the number of variables. MultNorm displays centered values (observed minus expected) of these statistics, and a small p-value indicates a significant deviation of the observed measure from its expected value under normality. Mardia's multivariate skewness statistic and its p-value are computed using a small-sample correction multiplier. Because this correction approaches 1 once the sample size exceeds about 100, it is always included. The uncentered skewness and kurtosis measures, their expected values under normality, and the uncorrected multivariate skewness statistic and its p-value are not included in the table displayed by the macro but are included in the output data set, _STATS.
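To make the expected values concrete, the following DATA step evaluates the two expressions for a sample of n=28 observations on p=4 variables. These sizes are chosen only for illustration, and the variable names exp_skew and exp_kurt are hypothetical. The step does not call the macro.

```sas
/* Sketch only: evaluates the expected skewness and kurtosis formulas  */
/* for illustrative values of n (sample size) and p (variables).       */
data _null_;
   n = 28;
   p = 4;
   exp_skew = p*(p+2)*((n+1)*(p+1) - 6) / ((n+1)*(n+3));
   exp_kurt = p*(p+2)*(n-1) / (n+1);
   put exp_skew= 8.4 exp_kurt= 8.4;
run;
```

For these values, the expected skewness is 3336/899, or about 3.71, and the expected kurtosis is 648/29, or about 22.34. The centered values that the macro displays are the observed measures minus quantities of this kind.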
Many tests for multivariate normality have been proposed, and while no single test has been found to be uniformly best, the ones offered by the MultNorm macro are among those most often used. Farrell et al. (2007) found that the Type I error rate was well-preserved by both the Royston test and the Doornik-Hansen test over a wide range of sample sizes and numbers of variables. The Henze-Zirkler test also performs well in this regard for sample sizes above 75 and has good statistical power against alternative distributions, but is slightly conservative for smaller sample sizes. Royston's test exhibits good power for smaller sample sizes.
Chi-square Q-Q plot
For p variables and a large sample size, the squared Mahalanobis distances of the observations to the mean vector are distributed as chi-square with p degrees of freedom. However, unless p is very small, the sample size must be quite large for the chi-square approximation to hold. This plot is also sensitive to the presence of outliers. It should therefore be used cautiously, as only a rough indicator of multivariate normality.
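The idea behind the plot can be sketched by hand. The code below is an illustration under stated assumptions, not the macro's own implementation. It assumes a data set named cork with four variables n, e, s, and w and a nonsingular correlation matrix. The plotting position (i-0.5)/N and the data set names _pc, _d, and _qq are choices made here for the example. The sum of squared standardized principal component scores equals the squared Mahalanobis distance of each observation from the mean vector.

```sas
/* Sketch only: data set and variable names are assumptions.           */
/* STD scales the principal component scores to unit variance.         */
proc princomp data=cork std out=_pc noprint;
   var n e s w;
run;

/* Squared Mahalanobis distance = sum of squared standardized scores   */
data _d;
   set _pc;
   d2 = uss(of prin1-prin4);
run;

proc sort data=_d;
   by d2;
run;

/* Pair each ordered distance with a chi-square quantile (p=4 df).     */
/* The (i-0.5)/N plotting position is one common choice.               */
data _qq;
   set _d nobs=nobs;
   chisq_q = quantile('CHISQ', (_n_ - 0.5)/nobs, 4);
run;

proc sgplot data=_qq;
   scatter x=chisq_q y=d2;
   lineparm x=0 y=0 slope=1;   /* reference line: slope 1, intercept 0 */
   xaxis label="Chi-square quantile";
   yaxis label="Squared Mahalanobis distance";
run;
```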
While the MultNorm macro does not directly support BY group processing, this capability can be provided by the RunBY macro that can run the MultNorm macro repeatedly for each of the BY groups in your data. See the RunBY macro documentation for details about its use. Also see the example titled "BY group processing" on the Results tab.
Output data set
The results of the univariate Shapiro-Wilk or Kolmogorov-Smirnov tests as well as any multivariate tests are saved in data set _STATS. If many variables are tested for univariate normality, the p-values in this data set could be adjusted for multiple testing by using it as input to the MULTTEST procedure.
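A minimal sketch of such an adjustment follows. The name of the p-value variable in _STATS is not stated here, so PValue below is a placeholder; check the actual variable names with PROC CONTENTS first. Because _STATS also holds the multivariate tests, you would normally subset it to the univariate rows before adjusting.

```sas
/* Inspect _STATS to find the real name of the p-value variable.       */
proc contents data=_stats;
run;

/* Sketch only: PValue is a hypothetical variable name. Replace it     */
/* with the p-value variable reported by PROC CONTENTS.                */
/* HOLM and FDR request two common multiplicity adjustments.           */
proc multtest inpvalues(PValue)=_stats holm fdr;
run;
```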
Memory and time requirements increase with both the sample size and number of variables. Data sets having thousands of observations and/or hundreds of variables might require excessive running time or memory.
Farrell, P.J., Salibian-Barrera, M. and Naczk, K. (2007), "On tests for multivariate normality and associated simulation studies," Journal of Statistical Computation & Simulation, 77(12), 1065-1080.
Henze, N. and Zirkler, B. (1990), "A Class of Invariant Consistent Tests for Multivariate Normality," Communications in Statistics, Part A - Theory and Methods, 19(10), 3595-3617.
Mardia, K.V. (1974), "Applications of some measures of multivariate skewness and kurtosis in testing normality and robustness studies," Sankhya B, 36, 115-128.
Mardia, K.V. (1975), "Assessment of Multinormality and the Robustness of Hotelling's T-squared Test," Applied Statistics, 24(2), 163-171.
Mardia, K.V., Kent, J.T., and Bibby, J.M. (1979), Multivariate Analysis, New York: Academic Press.
Mardia, K.V. (1970), "Measures of Multivariate Skewness and Kurtosis with Applications," Biometrika, 57(3), 519-530.
Royston, J.P. (1982), "An Extension of Shapiro and Wilk's W Test for Normality to Large Samples," Applied Statistics, 31, 115-124.
Royston, J.P. (1983), "Techniques for Assessing Multivariate Normality Based on the Shapiro-Wilk W," Journal of the Royal Statistical Society, Series C (Applied Statistics), 32(2), 121-133.
Royston, J.P. (1992), "Approximating the Shapiro-Wilk W-Test for Non-normality," Statistics and Computing, 2, 117-119.
Royston, J.P. (1993), "A Toolkit for Testing for Non-Normality in Complete and Censored Samples," Journal of the Royal Statistical Society, Series D (The Statistician), 42(1), 37-43.
Shapiro, S.S. and Wilk, M.B. (1965), "An Analysis of Variance Test for Normality (complete samples)," Biometrika, 52, 591-611.
Svantesson, T. and Wallace, J.W. (2003), "Tests for assessing multivariate normality and the covariance structure of MIMO data," Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing.
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
/* Read four measurements (n, e, s, w) per observation; @@ allows several observations per line */
data cork;
   input n e s w @@;
   datalines;
72 66 76 77   91 79 100 75   60 53 66 63   56 68 47 50
56 57 64 58   79 65 70 61   41 29 36 38   81 80 68 58
32 32 35 36   78 55 67 60   30 35 34 26   46 38 37 38
39 39 31 27   39 35 34 37   42 43 31 25   32 30 30 32
37 40 31 25   60 50 67 54   33 29 27 36   35 37 48 39
32 30 34 28   39 36 39 31   63 45 74 63   50 34 37 40
54 46 60 52   43 37 39 50   47 51 52 43   48 54 57 43
;

/* Define the macro, then run it on all numeric variables in CORK */
%inc "<location of your file containing the MultNorm macro>";
%multnorm(data=cork)
Below are the results from the above example. The first rows in the results display the univariate tests of normality, including the name of the test (Shapiro-Wilk or Kolmogorov-Smirnov) that was used, the variable name, the values of the test statistic, and the corresponding p-values. The next two lines display Mardia's tests of multivariate normality based on skewness or kurtosis. The "Centered Skewness & Kurtosis" column shows the deviations of the multivariate skewness and kurtosis from the values expected under the multivariate normal distribution. Larger deviations suggest nonnormality. The tests of multivariate normality based on these statistics as well as the Royston, Henze-Zirkler, and Doornik-Hansen tests are displayed including their test statistics and p-values. For all tests, univariate and multivariate, small p-values indicate nonnormality. Tests, as well as plots, of univariate normality for each variable can also be provided by specifying plot=uni or plot=both. See the Usage section on the Details tab for more information about these tests.
The chi-square quantile-quantile plot plots the squared Mahalanobis distances against the corresponding quantiles of the limiting chi-square distribution. If the data are distributed as multivariate normal, then the points should fall close to the diagonal line with slope one and intercept zero. Plots of the individual variables can also be displayed if plot=uni or plot=both is specified. See the Usage section in the Details tab for more information.
In the results displayed below, the Mardia tests do not reject multivariate normality. Note the deviations of skewness and kurtosis from normal are small. However, the Royston and Henze-Zirkler multivariate tests suggest nonnormality. The Doornik-Hansen test is marginal. Also, at least one of the univariate tests rejects univariate normality and therefore multivariate normality. The multivariate plot seems to indicate approximate normality, but the sample is quite small.
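%macro code;
   /* Extract the current BY group; RunBY supplies &_BY1, &_LVL1, and &BYlabel */
   data subset;
      set sashelp.iris;
      where &_BY1=&_LVL1;
   run;
   footnote "Above for &BYlabel";
   %multnorm(data=subset, var=SepalLength SepalWidth PetalLength PetalWidth, plot=none)
   footnote;
%mend;

/* Run the CODE macro once for each level of SPECIES */
%RunBY(data=sashelp.iris, by=species)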
Right-click the link below and select Save to save the MultNorm macro definition to a file. It is recommended that you name the file multnorm.sas.
Download and save multnorm.sas
Type: | Sample |
Topic: | Analytics ==> Exploratory Data Analysis; Analytics ==> Multivariate Analysis; Analytics ==> Descriptive Statistics; SAS Reference ==> Macro; Analytics ==> Analysis of Variance |
Date Modified: | 2022-08-24 10:20:37 |
Date Created: | 2005-01-13 15:02:42 |
Product Family | Product | Host | Starting SAS Release | Ending SAS Release |
SAS System | SAS/STAT | All | n/a | n/a |