SAS/STAT software includes numerous enhancements in Release 8.1. The following are some of the highlights of this release for SAS/STAT software.
Multiple imputation provides a useful strategy for dealing with data sets with missing values. Instead of filling in a single value for each missing value, Rubin's (1987) multiple imputation procedure replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute. These multiply imputed data sets are then analyzed by using standard procedures for complete data and combining the results from these analyses. This results in statistically valid inferences that properly reflect the uncertainty due to missing values.
SAS/STAT software, Version 8, introduces the experimental MI and MIANALYZE procedures for creating and analyzing multiply imputed data sets for incomplete multivariate data. The MI procedure creates multiply imputed data sets for incomplete p-dimensional multivariate data, using methods that incorporate appropriate variability across the m imputations. Once the m complete data sets are analyzed using standard SAS/STAT procedures, PROC MIANALYZE can be used to generate valid statistical inferences about the parameters of interest by combining the results. The MI procedure provides three methods for imputing missing values, and the method of choice depends on the type of missing data pattern. For monotone missing data patterns, either a parametric regression method that assumes multivariate normality or a nonparametric method that uses propensity scores is appropriate. For an arbitrary missing data pattern, a Markov Chain Monte Carlo (MCMC) method that assumes multivariate normality can be used.
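The combining step can be illustrated outside of SAS. The following Python sketch applies Rubin's (1987) combining rules to m point estimates and their variances for a single scalar parameter. It is an illustration of the general technique under that scalar assumption, not the MIANALYZE implementation; the function name `combine_rubin` and the example numbers are hypothetical.

```python
import math
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Combine m complete-data results for one scalar parameter (Rubin's rules).

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (combined estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")

    q_bar = mean(estimates)            # combined point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between

    if between == 0.0:
        df = math.inf                  # no between-imputation variability
    else:
        ratio = (1.0 + 1.0 / m) * between / within
        df = (m - 1) * (1.0 + 1.0 / ratio) ** 2
    return q_bar, total, df


# Hypothetical results from m = 3 imputed data sets.
q_bar, total_var, df = combine_rubin([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
std_err = math.sqrt(total_var)
```

The total variance adds the between-imputation variance, inflated by 1 + 1/m, to the average within-imputation variance, which is how the combined standard error reflects the uncertainty due to missing values.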
The experimental GAM procedure fits generalized additive models as defined by Hastie and Tibshirani (1990), providing a set of powerful tools for data analysis that are based on nonparametric and smoothing techniques. These techniques relax the usual linearity assumptions and allow you to discern structure in the relationship between independent variables and dependent variables that you might otherwise not find.
Generalized additive models combine an additive assumption, which allows many nonparametric relationships to be explored simultaneously, with the distributional flexibility of the generalized linear model. You can use the GAM procedure when you have multiple independent variables that you want to model nonparametrically, or when the dependent variable is not distributed normally. The GAM procedure provides nonparametric estimates for additive models, supports the use of multidimensional data, fits both generalized semiparametric additive models and generalized additive models, and enables you to choose a model by specifying either the model degrees of freedom or the smoothing parameter.
Usually, the asymptotic requirements for fitting logistic regression models are easily met. However, there are situations in which small sample sizes or sparse or skewed data make the asymptotic assumptions unrealistic. In these cases, you can apply exact logistic regression methods to model your data. In Release 8.1, you can perform exact logistic regression for binary outcomes with the new EXACT statement in the LOGISTIC procedure. Inference is based on determining the exact distributions of sufficient statistics for parameters of interest in a logistic regression model, conditional on the remaining parameters. Computing these distributions directly is numerically infeasible for anything but the most basic problems, so exact logistic regression became practical only when Hirji, Mehta, and Patel (1987) developed an efficient algorithm for generating the required conditional distributions.
The exact capabilities in the LOGISTIC procedure include the exact probability test and the exact conditional scores test for testing the hypothesis that the parameters for the specified effect are zero.
Joint tests are also available. The procedure provides parameter estimates for each variable listed in the EXACT statement conditional on the values of all the other variables in the model. For each parameter estimate, the procedure displays either the exact maximum conditional likelihood estimate or the median unbiased estimate, the exponential of the estimate, one- or two-sided confidence limits, and a one- or two-sided p-value for testing that the parameter is equal to zero.
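To make the idea of a conditional distribution concrete, the following Python sketch handles the simplest case: a model with an intercept and a single covariate x. There, the sufficient statistic for the slope is T = sum(x*y) and the statistic for the intercept is sum(y). Under the hypothesis that the slope is zero, every arrangement of the observed number of events is equally likely, so the distribution of T given sum(y) can be found by brute-force enumeration. This is only an illustration of the principle, not the Hirji, Mehta, and Patel algorithm and not the code used by PROC LOGISTIC; the function names and data are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def exact_conditional_distribution(x, n_events):
    """Null distribution of T = sum(x_i * y_i), given sum(y_i) = n_events.

    Enumerates every way to place n_events events among the subjects,
    so it is feasible only for very small data sets.
    """
    counts = Counter()
    for event_idx in combinations(range(len(x)), n_events):
        counts[sum(x[i] for i in event_idx)] += 1
    total = sum(counts.values())
    return {t: Fraction(c, total) for t, c in sorted(counts.items())}


def exact_probability_test(x, y):
    """Return (observed T, p-value), where the p-value sums the probabilities
    of all outcomes no more likely than the observed one."""
    t_obs = sum(xi * yi for xi, yi in zip(x, y))
    dist = exact_conditional_distribution(x, sum(y))
    p_obs = dist[t_obs]
    p_value = sum(p for p in dist.values() if p <= p_obs)
    return t_obs, p_value


# Hypothetical data: four unexposed subjects with no events,
# four exposed subjects who all have the event.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 0, 1, 1, 1, 1]
t_obs, p_value = exact_probability_test(x, y)   # t_obs = 4, p_value = 1/35
```

The number of arrangements grows combinatorially with the sample size, which is why naive enumeration of this kind breaks down quickly and an efficient algorithm is needed in practice.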
The following are some of the key enhancements to existing procedures in SAS/STAT software in Release 8.1.
The CATMOD procedure now supports the iterative proportional fitting (IPF) algorithm for fitting hierarchical log-linear models with no independent variables and no population variables. The advantage of the IPF algorithm is that you can obtain the log likelihood, the likelihood ratio statistic G², and the predicted cell counts without performing expensive parameter estimation and covariance computations.
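The following Python sketch shows the general idea of iterative proportional fitting: fitted cell counts are rescaled repeatedly until they reproduce each observed margin named in the hierarchical model, and G² is then computed directly from the observed and fitted counts. It is a generic illustration, not the CATMOD implementation; the function names, the convergence tolerance, and the example table are assumptions made for the sketch.

```python
import math
from collections import defaultdict


def margin(table, axes):
    """Sum a table (dict keyed by cell tuples) over all axes not listed."""
    totals = defaultdict(float)
    for cell, count in table.items():
        totals[tuple(cell[a] for a in axes)] += count
    return totals


def ipf(observed, margins, tol=1e-8, max_iter=1000):
    """Fit a hierarchical log-linear model by iterative proportional fitting.

    observed: dict mapping every cell (tuple of level indices) to its count,
              including cells whose count is zero
    margins:  list of axis tuples, one per fitted margin, e.g. [(0,), (1,)]
    """
    fitted = {cell: 1.0 for cell in observed}
    targets = [margin(observed, axes) for axes in margins]
    for _ in range(max_iter):
        max_change = 0.0
        for axes, target in zip(margins, targets):
            current = margin(fitted, axes)
            for cell, value in fitted.items():
                key = tuple(cell[a] for a in axes)
                # Rescale so the fitted margin matches the observed margin.
                new = value * target[key] / current[key] if current[key] > 0 else 0.0
                max_change = max(max_change, abs(new - value))
                fitted[cell] = new
        if max_change < tol:
            break
    return fitted


def g_squared(observed, fitted):
    """Likelihood ratio statistic: 2 * sum(obs * log(obs / fitted))."""
    return 2.0 * sum(o * math.log(o / fitted[c]) for c, o in observed.items() if o > 0)


# Hypothetical 2 x 2 table fitted under independence (margins for each factor).
observed = {(0, 0): 10.0, (0, 1): 20.0, (1, 0): 30.0, (1, 1): 40.0}
fitted = ipf(observed, [(0,), (1,)])
g2 = g_squared(observed, fitted)
```

No model parameters or covariance matrix are computed at any point, which is the source of the savings described above.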
The generalized Crawford-Ferguson family of orthogonal and oblique rotations is now available with the FACTOR procedure. When METHOD=ML (maximum likelihood), the new SE option requests standard error estimates for the unrotated factor loadings, the rotated factor loadings, the factor correlations, and the factor structure loadings. The standard error estimates are available for the generalized Crawford-Ferguson family of rotations but not for the promax and Harris-Kaiser rotations.
The LOESS procedure now does automatic smoothing parameter selection by default. You can choose the criterion used from AICC, AICC1, or GCV, and PROC LOESS uses a smoothing parameter value that yields a local minimum of the specified criterion. PROC LOESS now also supports higher-order blending methods.
Two new p-value adjustment methods are available: Fisher combination and Hommel. You can obtain these adjustments by specifying the FISHER_C and HOMMEL options, respectively, in the PROC MULTTEST statement. The FISHER_C option requests adjusted p-values using closed tests, based on the idea of Fisher's combination test. Hommel's (1988) method is a closed testing procedure that is based on Simes' (1986) test.
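As a rough illustration of the idea behind a closed test built on Fisher's combination test, the following Python sketch combines p-values with the statistic -2 * sum(log p), which is referred to a chi-square distribution with 2k degrees of freedom when the k tests are independent. It then applies the closed testing principle by brute force: the adjusted p-value for a hypothesis is the largest combination p-value over all subsets that contain it. This is a sketch under an independence assumption and is not claimed to match the computations that the FISHER_C option performs; all function names are hypothetical.

```python
import math
from itertools import combinations


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combination(p_values):
    """Fisher's combination test for independent p-values in (0, 1]."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_sf_even_df(stat, 2 * len(p_values))


def closed_fisher_adjusted(p_values):
    """Closed-testing adjusted p-values; enumerates all subsets, so small k only."""
    k = len(p_values)
    adjusted = []
    for i in range(k):
        others = [j for j in range(k) if j != i]
        worst = 0.0
        for size in range(k):
            for subset in combinations(others, size):
                group = [p_values[i]] + [p_values[j] for j in subset]
                worst = max(worst, fisher_combination(group)[1])
        adjusted.append(worst)
    return adjusted


# Hypothetical raw p-values from three tests.
adjusted = closed_fisher_adjusted([0.01, 0.04, 0.5])
```

Because the subset containing only the hypothesis itself is always included, an adjusted p-value from this sketch is never smaller than the corresponding raw p-value.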
In the MULTTEST procedure, the new WEIGHT= option in the STRATA statement specifies the type of strata weighting to use when computing the Freeman-Tukey test and the t-test for the mean. Valid values are SAMPLESIZE, HARMONIC, and EQUAL.
The new COVSANDWICH option in the PROC PHREG statement produces the Lin and Wei (1989) robust sandwich estimate of the covariance matrix. The specification COVSANDWICH(AGGREGATE) enables you to sum the score residuals within each ID pattern in the computation of this covariance estimate. The cumulative hazard (intensity) function is also available when there are time-dependent explanatory variables.
Domain (sub-group) analysis is now available with the SURVEYMEANS procedure.