SAS/STAT software includes numerous enhancements in Version 8. Beginning with the incorporation of the ODS facility, users will see a variety of changes that make the software even more useful for data analysis. This release also includes several procedures that take the software in new directions. The following are some of the highlights of this release for SAS/STAT software.
For more detail, refer to the paper "Recent Enhancements and New Directions in SAS/STAT Software" by Maura Stokes and Bob Rodriguez.
The Output Delivery System (ODS) is now used by all of the statistical procedures to process results. Instead of directly generating listing files and output data sets, the procedures now generate an output object for every result that can be displayed. This gives you considerable flexibility in managing your analytical results: you can render the output in HTML, rich text, PostScript, or PCL format, send results to different destinations such as output directories, and change the output style with the new TEMPLATE procedure. In addition, any table or statistic that a procedure produces can be placed into an output data set.
Many researchers use surveys to collect their information, choosing probability-based complex sample designs such as stratified selection, clustering, and unequal weighting. The purpose of these designs is to obtain, at the lowest possible cost, estimates that are precise enough for the purposes of the study. New SAS procedures for survey design and analysis enable you to work with data based on complex sampling designs. The SURVEYSELECT procedure provides a variety of methods for selecting probability-based random samples, including samples drawn according to complex multistage designs that involve stratification, clustering, and unequal probabilities of selection. The SURVEYMEANS procedure computes estimates of survey population totals and means, estimates of their variances, confidence limits, and other descriptive statistics. The SURVEYREG procedure performs regression analysis for sample survey data, fitting linear models and computing regression coefficients and their covariance matrix.
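The kind of estimate SURVEYMEANS reports can be illustrated for the simplest complex design with a short Python sketch (not SAS code). It assumes stratified simple random sampling without replacement with known stratum population sizes; the function name and the input layout are hypothetical. Each sampled unit carries weight N_h/n_h, and the variance of the estimated total includes a finite population correction. SURVEYMEANS itself handles more general designs than this sketch.

```python
import math
from statistics import mean, variance

def stratified_estimates(strata):
    """Estimate a population total and mean under stratified simple random
    sampling without replacement.

    `strata` maps a stratum label to (N_h, sample_values), where N_h is the
    stratum population size (a hypothetical input layout for this sketch).
    """
    total = 0.0
    var_total = 0.0
    pop_size = 0
    for N_h, values in strata.values():
        n_h = len(values)
        # Each sampled unit carries sampling weight N_h / n_h, so the
        # stratum contributes N_h times its sample mean to the total.
        total += N_h * mean(values)
        if n_h > 1:
            # Variance of the stratum total with finite population correction.
            # Strata with a single sampled unit contribute no variance here
            # (a simplification of this sketch).
            var_total += N_h ** 2 * (1 - n_h / N_h) * variance(values) / n_h
        pop_size += N_h
    return {
        "total": total,
        "se_total": math.sqrt(var_total),
        "mean": total / pop_size,
        "se_mean": math.sqrt(var_total) / pop_size,
    }
```

For example, `stratified_estimates({"A": (100, [1, 2, 3]), "B": (50, [10, 10])})` estimates a population total of 700 and a mean of 700/150.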
The NLMIXED procedure fits nonlinear mixed models, that is, models in which both fixed and random effects enter nonlinearly. These models have a wide variety of applications, such as pharmacokinetics and overdispersed binomial data. PROC NLMIXED enables you to specify a conditional distribution for your data (given the random effects) having either a standard form (normal, binomial, Poisson) or a general distribution that you code using SAS programming statements.
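Fitting such a model requires integrating the conditional likelihood over the random effects. As a rough Python illustration (not NLMIXED code), the sketch below computes the marginal likelihood of one cluster in a random-intercept logistic model by integrating the conditional binomial likelihood over a normal random effect with a fixed 5-point Gauss-Hermite rule. The model, the parameter names, and the number of quadrature points are assumptions for illustration; NLMIXED has its own adaptive quadrature and optimization machinery and maximizes the log-likelihood summed over all clusters, which this sketch omits.

```python
import math

# Nodes and weights of the 5-point Gauss-Hermite rule for weight exp(-x^2).
GH_NODES = [-2.020182870456086, -0.958572464613819, 0.0,
            0.958572464613819, 2.020182870456086]
GH_WEIGHTS = [0.019953242059046, 0.393619323152241, 0.945308720482942,
              0.393619323152241, 0.019953242059046]

def normal_expectation(f, sigma):
    """Approximate E[f(u)] for u ~ N(0, sigma^2) by Gauss-Hermite quadrature,
    using the substitution u = sqrt(2) * sigma * x."""
    return sum(w * f(math.sqrt(2.0) * sigma * x)
               for x, w in zip(GH_NODES, GH_WEIGHTS)) / math.sqrt(math.pi)

def cluster_likelihood(ys, xs, b0, b1, sigma):
    """Marginal likelihood of one cluster's binary responses under the
    hypothetical model logit P(y = 1 | u) = b0 + b1 * x + u, u ~ N(0, sigma^2)."""
    def conditional(u):
        # Likelihood of the cluster given the random intercept u.
        lik = 1.0
        for y, x in zip(ys, xs):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x + u)))
            lik *= p if y == 1 else 1.0 - p
        return lik
    return normal_expectation(conditional, sigma)
```

With `sigma = 0` the integral collapses to the ordinary fixed-effects logistic likelihood, which is a convenient sanity check.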
Several procedures for nonparametric density estimation and nonparametric regression are included in this release. The KDE procedure computes nonparametric estimates of univariate and bivariate probability density functions using the method of kernel density estimation. The LOESS procedure implements a nonparametric method for estimating local regression surfaces, and the TPSPLINE procedure uses a penalized least squares method to estimate multivariate regression surfaces with thin-plate smoothing splines.
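The idea behind kernel density estimation can be shown in a few lines of Python (this is not the KDE procedure's implementation). The sketch assumes a Gaussian kernel and a bandwidth supplied by the caller; the function name is hypothetical, and PROC KDE's own bandwidth selection is not reproduced here. The estimate at x averages a normal density of scale `bandwidth` centered at each observation.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density of `data` with a
    Gaussian kernel of the given bandwidth (chosen by the caller)."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of kernel contributions from every observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
                          for xi in data)

    return density
```

A smaller bandwidth follows the data more closely and produces a rougher curve; a larger one smooths more.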
The goal of ordinary least squares regression is to minimize the sample response prediction error. Partial least squares regression has the additional goal of accounting for variation in the predictors. When the predictors are highly correlated, directions in the predictor space that are well sampled should provide better predictions for new observations. The PLS procedure extracts successive linear combinations of the predictors, called factors or components, that optimally explain predictor and response variation. Techniques include principal components regression, reduced rank regression, and partial least squares regression. You can choose the number of extracted factors by cross validation, a process of fitting the model to part of the data and minimizing the prediction error for the part that was not used in fitting.
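To make "extracting a factor" concrete, here is a minimal Python sketch of the first partial least squares factor for a single response (the standard NIPALS first step, not the PLS procedure's code). It assumes the predictors are centered but not scaled, and that X'y is not zero; the function name is hypothetical. The weight vector is proportional to the covariance of each predictor with the response, the factor score is the corresponding linear combination, and the response is regressed on that score.

```python
import math

def pls1_first_factor(X, y):
    """First PLS factor for one response: returns the unit weight vector,
    the implied regression coefficients, and the intercept.
    Predictors are centered, not scaled (an assumption of this sketch)."""
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - xbar[j] for j in range(p)] for row in X]
    yc = [v - ybar for v in y]
    # Weights proportional to X'y (covariance of each predictor with y).
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))  # assumes X'y is nonzero
    w = [v / norm for v in w]
    # Factor scores t = Xc w, then regress the response on t.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    coef = [q * wj for wj in w]
    intercept = ybar - sum(c * m for c, m in zip(coef, xbar))
    return w, coef, intercept
```

Later factors would be extracted the same way after removing (deflating) the part of X explained by this factor; cross validation then compares prediction error across different numbers of factors.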
In fields such as petroleum exploration, mining, and water pollution analysis, data are available at specific spatial locations (such as experimental stations positioned above the ground or at certain distances in the air), and the goal is to predict values at unsampled locations. The unsampled locations are often mapped on a regular grid, and the predictions are used to produce surface plots or contour maps.
Version 8 includes procedures that correspond to the two steps for spatial prediction for two-dimensional data. The VARIOGRAM procedure computes the sample or empirical measure of spatial continuity (the semivariogram or covariance), which is then used in determining the theoretical semivariogram model by graphical or other means. The KRIGE2D procedure performs ordinary kriging at specified points using the theoretical model. Results are usually displayed by using the GPLOT or G3D procedure or SAS/INSIGHT software.
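The two steps can be sketched in Python (this is not the VARIOGRAM or KRIGE2D code). The first function computes the classical empirical semivariogram, gamma(h) = sum of (z_i - z_j)^2 over pairs in distance class h, divided by twice the number of pairs. The second step performs ordinary kriging by solving the usual linear system whose weights are constrained to sum to 1. The exponential model and its sill and range values are hypothetical choices for illustration, and the small linear solver is written out only so the sketch is self-contained.

```python
import math

def empirical_semivariogram(coords, z, lag):
    """Classical estimator: gamma(h) = sum (z_i - z_j)^2 / (2 N(h)), with
    pairs grouped into distance classes of width `lag`.
    Returns {class: (distance, gamma, number_of_pairs)}."""
    sums, counts = {}, {}
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            k = round(math.dist(coords[i], coords[j]) / lag)  # distance class
            sums[k] = sums.get(k, 0.0) + (z[i] - z[j]) ** 2
            counts[k] = counts.get(k, 0) + 1
    return {k: (k * lag, sums[k] / (2 * counts[k]), counts[k]) for k in sorted(sums)}

def exponential_model(h, sill=1.0, range_=1.0):
    # Hypothetical theoretical semivariogram (no nugget effect).
    return sill * (1.0 - math.exp(-h / range_))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ordinary_kriging(coords, z, target, model):
    """Predict z at `target` using semivariogram `model`.
    Returns (prediction, weights); the weights sum to 1."""
    n = len(z)
    # Semivariances between data points, bordered by the unbiasedness row/column.
    A = [[model(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    A.append([1.0] * n + [0.0])
    b = [model(math.dist(c, target)) for c in coords] + [1.0]
    weights = solve(A, b)[:n]  # last entry is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, z)), weights
```

Because the model has no nugget effect, the predictor reproduces the observed value exactly when the target coincides with a data location.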
The GENMOD procedure has been enhanced in several key ways for this release. The procedure now includes an LSMEANS statement that extends least squares means to the generalized linear model. In addition, an ESTIMATE statement has been added, and the negative binomial distribution is now supported with the new DIST=NEGBIN option in the MODEL statement.
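For reference, the negative binomial distribution in the mean and dispersion form used for count regression can be written in Python as below (an illustration, not GENMOD code). It assumes the parameterization with mean mu and dispersion k > 0, so that Var(Y) = mu + k * mu^2, and a log link in the hypothetical `negbin_loglik` helper; as k approaches 0 the distribution approaches the Poisson.

```python
import math

def negbin_logpmf(y, mu, k):
    """Log probability of count y under a negative binomial with mean mu
    and dispersion k > 0, so that Var(Y) = mu + k * mu**2."""
    r = 1.0 / k
    return (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
            + y * math.log(k * mu / (1.0 + k * mu))
            - r * math.log(1.0 + k * mu))

def negbin_loglik(ys, xs, beta, k):
    """Log-likelihood of a log-link regression, mu_i = exp(x_i . beta)
    (a hypothetical helper for illustration)."""
    total = 0.0
    for y, x in zip(ys, xs):
        mu = math.exp(sum(b * v for b, v in zip(beta, x)))
        total += negbin_logpmf(y, mu, k)
    return total
```

The extra term k * mu^2 in the variance is what lets the model absorb overdispersion that a Poisson model cannot.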
The GEE facilities have also been updated. Type 3 tests are now available for model effects, and the CONTRAST statement can now be used with the GEE parameter estimates. The LSMEANS and ESTIMATE statements are also supported. The method of alternating logistic regressions is now available, as is support for ordinal response data.
Occasionally, you are faced with badly scaled or ill-conditioned data. The ORTHOREG procedure was developed for cases that the GLM and REG procedures are not designed to handle. PROC ORTHOREG uses a QR decomposition to produce numerically precise estimates. The procedure has been enhanced to accept a CLASS statement and GLM-like model specification so that it can handle a broader range of statistical models. In addition, the output has been enhanced to include additional statistics and model information.
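Why an orthogonal decomposition helps can be shown with a small Python sketch of least squares through a QR factorization built from Givens rotations, one data row at a time. This is one standard way to obtain a QR-based solution and is not claimed to be ORTHOREG's exact algorithm; the function name is hypothetical, and the sketch assumes X has full column rank. Because rotations are orthogonal, they preserve the residual sum of squares, so solving the triangular system R b = Q'y gives the least squares estimates without ever forming the ill-conditioned matrix X'X.

```python
import math

def givens_least_squares(X, y):
    """Least squares estimates via a row-by-row Givens QR update.
    Assumes X has full column rank (R must have a nonzero diagonal)."""
    p = len(X[0])
    R = [[0.0] * p for _ in range(p)]  # upper triangular factor
    qty = [0.0] * p                    # first p elements of Q'y
    for row, yi in zip(X, y):
        x, b = list(row), float(yi)
        for k in range(p):
            if x[k] == 0.0:
                continue
            # Rotation that zeroes x[k] against the diagonal element R[k][k].
            r = math.hypot(R[k][k], x[k])
            c, s = R[k][k] / r, x[k] / r
            for j in range(k, p):
                R[k][j], x[j] = c * R[k][j] + s * x[j], -s * R[k][j] + c * x[j]
            qty[k], b = c * qty[k] + s * b, -s * qty[k] + c * b
    # Back substitution for R beta = Q'y.
    beta = [0.0] * p
    for k in reversed(range(p)):
        beta[k] = (qty[k] - sum(R[k][j] * beta[j] for j in range(k + 1, p))) / R[k][k]
    return beta
```

Processing one row at a time also means the data never need to be held as a full matrix, which suits large data sets.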
In response to many requests from users, a number of procedures now include additional support for confidence limits. For example, in the GLM procedure you can specify the CLPARM option to request confidence limits for the parameter estimates (with the SOLUTION option) and for the results of the ESTIMATE statement. The CLB option in the REG procedure produces confidence limits for the parameter estimates. The UNIVARIATE procedure produces confidence limits for a variety of distributional parameters.
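The confidence limits for a regression parameter take the familiar form estimate plus or minus t times the standard error. The Python sketch below (not SAS code) computes them for the slope of a simple linear regression. Because the Python standard library has no t quantile function, the caller supplies the critical value `t_crit` for n - 2 degrees of freedom; the function name is hypothetical.

```python
import math

def slope_confidence_limits(x, y, t_crit):
    """Slope estimate, its standard error, and confidence limits for a
    simple linear regression. `t_crit` is the t quantile with n - 2
    degrees of freedom (for example, about 3.182 for 95% limits, n = 5)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residual mean square estimates the error variance with n - 2 df.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope, se, (slope - t_crit * se, slope + t_crit * se)
```

For `x = [1, 2, 3, 4, 5]` and `y = [2.0, 4.1, 5.9, 8.2, 9.8]`, the slope estimate is 1.97, and the limits are symmetric about it.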
Version 8 continues the work of providing exact p-values where practical. Monte Carlo estimation of exact p-values is now available in both the FREQ and NPAR1WAY procedures; it is useful in situations where the default exact algorithms are not feasible. The new MAXTIME option in the EXACT statement specifies the maximum time allowed for exact computations; if they are not completed within that time, the computation stops. The new nonparametric tests for scale differences in the NPAR1WAY procedure (Siegel-Tukey, Ansari-Bradley, Klotz, and Mood) include exact p-values, as does the new test for binomial proportions in the FREQ procedure.
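The idea behind Monte Carlo estimation of an exact p-value can be sketched in Python for the two-sample Wilcoxon rank sum test (an illustration of the technique, not the NPAR1WAY algorithm). Instead of enumerating every assignment of observations to groups, the sketch randomly reshuffles the group labels many times and counts how often the statistic is at least as extreme as the observed one. It assumes no tied values; the function names, the number of samples, and the seed are hypothetical choices.

```python
import random

def rank_sum(values, group_size):
    # Sum of the ranks of the first `group_size` values (assumes no ties).
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return sum(ranks[:group_size])

def monte_carlo_pvalue(a, b, n_samples=10000, seed=1):
    """Monte Carlo estimate of the two-sided exact p-value of the Wilcoxon
    rank sum test, with its Monte Carlo standard error."""
    rng = random.Random(seed)
    pooled = list(a) + list(b)
    m, n = len(a), len(pooled)
    expected = m * (n + 1) / 2  # null mean of the rank sum for group a
    observed = abs(rank_sum(pooled, m) - expected)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(pooled)  # random reassignment of group labels
        if abs(rank_sum(pooled, m) - expected) >= observed:
            hits += 1
    p = hits / n_samples
    se = (p * (1 - p) / n_samples) ** 0.5
    return p, se
```

The Monte Carlo standard error shrinks as the number of samples grows, which is the trade-off that makes this approach attractive when full enumeration is too expensive.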