What's New
SAS 9.2 brings many new procedures to SAS/STAT software, along with many enhancements to existing procedures.
Prior to SAS 9.1, creating graphics with statistical procedures generally required additional programming. SAS 9.1 introduced an experimental extension to the Output Delivery System (ODS), which was used by over two dozen SAS/STAT and SAS/ETS procedures to create statistical graphics as automatically as they create tables. This new functionality, referred to as ODS Statistical Graphics (or ODS Graphics for short), requires minimal additional syntax, and it provides displays commonly needed for data analysis and statistical modeling, including scatter plots, histograms, and box-and-whisker plots.
In SAS 9.2, ODS Graphics is production, and over 50 procedures in SAS/STAT, SAS/ETS, SAS/QC, and Base SAS have been modified to use it. Many new plots are now produced by these procedures, either by default or with the specification of procedure options.
The functionality of ODS Graphics has been extended with the addition of new graph types, ODS styles designed for statistical work, and a point-and-click editor for enhancing titles, labels, and other graph features. You can also modify graphs by changing their underlying templates, which are supplied by SAS and are written in the Graph Template Language (GTL). The LISTING destination is now supported by ODS Graphics. A new family of SAS/GRAPH procedures uses ODS Graphics to create standalone plots, such as scatterplots overlaid with smoothers, which are particularly useful for exploratory data analysis. The new SGRENDER procedure provides a way to create customized displays by writing your own templates with the GTL.
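As a minimal sketch of the syntax involved, the following statements enable ODS Graphics around a statistical procedure and then draw a standalone plot with one of the new SG procedures. The example assumes the SASHELP.CLASS sample data set; the choice of PROC REG and of the variables is illustrative only.

```sas
/* Enable ODS Graphics so that procedures create plots along with tables */
ods graphics on;

/* PROC REG produces its fit and diagnostic plots with no further syntax */
proc reg data=sashelp.class;
   model weight = height;
run;
quit;

/* Standalone scatter plot overlaid with a loess smoother */
proc sgplot data=sashelp.class;
   scatter x=height y=weight;
   loess x=height y=weight / nomarkers;
run;

ods graphics off;
```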
See Chapter 21, Statistical Graphics Using ODS, for an introduction to ODS Graphics and general information about ODS Graphics. The syntax for requesting plots with statistical procedures is described in the procedure chapters.
Note that a SAS/GRAPH license is now required to use ODS Graphics.
SAS/STAT users will be interested in SAS/IML® Studio, formerly known as SAS® Stat Studio, which is new software for data exploration and analysis. SAS/IML Studio provides a highly flexible programming environment in which you can run SAS/STAT or SAS/IML® analyses and display the results with dynamically linked graphics and data tables. SAS/IML Studio is intended for data analysts who write SAS programs to solve statistical problems but need more versatility for data exploration and model building. The programming language in SAS/IML Studio, which is called IMLPlus, is an enhanced version of the SAS/IML programming language. IMLPlus extends SAS/IML to provide new language features, including the ability to create and manipulate statistical graphics, call SAS procedures as functions, and call computational programs written in C, C++, Java, and Fortran. SAS/IML Studio runs on a PC in the Microsoft Windows operating environment.
SAS/IML Studio also includes an experimental interface to the R language. The IMLPlus language includes functions that transfer data between SAS data sets and R data frames, and between SAS/IML matrices and R matrices.
SAS/IML Studio is also the successor to the SAS/INSIGHT® product and provides the same interactive functionality. It is distributed with the SAS/IML product. For more information about SAS/IML Studio, see the SAS/IML Studio User's Guide and SAS/IML Studio for SAS/STAT Users.
SAS 9.2 brings a number of new procedures to SAS/STAT software. Several of these procedures have been previously available as Web downloads for SAS 9.1.3: GLIMMIX, GLMSELECT, and QUANTREG. The GLMSELECT procedure performs effect selection in the framework of general linear models. The QUANTREG procedure performs quantile regression. The GLIMMIX procedure analyzes generalized linear mixed models. All of these procedures are production with SAS 9.2 and are available on all platforms.
In addition, Bayesian capabilities were introduced to three procedures via Web downloads for SAS 9.1.3. The BGENMOD, BLIFEREG, and BPHREG procedures were experimental versions of the GENMOD, LIFEREG, and PHREG procedures that used the Gibbs sampler to produce posterior distributions while also providing trace plots and convergence diagnostics. These capabilities have been rolled into the GENMOD, LIFEREG, and PHREG procedures for SAS 9.2 and are now production software.
The MCMC, SEQDESIGN, and SEQTEST procedures were introduced as experimental procedures in SAS 9.2, and they became production in SAS 9.2M2.
The MCMC procedure is a general purpose Markov chain Monte Carlo (MCMC) simulation procedure that is designed to fit a variety of Bayesian models. You specify a likelihood function for the data and a prior distribution for the parameters. PROC MCMC obtains samples from the corresponding posterior distributions, produces summary and diagnostic statistics, and saves the posterior samples in an output data set.
The SEQDESIGN and SEQTEST procedures are tools for group sequential analysis. The SEQDESIGN procedure designs interim analyses for clinical trials, and the SEQTEST procedure performs interim analyses.
The experimental HPMIXED procedure uses a number of specialized high-performance techniques to fit linear mixed models with variance component structure. The HPMIXED procedure is specifically designed to cope with estimation problems that involve a large number of fixed effects, a large number of random effects, or a large number of observations. The models supported by the HPMIXED procedure are a subset of the models that you can fit with the MIXED procedure, and PROC HPMIXED can provide substantial performance improvements in terms of memory requirements and computational speed.
The experimental TCALIS procedure updates the CALIS procedure for structural equation modeling. It will become the CALIS procedure in the next release of SAS/STAT software.
The Power and Sample Size application (PSS), previously available as a Web application, has been rewritten as a Java client. Its documentation is now included here; see Chapter 69, The Power and Sample Size Application.
In addition, over two hundred enhancements have been added to existing procedures in SAS/STAT. For example:
The TTEST procedure provides simple crossover analysis as well as equivalence tests.
Jackknife and BRR variance estimation and domain analysis are now provided by all of the survey data analysis procedures.
The POWER procedure now provides power for a number of additional analyses.
The GENMOD procedure fits zero-inflated Poisson regression models. PROC GENMOD also provides deletion and diagnostic statistics for its GEE models and provides graphics for these statistics.
The PHREG procedure adds a HAZARDRATIO statement for computing hazard ratios, including hazard ratios in the presence of interactions.
The GLIMMIX procedure introduces the COVTEST statement for inference about covariance parameters. In addition, PROC GLIMMIX provides new estimation methods: Laplace and adaptive quadrature.
An experimental EFFECT statement can be found in the GLIMMIX, GLMSELECT, and QUANTREG procedures. It enables you to construct special collections of columns for design matrices (for example, splines and multimember effects).
Finally, note that this documentation contains several new introductory chapters. See Chapter 3, Introduction to Statistical Modeling with SAS/STAT Software, Chapter 6, Introduction to Mixed Modeling Procedures, Chapter 19, Introduction to Power and Sample Size Analysis, and Chapter 18, Shared Concepts and Topics.
More information about the changes and enhancements follows. The details can be found in the documentation for the individual procedures.
Standardized root mean square residuals (SRMSR) are now listed in the fit summary table, and PROC CALIS now offers residual plots. See below for more information on the TCALIS procedure.
The PLOTS option in the PROC CLUSTER statement produces plots of the cubic clustering criterion (CCC), the pseudo F (PSF) statistic, and the pseudo t² (PST2) statistic, all plotted against the number of clusters.
PROC CORRESP now produces the correspondence analysis plot by default when ODS Graphics is enabled.
You can produce a number of graphs with the PLOTS= option in the PROC FACTOR statement. These include various factor pattern plots, reference structures, and scree and variance explained plots. You can now use the OUT= option in conjunction with a PARTIAL statement. The PARPREFIX= option in the PROC statement specifies the prefix for the residual variables in the output data sets.
The FREQ procedure can now produce frequency plots, cumulative frequency plots, deviation plots, odds ratio plots, and kappa plots. You can now request equivalence and noninferiority tests for the binomial proportion and proportion difference. New confidence limits for the binomial proportion (such as Agresti-Coull, Jeffreys, and Wilson) are now available, as well as unconditional exact confidence limits for the proportion difference. You can request Zelen's exact test for equal odds ratios by specifying the EQOR option in the EXACT statement.
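A short sketch of how the new confidence limits and a frequency plot might be requested, assuming the SASHELP.CLASS sample data set (the variable choice is illustrative):

```sas
ods graphics on;

proc freq data=sashelp.class;
   /* Wilson, Jeffreys, and Agresti-Coull limits for the binomial proportion,
      plus a frequency plot of the one-way table */
   tables sex / binomial(wilson jeffreys agresticoull) plots=freqplot;
run;

ods graphics off;
```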
The GAM procedure is production with SAS 9.2. PROC GAM now produces graphs, including smoothing component plots and additive component plots. The target for an additive logistic model no longer has to be numeric; PROC GAM offers the same types of options for response and classification variables that are available in procedures such as PROC LOGISTIC and PROC GENMOD. The ANODEV=NOREFIT option in the MODEL statement enables a fast approximation analysis of deviance.
The BAYES statement produces Bayesian analysis via Gibbs sampling for most of the statistical analyses provided by the GENMOD procedure. This release also includes deletion diagnostics and plots for GLMs and GEEs, zero-inflated Poisson regression models, and AIC and QIC model fit statistics. Martingale residuals are now production. The LSMEANS statement now produces inverse link estimates.
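The following sketch shows one way these GENMOD features could be requested. The data set WORK.CLAIMS and the variables NCLAIMS and AGE are hypothetical names used only for illustration, and the sampler settings are arbitrary.

```sas
/* WORK.CLAIMS, NCLAIMS, and AGE are hypothetical names */
proc genmod data=work.claims;
   model nclaims = age / dist=poisson link=log;
   /* Gibbs sampling: 2000 burn-in iterations, then 10000 posterior draws */
   bayes seed=27513 nbi=2000 nmc=10000 outpost=work.post;
run;

/* Zero-inflated Poisson model: the ZEROMODEL statement names the
   effects used for the zero-inflation probability */
proc genmod data=work.claims;
   model nclaims = age / dist=zip;
   zeromodel age;
run;
```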
The GLIMMIX procedure fits statistical models to data with correlations or nonconstant variability and where the response is not necessarily normally distributed. These generalized linear mixed models (GLMM), like linear mixed models, assume normal (Gaussian) random effects. Conditional on these random effects, data can have any distribution in the exponential family. The binary, binomial, Poisson, and negative binomial distributions, for example, are discrete members of this family. The normal, beta, gamma, and chi-square distributions are representatives of the continuous distributions in this family. The GLIMMIX procedure was first made available for SAS 9.1.3 as a Web download.
In SAS 9.2, the GLIMMIX procedure provides Laplace and adaptive quadrature estimation methods, and, with them, a likelihood-based empirical estimator. In addition, a new bias-corrected estimator is available. The experimental EFFECT statement provides for the creation of splines as well as other special effects. The COVTEST statement enables likelihood-based inference about the covariance parameters. A number of additional covariance structures have been added, including heterogeneous AR(1), heterogeneous compound symmetry, linear structures, heterogeneous Toeplitz, penalized B-spline, spatial anisotropic, and the Matérn covariance structure. Step-down multiplicity adjustments are now supported for all ADJUST= methods in the LSMEANS, ESTIMATE, and LSMESTIMATE statements, except for ADJUST=NELSON in the LSMEANS statement.
The DDFM=KR(FIRSTORDER) option drops the second-derivative term in the KR calculations. The OUTDESIGN= option in the PROC GLIMMIX statement enables you to write the X and Z matrices to an output data set. New graphics include boxplots of data and/or residuals with respect to classification effects as well as plots of odds ratios and their confidence limits. The diffogram, meanplot, anomplot, and controlplot have been enhanced.
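As a hedged illustration of the new estimation methods and the COVTEST statement in PROC GLIMMIX, consider a binomial model with a random center effect. The data set WORK.MULTICENTER and its variables are hypothetical.

```sas
/* WORK.MULTICENTER and its variables are hypothetical names */
proc glimmix data=work.multicenter method=quad;
   class center group;
   /* events/trials syntax for a binomial response */
   model sideeffect/n = group / solution;
   random intercept / subject=center;
   /* likelihood-based test that the G matrix is zero,
      that is, that there is no random center effect */
   covtest 'No random center effect' zerog;
run;
```

Replacing METHOD=QUAD with METHOD=LAPLACE requests the Laplace approximation instead of adaptive quadrature.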
New graphics are now produced for means and for LS-means comparisons. The experimental EFFECTSIZE option in the MODEL statement adds measures of effect size to many analysis of variance tables. The PLOTS=DIAGNOSTICS and the PLOTS=RESIDUAL options in the PROC GLM statement produce summary diagnostics and residual plots, respectively.
The new ORDER= option in the PROC GLMPOWER statement specifies the sorting order for the levels of all of the classification variables specified in the CLASS statement. Continuous variables are now supported, and the noncentrality parameter is computed.
The GLMSELECT procedure performs effect selection in the framework of general linear models. A variety of model selection methods are available, including the LASSO method of Tibshirani (1996) and the related LAR method of Efron et al. (2004). The procedure offers extensive capabilities for customizing the selection with a wide variety of selection and stopping criteria, from traditional and computationally efficient significance-level-based criteria to more computationally intensive validation-based criteria. The procedure also provides graphical summaries of the selection search.
Enhancements in SAS 9.2 include an OUTDESIGN= option to obtain the design matrix, a PARMLABELSTYLE= option to control the style of the parameter labels, and an experimental EFFECT statement that you can use to create splines, polynomials, multimember, and collection effects.
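A minimal sketch of a LASSO selection that also uses the experimental EFFECT statement follows. It assumes that the SASHELP.CARS sample data set is available at your site; the effect name WTSPLINE and the choice of SBC as the selection criterion are illustrative.

```sas
ods graphics on;

proc glmselect data=sashelp.cars plots=all;
   class type origin;
   /* experimental EFFECT statement: spline expansion of Weight */
   effect wtspline = spline(weight);
   /* run the LASSO path to the end and choose the step with the best SBC */
   model mpg_city = type origin horsepower enginesize wtspline
         / selection=lasso(stop=none choose=sbc);
run;

ods graphics off;
```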
The experimental HPMIXED procedure uses a number of specialized high-performance techniques to fit linear mixed models with variance component structure. The HPMIXED procedure is specifically designed to cope with estimation problems that involve a large number of fixed effects, a large number of random effects, or a large number of observations. While the HPMIXED procedure fits only a subset of the models fit by the MIXED procedure and it does not provide the breadth of confirmatory inference that is available with the MIXED procedure, it can have considerably better performance in terms of memory requirements and computational speed.
ODS Graphics has been added to the KRIGE2D procedure, which now can produce scatter plots and prediction plots.
In the LIFEREG procedure, the BAYES statement provides Bayesian analysis via Gibbs sampling.
The LIFETEST procedure now produces the Nelson-Aalen estimates of the cumulative hazard function. The number of subjects at risk can be displayed for the Kaplan-Meier survival curves. Comparison methods are available for the k-sample test, and you can now specify a smoother hazard function using the kernel method.
The LOGISTIC procedure performs Firth's penalized maximum likelihood. The MULTIPASS option forces the procedure to reread the input data set as needed rather than requiring its storage in memory or in a temporary file on disk. Estimated cumulative probabilities have been added to the SCORE statement output. The CONTRAST statement now includes the inverse link. The ROCCONTRAST statement compares different ROC models. Odds ratios in the presence of interactions are now computed, and odds ratio plots are provided. Note that the GRAPHICS statement has been replaced with the PLOTS= option in the PROC statement. The EFFECT plot can now handle multiple CLASS and continuous variables. Standard errors are now produced for the exact parameter estimates.
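The following sketch combines several of these LOGISTIC features. The data set WORK.STUDY and the variables RESPONSE (coded 0/1), TREATMENT, and AGE are hypothetical, and the ages at which odds ratios are evaluated are arbitrary.

```sas
ods graphics on;

/* WORK.STUDY, RESPONSE, TREATMENT, and AGE are hypothetical names */
proc logistic data=work.study plots(only)=(roc oddsratio);
   class treatment / param=ref;
   /* FIRTH requests penalized maximum likelihood estimation */
   model response(event='1') = treatment age treatment*age / firth;
   /* Treatment interacts with Age, so odds ratios for Treatment
      are evaluated at specific ages */
   oddsratio treatment / at(age=40 60);
run;

ods graphics off;
```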
The LOESS procedure includes a PRESEARCH option that uses a preliminary grid search to improve the chance of finding a global optimum of the selection criterion when a golden section search is used.
The %POWTABLE macro renders the output of the POWER and GLMPOWER procedures in rectangular form, and it optionally produces simplified results by using weighted means across chosen variables. The %ModStyle macro modifies the colors, line styles, and marker symbols displayed in ODS Graphics plots.
The MCMC procedure is a flexible simulation-based procedure that is suitable for fitting a wide range of Bayesian models. To use the procedure, you need to specify a likelihood function for the data and a prior distribution for the parameters. You might also need to specify hyperprior distributions if you are fitting hierarchical models. PROC MCMC then obtains samples from the corresponding posterior distributions, produces summary and diagnostic statistics, and saves the posterior samples in an output data set that can be used for further analysis. You can analyze data that have any likelihood, prior, or hyperprior with PROC MCMC, as long as these functions are programmable using the SAS DATA step functions. The parameters can enter the model linearly or in any nonlinear functional form. The default algorithm that PROC MCMC uses is an adaptive blocked random-walk Metropolis algorithm that uses a normal proposal distribution.
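To make the workflow concrete, here is a small sketch of a Bayesian linear regression in PROC MCMC, assuming the SASHELP.CLASS sample data set. The priors, the number of iterations, and the output data set name are illustrative choices, not recommendations.

```sas
proc mcmc data=sashelp.class outpost=work.classout nmc=50000 thin=5 seed=246810;
   /* parameters and their starting values, in two sampling blocks */
   parms beta0 0 beta1 0;
   parms sigma2 1;
   /* prior distributions for the parameters */
   prior beta0 beta1 ~ normal(mean=0, var=1e6);
   prior sigma2 ~ igamma(shape=3/10, scale=10/3);
   /* programming statement: the mean enters the likelihood below */
   mu = beta0 + beta1*height;
   /* likelihood function for the data */
   model weight ~ normal(mu, var=sigma2);
run;
```

The posterior samples are saved in WORK.CLASSOUT and can be analyzed further with other procedures.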
ODS Graphics has been added to the MDS procedure, which now can produce fit plots, coefficient plots, and configuration plots.
The RESIDUAL and INFLUENCE options in the MODEL statement are now production. The PLOTS= option in the PROC MIXED statement is now available to specify graphics.
The MULTTEST procedure now provides the adaptive Holm, adaptive Hochberg, adaptive FDR, bootstrap FDR, pFDR, and permutation FDR p-value adjustments. ODS Graphics has been added to PROC MULTTEST, and plots of adjusted p-values, raw p-values by rank and histogram, and p-values by test are now available. Satterthwaite degrees of freedom are now provided for the t test. The EPSILON= option in the PROC MULTTEST statement specifies the comparison value.
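As a sketch of how some of these adjustments might be requested for a set of existing p-values, consider the following statements. The data set WORK.PVALS and the numbers in it are made up purely for illustration.

```sas
/* Illustrative raw p-values; these numbers are not from any real study */
data work.pvals;
   input raw_p @@;
   datalines;
0.001 0.008 0.039 0.041 0.042 0.060 0.074 0.205 0.212 0.216
;

ods graphics on;

/* Classical and adaptive adjustments applied to the supplied p-values */
proc multtest inpvalues(raw_p)=work.pvals
              holm hochberg fdr
              adaptiveholm adaptivehochberg adaptivefdr pfdr
              plots=all;
run;

ods graphics off;
```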
The ALPHA= option has been added to the PROC NLIN and OUTPUT statements. The PDATA= option in the PARAMETERS statement enables you to assign starting values for parameters through a SAS data set. The DER option in the OUTPUT statement saves the first derivatives of the model with respect to the parameters to the OUTPUT data set.
The EMPIRICAL option in the PROC NLMIXED statement requests that the covariance matrix of the parameter estimates be computed as a likelihood-based empirical ("sandwich") estimator (White 1982). Subject-specific gradients can be added to a SAS data set with the SUBGRADIENT option in the PROC NLMIXED statement.
ODS Graphics has been added to the NPAR1WAY procedure, and you can request boxplots, a median plot, and an empirical distribution plot with the PLOTS= option in the PROC NPAR1WAY statement. PROC NPAR1WAY now computes the Hodges-Lehmann estimate of location shift for two-sample data with the HL option. Confidence limits are provided, and you can request exact confidence limits by specifying the HL option in the EXACT statement. Tests based on Conover scores are now available, including exact tests.
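For example, a Hodges-Lehmann analysis with exact confidence limits might be requested as follows. This is a sketch that assumes the SASHELP.CLASS sample data set, in which SEX has two levels.

```sas
proc npar1way data=sashelp.class wilcoxon hl;
   class sex;      /* two-sample classification variable */
   var height;
   /* exact confidence limits for the Hodges-Lehmann location shift */
   exact hl;
run;
```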
The CLASS statement, previously available only in the TPHREG procedure, is now included with the PHREG procedure. The BAYES statement provides Bayesian analysis via Gibbs sampling. PROC PHREG now fits the piecewise exponential model, which is specified in the BAYES statement. Bayesian baseline survival prediction becomes available with SAS 9.2 as well. The HAZARDRATIO statement provides a new facility for computing hazard ratios, including hazard ratios in the presence of interactions. The PLOTS option in the PROC PHREG statement produces baseline survival function plots. Profile-likelihood confidence limits are now available for hazard ratios produced in classical analyses. Firth's penalized likelihood method is provided as well.
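A hedged sketch of the HAZARDRATIO statement in a model with an interaction follows. The data set WORK.TRIAL and the variables TIME, STATUS, TREATMENT, and AGE are hypothetical, as is the censoring value 0.

```sas
/* WORK.TRIAL and its variables are hypothetical names */
proc phreg data=work.trial;
   class treatment;
   model time*status(0) = treatment age treatment*age;
   /* hazard ratios for Treatment at two ages, each level compared with
      the reference level, with profile-likelihood confidence limits */
   hazardratio 'Treatment at ages 40 and 60' treatment
               / at(age=40 60) diff=ref cl=pl;
run;
```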
The PLS procedure now produces more graphics, including a correlation loadings plot. The MISSING option for handling missing values with imputation is now production.
The new LOGISTIC statement in the POWER procedure performs power and sample size analyses for the likelihood ratio chi-square test of a single predictor in binary logistic regression, possibly in the presence of one or more covariates (where all predictors are independent of each other). The new TWOSAMPLEWILCOXON statement performs power and sample size analyses for the Wilcoxon-Mann-Whitney test for two independent groups. The ONESAMPLEFREQ statement now covers equivalence, noninferiority, and confidence interval precision for a proportion. The PAIREDFREQ statement offers new input parameterizations, including raw proportions and correlation.
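As an illustration of the TWOSAMPLEWILCOXON statement, the following sketch solves for the per-group sample size needed to reach 80% power. The two normal distributions and their labels are assumptions chosen only for the example.

```sas
proc power;
   twosamplewilcoxon
      /* assumed response distributions: normal(mean, standard deviation) */
      vardist("control") = normal(0, 1)
      vardist("treated") = normal(0.5, 1)
      variables = "control" | "treated"
      power = 0.8
      npergroup = .;   /* the missing value marks the quantity to solve for */
run;
```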
The PRINCOMP procedure now produces more graphics, including an ellipse plot. It includes an ID statement and incorporates ID variables as tips in its scatter plots. The PARPREFIX= option in the PROC PRINCOMP statement specifies a prefix for naming the residual variables in the OUT= data set and the OUTSTAT= data set.
The PRINQUAL procedure now produces graphs. These include a multidimensional preference analysis plot and a variable transformation plot.
The PROBIT procedure now offers a predicted probability plot.
The PSS application has been converted to a Java client application and no longer requires a Web server. It now offers a relative risk parameterization for the two proportions analysis. New analyses covered include equivalence and noninferiority for proportions, confidence interval for one proportion, Wilcoxon-Mann-Whitney for two distributions, logistic regression, and GLM contrasts for interactions.
Quantile regression extends the regression model to conditional quantiles of the response variable, such as the 90th percentile. Quantile regression is particularly useful when the rate of change in the conditional quantile, expressed by the regression coefficients, depends on the quantile. The main advantage of quantile regression over least squares regression is its flexibility for modeling data with heterogeneous conditional distributions. The QUANTREG procedure was first made available as a Web download for SAS 9.1.3.
With SAS 9.2, the QUANTREG procedure becomes production. In addition, it now includes the experimental EFFECT statement for generating splines and the ability to output results for multiple quantiles in the OUTPUT statement.
The REG procedure now includes a lack-of-fit test. The PARTIAL option in the MODEL statement requests partial regression plots for each regressor; the PARTIALDATA option displays partial regression data. Heteroscedasticity-consistent (White) standard errors are now available, and you can obtain a heteroscedasticity-consistent covariance matrix for use with the ACOV, HCC, or WHITE option in the MODEL statement and for heteroscedasticity-consistent tests with the TEST statement.
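A brief sketch of requesting heteroscedasticity-consistent results, assuming the SASHELP.CLASS sample data set (the model itself is illustrative):

```sas
proc reg data=sashelp.class;
   /* WHITE requests heteroscedasticity-consistent standard errors */
   model weight = height age / white;
   /* with WHITE specified, the TEST statement also reports
      a heteroscedasticity-consistent test */
   test age = 0;
run;
quit;
```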
ODS Graphics has been added to the RSREG procedure. New graphs include diagnostic plots, ridge plots, and surface plots.
The SEQDESIGN procedure designs interim analyses for clinical trials. PROC SEQDESIGN computes the boundary values and required sample sizes for the trial. The boundary values are derived in such a way that the overall Type I and Type II error probability levels are maintained at the levels specified in the design. Available methods include fixed boundary shape methods (which include unified family methods such as the O'Brien-Fleming method), Whitehead methods, and error spending methods. In addition to the boundary values, the SEQDESIGN procedure computes a variety of quantities such as average sample sizes and stopping probabilities.
The SEQTEST procedure is used in conjunction with the SEQDESIGN procedure to carry out interim analyses for clinical trials. At each stage, you analyze all the data available at that point with a statistical procedure and compute a test statistic and its information level. You then use the SEQTEST procedure to compare the test statistic with the boundary values for that stage. If the information levels of the data do not match the levels specified in the design, the SEQTEST procedure modifies the boundary levels appropriately. In addition, the SEQTEST procedure computes quantities such as average sample sizes, stopping probabilities, and conditional power. At the conclusion of the trial, the SEQTEST procedure computes parameter estimates, p-values, and confidence limits.
ODS Graphics has been added to the SIM2D procedure. Means plots and scatter plots of the observed data are now available.
The SIMNORMAL procedure becomes production with this release.
The NOTRUNCATE option in the FREQ statement specifies that frequency values are not truncated to integers. Quantile methods now accept noninteger frequencies and handle weights. In order to improve numerical precision, PROC STDIZE now creates double-precision values for output variables instead of inheriting the length of the variables in the analysis.
The SURVEYFREQ procedure now provides variance estimation by balanced repeated replication (BRR) and the jackknife methods, in addition to the Taylor series method. You can provide replicate weights for the new replication methods with a REPWEIGHTS statement, or the procedure can construct the replicate weights. PROC SURVEYFREQ now computes odds ratio and relative risk estimates. The new NOMCAR option in the PROC SURVEYFREQ statement requests a subpopulation analysis of the set of respondents for Taylor series variance estimation.
The SURVEYLOGISTIC procedure now provides variance estimation by balanced repeated replication (BRR) and the jackknife methods, in addition to the Taylor series method. You can provide replicate weights for the new replication methods with a REPWEIGHTS statement, or the procedure can construct the replicate weights. The OUTPUT and DOMAIN statements are now available. The new NOMCAR option in the PROC SURVEYLOGISTIC statement requests a subpopulation analysis of the set of respondents for Taylor series variance estimation.
The SURVEYMEANS procedure now provides variance estimation by balanced repeated replication (BRR) and the jackknife methods, in addition to the Taylor series method. You can provide replicate weights for the new replication methods with a REPWEIGHTS statement, or the procedure can construct the replicate weights. The new NOMCAR option in the PROC SURVEYMEANS statement requests a subpopulation analysis of the set of respondents for Taylor series variance estimation. PROC SURVEYMEANS now computes percentiles (Woodruff variance estimation only).
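The replication methods are selected with the VARMETHOD= option, which is shared by the survey analysis procedures. The following sketch uses PROC SURVEYMEANS; the data set WORK.SURVEY and the design variables are hypothetical.

```sas
/* WORK.SURVEY and its variables are hypothetical names */
proc surveymeans data=work.survey varmethod=jackknife mean;
   strata region;      /* design strata */
   cluster psu;        /* primary sampling units */
   weight sampwt;      /* sampling weights */
   var income;
   domain agegroup;    /* domain (subpopulation) estimates */
run;
```

Because no REPWEIGHTS statement is given, the procedure constructs the jackknife replicate weights itself.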
The SURVEYREG procedure now provides variance estimation by balanced repeated replication (BRR) and the jackknife methods, in addition to the Taylor series method. You can provide replicate weights for the new replication methods with a REPWEIGHTS statement, or the procedure can construct the replicate weights. PROC SURVEYREG also includes a DOMAIN statement for domain analysis. The OUTPUT statement enables you to produce predicted values and residuals and put them into a SAS data set. The ORDER= option has been added to the PROC SURVEYREG statement. The new NOMCAR option in the PROC SURVEYREG statement requests a subpopulation analysis of the set of respondents for Taylor series variance estimation.
The SURVEYSELECT procedure now provides methods to allocate the total sample size among the strata. Allocation methods include proportional, Neyman, and optimal allocation.
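Proportional allocation might be requested along the following lines. This is a sketch under assumptions: the sampling frame WORK.FRAME, the stratum variable REGION, and the total sample size of 500 are all hypothetical.

```sas
/* WORK.FRAME and REGION are hypothetical names */
/* The input data set must be sorted by the STRATA variables */
proc sort data=work.frame;
   by region;
run;

proc surveyselect data=work.frame method=srs sampsize=500
                  seed=20090 out=work.sample;
   /* allocate the total sample size of 500 among the strata
      in proportion to stratum size */
   strata region / alloc=prop;
run;
```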
The TCALIS procedure is experimental in SAS 9.2. It enables you to perform the same kind of statistical analyses that you can do with PROC CALIS. In addition, PROC TCALIS provides functionality such as multiple-group analysis, enhanced mean structure analysis, path-like model specification, support of LISREL-type models, customizable effect analysis, general parametric function testing, customizable Lagrange multiplier tests, and so on. Currently, you can specify COSAN models only in PROC CALIS, but not in PROC TCALIS.
The TRANSREG procedure includes new options for existing splines to make exterior knot specification easier and more flexible. PROC TRANSREG now includes the penalized B-spline. A number of plots are now produced, including Box-Cox plots, preference mapping, regression residuals, and scatter plots.
The TTEST procedure now performs TOST equivalence analyses, analyses of treatment and period in AB/BA crossover designs, weighted Satterthwaite tests and confidence intervals, analyses of ratios, and one-sided analyses. It supports both normal and lognormal data. Sasabuchi tests and Fieller confidence intervals are computed for normal ratios. PROC TTEST now provides graphs, including histograms, densities, box plots, profiles, agreement plots, Q-Q plots, and interval plots. The new ORDER= option in the PROC TTEST statement specifies the sorting order for the levels of classification variables (specified in the CLASS statement) and crossover treatment variables (specified in the CROSSOVER option in the VAR statement).
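For instance, a two-sample TOST equivalence analysis could be sketched as follows, assuming the SASHELP.CLASS sample data set. The equivalence bounds of -3 and 3 are arbitrary values chosen for illustration.

```sas
ods graphics on;

/* TOST(lower, upper) gives the equivalence bounds for the mean difference */
proc ttest data=sashelp.class tost(-3, 3);
   class sex;
   var height;
run;

ods graphics off;
```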
The METHOD=GRR option has been added to the VARCOMP procedure to provide gauge repeatability and reproducibility analysis. The CL option has been added to the MODEL statement to compute confidence limits for all of the parameters of interest. This applies to the balanced one-way or two-way designs for METHOD=TYPE1 or GRR. Autocorrelation statistics and tests are now available.
Autocorrelation statistics are now available. In addition, PROC VARIOGRAM produces graphics, including a scatter plot of the observed data, histogram of the pairwise distance distribution, plots of the empirical classical and robust semivariograms, and panels of the empirical classical and robust semivariogram plots.
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.