Special SAS Data Sets


Introduction to Special SAS Data Sets

All SAS/STAT procedures can create SAS data sets. Any table generated by a procedure can be saved to a data set by using the Output Delivery System (ODS), and many procedures also have syntax that enables you to save other statistics to data sets. Some of these data sets are organized according to certain conventions so that another SAS/STAT procedure can read them for further analysis. Procedures recognize such specially organized data sets by the TYPE= data set attribute.
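As a minimal sketch of the ODS route, the following statements save the parameter estimates table from PROC REG to a data set. The SASHELP.CLASS sample data set, the model, and the output data set name WORK.PE are assumptions chosen only for illustration.

```sas
/* Route the ParameterEstimates table from PROC REG to the data set WORK.PE */
ods output ParameterEstimates=work.pe;

proc reg data=sashelp.class;
   model Weight = Height;   /* illustrative model, not taken from the text */
run;
quit;

/* Inspect the saved table */
proc print data=work.pe;
run;
```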

The CORR procedure (see the Base SAS Procedures Guide: Statistical Procedures), for example, can create a data set with the attribute TYPE=CORR containing a correlation matrix. This TYPE=CORR data set can be read by the REG or FACTOR procedure, among others. If the original data set is large, using a special SAS data set in this way can save computer time by avoiding the recomputation of the correlation matrix in subsequent analyses.
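The following sketch illustrates this workflow under some assumptions: it uses the SASHELP.CLASS sample data set, the OUTP= option of PROC CORR to write the Pearson correlations, and a hypothetical data set name WORK.CORROUT. PROC REG then fits the model from the stored matrix rather than from the raw observations.

```sas
/* Compute the correlation matrix once and store it as a TYPE=CORR data set */
proc corr data=sashelp.class outp=work.corrout noprint;
   var Height Weight Age;
run;

/* PROC REG reads the TYPE=CORR data set instead of the raw data */
proc reg data=work.corrout;
   model Weight = Height Age;
run;
quit;
```

Because only summary statistics are available to PROC REG here, output that requires individual observations (for example, residuals) cannot be produced from this input.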

As another example, PROC REG can create a TYPE=EST data set containing estimated regression coefficients. If you need to make predictions for new observations, you can use the SCORE procedure to read both the TYPE=EST data set and a data set containing the new observations. PROC SCORE can then compute predicted values or residuals without repeating the entire regression analysis. See Chapter 100: The SCORE Procedure for an example.
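A minimal sketch of scoring new observations follows. The SASHELP.CLASS sample data set, the model, the data set names WORK.EST, WORK.NEWOBS, and WORK.PRED, and the two new observations are all hypothetical and serve only to illustrate the pattern.

```sas
/* Fit the regression once and keep the coefficients in a TYPE=EST data set */
proc reg data=sashelp.class outest=work.est noprint;
   model Weight = Height Age;
run;
quit;

/* Hypothetical new observations to be scored */
data work.newobs;
   input Height Age;
   datalines;
60 13
65 15
;

/* TYPE=PARMS tells PROC SCORE to use the regression coefficients.
   The VAR statement lists only the regressors, so predicted values
   (rather than residuals) are computed. */
proc score data=work.newobs score=work.est out=work.pred type=parms;
   var Height Age;
run;

proc print data=work.pred;
run;
```

In this sketch, the predicted values are expected to appear in WORK.PRED in a variable named after the model label, which is MODEL1 when the MODEL statement has no label.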

A special SAS data set might contain different kinds of statistics. A special variable called _TYPE_ is used to distinguish the various statistics. For example, in a TYPE=CORR data set, an observation in which _TYPE_='MEAN' contains the means of the variables in the analysis, and an observation in which _TYPE_='STD' contains the standard deviations. Correlations appear in observations with _TYPE_='CORR'. Another special variable, _NAME_, is needed to identify the row of the correlation matrix. Thus, the correlation between variables X and Y is given by the value of the variable X in the observation for which _TYPE_='CORR' and _NAME_='Y', or by the value of the variable Y in the observation for which _TYPE_='CORR' and _NAME_='X'.
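The lookup described above can be sketched as follows. The SASHELP.CLASS sample data set and the data set name WORK.CORROUT are assumptions for illustration; Height and Weight play the roles of X and Y.

```sas
/* Create a small TYPE=CORR data set to inspect */
proc corr data=sashelp.class outp=work.corrout noprint;
   var Height Weight;
run;

/* Show all rows: MEAN, STD, N, and one CORR row per variable */
proc print data=work.corrout;
run;

/* Read one correlation: the value of Weight in the row where
   _TYPE_='CORR' and _NAME_ identifies Height */
data _null_;
   set work.corrout;
   where _type_ = 'CORR' and upcase(_name_) = 'HEIGHT';
   put 'Correlation between Height and Weight: ' Weight;
run;
```

The UPCASE function is used only so that the comparison does not depend on the case in which the variable name is stored in _NAME_.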

The special data sets created by SAS/STAT procedures can generally be used directly by other procedures without modification. However, if you create an output data set with PROC CORR and use the NOCORR option to omit the correlation matrix from the OUT= data set, you need to set the TYPE= option yourself. You can set it either in parentheses following the OUT= data set name in the PROC CORR statement or in parentheses following the DATA= option in any other procedure that recognizes the special TYPE= attribute. In either case, the TYPE= option should be set to COV, CSSCP, or SSCP according to what type of matrix is stored in the data set and what data set types are accepted as input by the other procedures you plan to use. If you do not follow these steps and you use the TYPE=CORR data set with no correlation matrix as input to another procedure, the procedure might issue an error message indicating that the correlation matrix is missing from the data set.
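The following sketch shows one way to apply this, assuming that the output data set is written with the OUTP= option of PROC CORR and that a covariance matrix is stored by using the COV option. The SASHELP.CLASS sample data set and the name WORK.COVOUT are hypothetical.

```sas
/* COV stores covariances; NOCORR omits the correlation matrix.
   The TYPE=COV data set option overrides the default TYPE=CORR. */
proc corr data=sashelp.class cov nocorr noprint
          outp=work.covout(type=cov);
   var Height Weight Age;
run;

/* Confirm the data set type */
proc contents data=work.covout;
run;

/* PROC REG accepts TYPE=COV input */
proc reg data=work.covout;
   model Weight = Height Age;
run;
quit;
```

Alternatively, the type could be left unchanged in the PROC CORR step and specified where the data set is read, as in DATA=work.covout(type=cov).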

You can create special SAS data sets directly in a DATA step by specifying the TYPE= option in parentheses after the data set name in the DATA statement. See Example A.2: Creating a TYPE=CORR Data Set in a DATA Step for an example. If you use a DATA step with a SET statement to modify a special SAS data set, you must specify the TYPE= option in the DATA statement. The TYPE= attribute of the data set in the SET statement is not automatically copied to the data set being created. You can determine the TYPE= attribute of a data set by using the CONTENTS procedure (see Example A.1: A TYPE=CORR Data Set Produced by PROC CORR and the Base SAS Procedures Guide for details).
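As a minimal sketch of these three steps, the following statements create a TYPE=CORR data set in a DATA step, modify it with a SET statement while restating the type, and then check the result with PROC CONTENTS. The variable names x and y, the statistics, and the data set names are made up for illustration.

```sas
/* Create a TYPE=CORR data set directly; a period in the _NAME_
   column is read as a missing (blank) character value */
data work.corrmat(type=corr);
   input _type_ $ _name_ $ x y;
   datalines;
MEAN . 0 0
STD . 1 1
N . 100 100
CORR x 1.0 0.5
CORR y 0.5 1.0
;

/* The TYPE= attribute is not copied by SET, so specify it again */
data work.corrmat2(type=corr);
   set work.corrmat;
   if _type_ = 'N' then do;
      x = 200;
      y = 200;
   end;
run;

/* The data set type is reported among the data set attributes */
proc contents data=work.corrmat2;
run;
```

If the TYPE= option were omitted from the second DATA statement, WORK.CORRMAT2 would be an ordinary data set even though its contents still look like a correlation matrix.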

Table A.1 summarizes the TYPE= data sets that can be used as input to SAS/STAT procedures. Table A.2 summarizes the TYPE= data sets that are created by SAS/STAT procedures and the statements each procedure uses to create its special output data sets. Most procedures accept ordinary SAS data sets and create ordinary output SAS data sets with no TYPE= specification in addition to the special data sets shown in the tables. When you specify a data set with a type that the procedure does not recognize, the procedure prints an error message and stops executing.

Table A.1: SAS/STAT Procedures That Accept Special Input Data Set Types

Procedure        Special TYPE= Data Sets Accepted
---------------  --------------------------------------------------------
ACECLUS          ACE, CORR, COV, SSCP, UCORR, UCOV
BOXPLOT          BOXPLOT, CHARTSUM
CALIS            CALISMDL, CORR, COV, FACTOR, SSCP, UCORR, UCOV, WEIGHT
CANDISC          CORR, COV, SSCP, CSSCP
CATMOD           EST
CLUSTER          DISTANCE
DISCRIM          CORR, COV, SSCP, CSSCP, LINEAR, QUAD, MIXED
FACTOR           ACE, CORR, COV, FACTOR, SSCP, UCORR, UCOV
LIFEREG          EST
LOGISTIC         EST, LOGISMOD
MI               EST, COV, CORR
MIANALYZE        EST, COV, CORR
MODECLUS         DISTANCE
PHREG            EST
PRINCOMP         ACE, CORR, COV, EST, FACTOR, SSCP, UCORR, UCOV
PROBIT           EST
QUANTREG         EST
REG              CORR, COV, SSCP, UCORR, UCOV
ROBUSTREG        EST
SCORE            SCORE= data set can be of any type
SIMNORM          CORR, COV
SURVEYLOGISTIC   EST
STEPDISC         CORR, COV, SSCP, CSSCP
TREE             TREE
VARCLUS          CORR, COV, FACTOR, SSCP, UCORR, UCOV


Table A.2: SAS/STAT Procedures That Create Special Output Data Set Types

Procedure   TYPE=      Statement and Option Required
----------  ---------  ---------------------------------------------
ACECLUS     ACE        PROC ACECLUS OUTSTAT=
BOXPLOT     BOXPLOT    PLOT / OUTBOX=
            CHARTSUM   PLOT / OUTHISTORY=
CALIS       CALISFIT   PROC CALIS OUTFIT=
            CALISMDL   PROC CALIS OUTMODEL=
            CORR       PROC CALIS CORR OUTSTAT=
            COV        PROC CALIS OUTSTAT=
            EST        PROC CALIS OUTEST=
            WEIGHT     PROC CALIS OUTWGT=
CANCORR     CORR       PROC CANCORR OUTSTAT=
            UCORR      PROC CANCORR NOINT OUTSTAT=
CANDISC     CORR       PROC CANDISC OUTSTAT=
CATMOD      EST        RESPONSE / OUTEST=
CLUSTER     TREE       PROC CLUSTER OUTTREE=
DISCRIM     LINEAR     PROC DISCRIM POOL=YES OUTSTAT=
            QUAD       PROC DISCRIM POOL=NO OUTSTAT=
            MIXED      PROC DISCRIM POOL=TEST OUTSTAT=
            CORR       PROC DISCRIM METHOD=NPAR OUTSTAT=
DISTANCE    DISTANCE   PROC DISTANCE METHOD=distance-method OUT=
            SIMILAR    PROC DISTANCE METHOD=similarity-method OUT=
FACTOR      FACTOR     PROC FACTOR OUTSTAT=
LIFEREG     EST        PROC LIFEREG OUTEST=
LOGISTIC    EST        PROC LOGISTIC OUTEST=
            LOGISMOD   PROC LOGISTIC OUTMODEL=
MI          COV        EM OUTEM=
            COV        EM OUTITER=
            COV        MCMC OUTITER=
            EST        MCMC OUTEST=
NLIN        EST        PROC NLIN OUTEST=
ORTHOREG    EST        PROC ORTHOREG OUTEST=
PHREG       EST        PROC PHREG OUTEST=
PRINCOMP    CORR       PROC PRINCOMP OUTSTAT=
            COV        PROC PRINCOMP COV OUTSTAT=
            UCORR      PROC PRINCOMP NOINT OUTSTAT=
            UCOV       PROC PRINCOMP NOINT COV OUTSTAT=
PROBIT      EST        PROC PROBIT OUTEST=
QUANTREG    EST        PROC QUANTREG OUTEST=
REG         EST        PROC REG OUTEST=
            SSCP       PROC REG OUTSSCP=
ROBUSTREG   EST        PROC ROBUSTREG OUTEST=
VARCLUS     CORR       PROC VARCLUS OUTSTAT=
            UCORR      PROC VARCLUS NOINT OUTSTAT=
            TREE       PROC VARCLUS OUTTREE=
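
To show how an entry in Table A.2 pairs with an entry in Table A.1, the following sketch has PROC CLUSTER write a TYPE=TREE data set with the OUTTREE= option and PROC TREE read it. The SASHELP.CLASS sample data set, the clustering method, the number of clusters, and the data set names are assumptions for illustration only.

```sas
/* PROC CLUSTER OUTTREE= creates a TYPE=TREE data set (Table A.2) */
proc cluster data=sashelp.class method=ward outtree=work.tree noprint;
   var Height Weight;
   id Name;
run;

/* PROC TREE accepts TYPE=TREE input (Table A.1) and assigns each
   observation to one of three clusters */
proc tree data=work.tree nclusters=3 out=work.clus noprint;
run;

proc print data=work.clus;
run;
```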