The TRANSREG Procedure

OUTPUT Statement

OUTPUT OUT=SAS-data-set <o-options> ;

The OUTPUT statement creates a new SAS data set that contains coefficients, marginal means, and information about the original and transformed variables. The information about original and transformed variables composes the score partition of the data set; observations have _TYPE_='SCORE'. The coefficients and marginal means compose the coefficient partition of the data set; observations have _TYPE_='M COEFFI' or _TYPE_='MEAN'. Other values of _TYPE_ are possible; for details, see _TYPE_ and _NAME_ Variables later in this chapter. For details about data set structure, see the section Output Data Set. To specify the name of the output data set, use the OUT= option.

OUT=SAS-data-set

specifies the output data set for the data, transformed data, predicted values, residuals, scores, coefficients, and so on. When you use an OUTPUT statement but do not use the OUT= specification, PROC TRANSREG creates a data set and uses the DATAn convention. If you want to create a SAS data set in a permanent library, you must specify a two-level name. For more information about permanent libraries and SAS data sets, see SAS Language Reference: Concepts.

To control the contents of the data set and variable names, use one or more of the o-options. You can also specify these options in the PROC TRANSREG statement.
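As a minimal sketch of how the OUT= option and a few o-options fit together, the following step writes predicted values, residuals, and coefficients to one data set. The data set names (work.mydata, work.results) and the variables y, x1, and g are hypothetical, and the model is chosen only for illustration.

```sas
/* Hypothetical input: work.mydata with response y, numeric x1, and classification g */
proc transreg data=work.mydata;
   model identity(y) = spline(x1) class(g);
   /* OUT= names the output data set; the remaining keywords are o-options */
   output out=work.results predicted residuals coefficients;
run;
```

Under these assumptions, work.results would hold both a score partition (one observation per input observation) and a coefficient partition, distinguished by the _TYPE_ variable.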

Output Options (o-options)

Table 97.5 summarizes the options available in the OUTPUT statement. These options include the OUT= option and all of the o-options. Many of the statistics created in the OUTPUT statement are exactly the same as statistics created by PROC REG. More details are given in the sections Predicted and Residual Values and Model Fit and Diagnostic Statistics in Chapter 79: The REG Procedure, and in Chapter 4: Introduction to Regression Procedures.

Table 97.5: Options Available in the OUTPUT Statement

Option                Description

Identify output data set
  OUT=                Outputs data set

Predicted Values, Residuals, Scores
  CANONICAL           Outputs canonical scores
  CLI                 Outputs individual confidence limits
  CLM                 Outputs mean confidence limits
  DESIGN=             Specifies design matrix coding
  DREPLACE            Replaces dependent variables
  IREPLACE            Replaces independent variables
  LEVERAGE            Outputs leverage
  NORESTOREMISSING    Does not restore missing values
  NOSCORES            Suppresses output of scores
  PREDICTED           Outputs predicted values
  REDUNDANCY=         Outputs redundancy variables
  REPLACE             Replaces all variables
  RESIDUALS           Outputs residuals

Output Data Set Coefficients
  COEFFICIENTS        Outputs coefficients
  COORDINATES=        Outputs ideal point coordinates
  MEANS               Outputs marginal means
  MREDUNDANCY         Outputs redundancy analysis coefficients

Output Data Set Variable Name Prefixes
  ADPREFIX=           Specifies dependent variable approximations
  AIPREFIX=           Specifies independent variable approximations
  CDPREFIX=           Specifies canonical dependent variables
  CILPREFIX=          Specifies conservative-individual-lower CL
  CIPREFIX=           Specifies canonical independent variables
  CIUPREFIX=          Specifies conservative-individual-upper CL
  CMLPREFIX=          Specifies conservative-mean-lower CL
  CMUPREFIX=          Specifies conservative-mean-upper CL
  DEPENDENT=          Specifies METHOD=MORALS untransformed dependent
  LILPREFIX=          Specifies liberal-individual-lower CL
  LIUPREFIX=          Specifies liberal-individual-upper CL
  LMLPREFIX=          Specifies liberal-mean-lower CL
  LMUPREFIX=          Specifies liberal-mean-upper CL
  RDPREFIX=           Specifies residuals
  PPREFIX=            Specifies predicted values
  RPREFIX=            Specifies redundancy variables
  TDPREFIX=           Specifies transformed dependents
  TIPREFIX=           Specifies transformed independents

Macro Variables
  MACRO               Creates macro variables

Other Options
  APPROXIMATIONS      Outputs dependent and independent approximations
  CCC                 Outputs canonical correlation coefficients
  CEC                 Outputs canonical elliptical point coordinates
  CPC                 Outputs canonical point coordinates
  CQC                 Outputs canonical quadratic point coordinates
  DAPPROXIMATIONS     Outputs approximations to transformed dependents
  IAPPROXIMATIONS     Outputs approximations to transformed independents
  MEC                 Outputs elliptical point coordinates
  MPC                 Outputs point coordinates
  MQC                 Outputs quadratic point coordinates
  MRC                 Outputs multiple regression coefficients

For the coefficient partition, the COEFFICIENTS, COORDINATES, and MEANS o-options provide the coefficients that are appropriate for your model. For more explicit control of the coefficient partition, use the options that control details and prefixes. The following list provides details about these options.

ADPREFIX=name
ADP=name

specifies a prefix for naming the dependent variable predicted values. The default is ADPREFIX=P when you specify the PREDICTED o-option; otherwise, it is ADPREFIX=A. When you specify the ADPREFIX= o-option, the PREDICTED o-option is automatically specified for you. The ADPREFIX= o-option is the same as the PPREFIX= o-option.

AIPREFIX=name
AIP=name

specifies a prefix for naming the independent variable approximations. The default is AIPREFIX=A. When you specify the AIPREFIX= o-option, the IAPPROXIMATIONS o-option is automatically specified for you.

APPROXIMATIONS
APPROX
APP

is equivalent to specifying both the DAPPROXIMATIONS and the IAPPROXIMATIONS o-options. If you specify METHOD=UNIVARIATE, then the APPROXIMATIONS o-option specifies only the DAPPROXIMATIONS o-option.

CANONICAL
CAN

outputs canonical variables to the OUT= data set. When you specify METHOD=CANALS, the CANONICAL o-option is automatically specified for you. The CDPREFIX= o-option specifies a prefix for naming the dependent canonical variables (default Cand), and the CIPREFIX= o-option specifies a prefix for naming the independent canonical variables (default Cani).

CCC

outputs canonical correlation coefficients to the OUT= data set.

CDPREFIX=name
CDP=name

provides a prefix for naming the canonical dependent variables. The default is CDPREFIX=Cand. When you specify the CDPREFIX= o-option, the CANONICAL o-option is automatically specified for you.

CEC

outputs canonical elliptical point model coordinates to the OUT= data set.

CILPREFIX=name
CIL=name

specifies a prefix for naming the conservative-individual-lower confidence limits. The default prefix is CIL. When you specify the CILPREFIX= o-option, the CLI o-option is automatically specified for you.

CIPREFIX=name
CIP=name

provides a prefix for naming the canonical independent variables. The default is CIPREFIX=Cani. When you specify the CIPREFIX= o-option, the CANONICAL o-option is automatically specified for you.

CIUPREFIX=name
CIU=name

specifies a prefix for naming the conservative-individual-upper confidence limits. The default prefix is CIU. When you specify the CIUPREFIX= o-option, the CLI o-option is automatically specified for you.

CLI

outputs individual confidence limits to the OUT= data set. The names of the confidence limits variables are constructed from the original dependent variable names and the prefixes specified in the following o-options: LILPREFIX= (default LIL for liberal individual lower), CILPREFIX= (default CIL for conservative individual lower), LIUPREFIX= (default LIU for liberal individual upper), and CIUPREFIX= (default CIU for conservative individual upper). When there are no monotonicity constraints, the liberal and conservative limits are the same.

CLM

outputs mean confidence limits to the OUT= data set. The names of the confidence limits variables are constructed from the original dependent variable names and the prefixes specified in the following o-options: LMLPREFIX= (default LML for liberal mean lower), CMLPREFIX= (default CML for conservative mean lower), LMUPREFIX= (default LMU for liberal mean upper), and CMUPREFIX= (default CMU for conservative mean upper). When there are no monotonicity constraints, the liberal and conservative limits are the same.
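The following sketch requests both kinds of confidence limits along with predicted values. The data set and variable names are hypothetical. With the default prefixes described above and a dependent variable named y, the limit variables would be expected to have names such as LILy, CILy, LIUy, CIUy, LMLy, CMLy, LMUy, and CMUy.

```sas
/* Hypothetical input: work.mydata with response y and predictor x */
proc transreg data=work.mydata;
   model identity(y) = spline(x);
   /* CLI = individual limits, CLM = mean limits; default prefixes are used */
   output out=work.limits predicted cli clm;
run;
```

Because this model places no monotonicity constraints on the variables, the liberal and conservative limits should coincide.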

CMLPREFIX=name
CML=name

specifies a prefix for naming the conservative-mean-lower confidence limits. The default prefix is CML. When you specify the CMLPREFIX= o-option, the CLM o-option is automatically specified for you.

CMUPREFIX=name
CMU=name

specifies a prefix for naming the conservative-mean-upper confidence limits. The default prefix is CMU. When you specify the CMUPREFIX= o-option, the CLM o-option is automatically specified for you.

COEFFICIENTS
COE

outputs either multiple regression coefficients or raw canonical coefficients to the OUT= data set. If you specify METHOD=CANALS (in the MODEL or PROC TRANSREG statement), then the COEFFICIENTS o-option outputs the first n canonical variables, where n is the value of the NCAN= a-option (specified in the MODEL or PROC TRANSREG statement). Otherwise, the COEFFICIENTS o-option includes multiple regression coefficients in the OUT= data set. In addition, when you specify the CLASS expansion for any independent variable, the COEFFICIENTS o-option also outputs marginal means.
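As an illustration of the METHOD=CANALS case, the following sketch sets NCAN=2 so that the COEFFICIENTS o-option writes results for the first two canonical variables. The data set and variable names are hypothetical, and the monotone transformations are only one possible choice.

```sas
/* Hypothetical input: work.scores with variables y1-y3 and x1-x4 */
proc transreg data=work.scores method=canals ncan=2;
   model monotone(y1-y3) = monotone(x1-x4);
   /* With METHOD=CANALS, COEFFICIENTS outputs canonical rather than regression coefficients */
   output out=work.canout coefficients;
run;
```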

COORDINATES<=n>
COO<=n>

outputs either ideal point or vector model coordinates for preference mapping to the OUT= data set. When METHOD=CANALS, these coordinates are computed from canonical coefficients; otherwise, the coordinates are computed from multiple regression coefficients. For details, see the section Point Models.

When ODS Graphics is enabled and vector model coordinates are requested, a plot is produced with points for each row and vectors for each column. If the vectors are plotted based on the actual computed coordinates, then the vectors are often short. A better graphical display is produced when the vectors are stretched. You can change the absolute length of each vector by specifying COORDINATES=n, which multiplies all of the vector coordinates by n. Usually, n is a value such as 2, 2.5, or 3. The default is 2.5. Specify COORDINATES=1 if you want to see the vectors without any stretching. The relative lengths of the different vectors are important and interpretable, and these are preserved by the stretching.
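A vector model request might look like the following sketch, which turns off stretching with COORDINATES=1. The data set work.products, the judge variables, the two dimension variables, and the ID variable product are all hypothetical.

```sas
ods graphics on;

/* Hypothetical input: one row per product, with ratings judge1-judge5 */
/* and two precomputed dimension scores dim1 and dim2                  */
proc transreg data=work.products;
   model identity(judge1-judge5) = identity(dim1 dim2);
   /* COORDINATES=1 leaves the vectors at their computed lengths */
   output out=work.prefmap coordinates=1 replace;
   id product;
run;
```

Omitting the =1 would use the default multiplier of 2.5 described above.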

CPC

outputs canonical point model coordinates to the OUT= data set.

CQC

outputs canonical quadratic point model coordinates to the OUT= data set.

DAPPROXIMATIONS
DAP

outputs the approximations of the transformed dependent variables to the OUT= data set. These are the target values for the optimal transformations. With METHOD=UNIVARIATE and METHOD=MORALS, the dependent variable approximations are the ordinary predicted values from the linear model. The names of the approximation variables are constructed from the ADPREFIX= o-option (default A) and the original dependent variable names. For ordinary predicted values, use the PREDICTED o-option instead of the DAPPROXIMATIONS o-option, since the PREDICTED o-option uses a more relevant prefix (P instead of A) and a more relevant variable label suffix (Predicted Values instead of Approximations).

DESIGN<=n>
DES<=n>

specifies that your primary goal is design matrix coding, not analysis. Specifying the DESIGN o-option makes the procedure run faster. The DESIGN o-option sets the default method to UNIVARIATE and the default MAXITER= value to zero. It suppresses computing the regression coefficients, unless they are needed for some other option. Furthermore, when the DESIGN o-option is specified, the MODEL statement is not required to have an equal sign. When no MODEL statement equal sign is specified, all variables are considered independent variables, all options that require dependent variables are ignored, and the IREPLACE o-option is automatically specified for you.

You can use DESIGN=n for coding very large data sets, where n is the number of observations to code at one time. For example, to code a data set with a large number of observations, you can specify DESIGN=100 or DESIGN=1000 to process the data set in blocks of 100 or 1000 observations. If you specify the DESIGN o-option rather than DESIGN=n, PROC TRANSREG tries to process all observations at once, which might not work with very large data sets. Specify the NOZEROCONSTANT a-option with DESIGN=n to ensure that constant variables within blocks are not zeroed. See the sections Using the DESIGN Output Option and Discrete Choice Experiments: DESIGN, NORESTORE, NOZERO for more information about the DESIGN option.
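A block-wise coding step could be sketched as follows. The DESIGN=n and NOZEROCONSTANT options are placed in the PROC TRANSREG statement, which this section says is permitted for o-options. The data set and variable names are hypothetical, and the block size of 1000 is arbitrary.

```sas
/* Hypothetical input: work.survey with classification variables brand and size */
proc transreg data=work.survey design=1000 nozeroconstant;
   /* No equal sign: with DESIGN, all variables are treated as independent variables */
   model class(brand size);
   output out=work.coded;
   /* Carry an identifying variable through to the coded data set */
   id respondent;
run;
```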

DEPENDENT=name
DEP=name

specifies the untransformed dependent variable for OUT= data sets with METHOD=MORALS when there is more than one dependent variable. The default is DEPENDENT=_DEPEND_.

DREPLACE
DRE

replaces the original dependent variables with the transformed dependent variables in the OUT= data set. The names of the transformed variables in the OUT= data set correspond to the names of the original dependent variables in the input data set. By default, both the original dependent variables and the transformed dependent variables (with names constructed from the TDPREFIX= o-option (default T) and the original dependent variable names) are included in the OUT= data set.

IAPPROXIMATIONS
IAP

outputs the approximations of the transformed independent variables to the OUT= data set. These are the target values for the optimal transformations. The names of the approximation variables are constructed from the AIPREFIX= o-option (default A) and the original independent variable names. When you specify the AIPREFIX= o-option, the IAPPROXIMATIONS o-option is automatically specified for you. The IAPPROXIMATIONS o-option is not valid when METHOD=UNIVARIATE.

IREPLACE
IRE

replaces the original independent variables with the transformed independent variables in the OUT= data set. The names of the transformed variables in the OUT= data set correspond to the names of the original independent variables in the input data set. By default, both the original independent variables and the transformed independent variables (with names constructed from the TIPREFIX= o-option (default T) and the original independent variable names) are included in the OUT= data set.

LEVERAGE<=name>
LEV<=name>

creates a variable with the specified name in the OUT= data set that contains leverages. Specifying the LEVERAGE o-option is equivalent to specifying LEVERAGE=Leverage.
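For example, the following sketch stores leverages in a variable named Lev next to the predicted values and residuals. The data set and variable names, including Lev, are hypothetical.

```sas
/* Hypothetical input: work.mydata with response y and predictors x1 and x2 */
proc transreg data=work.mydata;
   model identity(y) = identity(x1 x2);
   /* LEVERAGE=Lev overrides the default variable name Leverage */
   output out=work.diag predicted residuals leverage=Lev;
run;
```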

LILPREFIX=name
LIL=name

specifies a prefix for naming the liberal-individual-lower confidence limits. The default prefix is LIL. When you specify the LILPREFIX= o-option, the CLI o-option is automatically specified for you.

LIUPREFIX=name
LIU=name

specifies a prefix for naming the liberal-individual-upper confidence limits. The default prefix is LIU. When you specify the LIUPREFIX= o-option, the CLI o-option is automatically specified for you.

LMLPREFIX=name
LML=name

specifies a prefix for naming the liberal-mean-lower confidence limits. The default prefix is LML. When you specify the LMLPREFIX= o-option, the CLM o-option is automatically specified for you.

LMUPREFIX=name
LMU=name

specifies a prefix for naming the liberal-mean-upper confidence limits. The default prefix is LMU. When you specify the LMUPREFIX= o-option, the CLM o-option is automatically specified for you.

MACRO(keyword=name…)
MAC(keyword=name…)

creates macro variables. Most of the options available within the MACRO o-option are rarely needed. By default, PROC TRANSREG creates a macro variable named _TrgInd with a complete list of independent variables created by the procedure. When PROC TRANSREG is being used for design matrix creation prior to running a procedure without a CLASS statement, this macro variable provides a convenient way to use the results from PROC TRANSREG. For example, a PROC LOGISTIC step that uses a design matrix coded by PROC TRANSREG can use the following MODEL statement:

model y=&_trgind;

PROC TRANSREG, also by default, creates a macro variable named _TrgIndN, which contains the number of variables in the _TrgInd list. These macro variables can be used in an ARRAY statement as follows:

array indvars[&_trgindn] &_trgind;

See the sections Using the DESIGN Output Option and Discrete Choice Experiments: DESIGN, NORESTORE, NOZERO for examples of using the default macro variables.
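Putting the two fragments above into context, a complete two-step sketch might look like the following. The data set work.trial and the variables treatment, center, and response are hypothetical. The ID statement is used here so that the response variable, which is not part of the coding model, is carried into the coded data set.

```sas
/* Step 1: code the classification variables (hypothetical names throughout) */
proc transreg data=work.trial design;
   model class(treatment center);
   id response;
   output out=work.coded;
run;

/* Step 2: use the default macro variable _TrgInd in place of a CLASS statement */
proc logistic data=work.coded;
   model response = &_trgind;
run;
```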

The available keywords are as follows.

DN=name

specifies the name of a macro variable that contains the number of dependent variables. By default, a macro variable named _TrgDepN is created. This is the number of variables in the DL= list and the number of macro variables created by the DV= and DE= specifications.

IN=name

specifies the name of a macro variable that contains the number of independent variables. By default, a macro variable named _TrgIndN is created. This is the number of variables in the IL= list and the number of macro variables created by the IV= and IE= specifications.

DL=name

specifies the name of a macro variable that contains the list of the dependent variables. By default, a macro variable named _TrgDep is created. These are the variable names of the final transformed variables in the OUT= data set. For example, if there are three dependent variables, y1–y3, then _TrgDep contains, by default, Ty1 Ty2 Ty3 (or y1 y2 y3 if you specify the REPLACE o-option).

IL=name

specifies the name of a macro variable that contains the list of the independent variables. By default, a macro variable named _TrgInd is created. These are the variable names of the final transformed variables in the OUT= data set. For example, if there are three independent variables, x1–x3, then _TrgInd contains, by default, Tx1 Tx2 Tx3 (or x1 x2 x3 if you specify the REPLACE o-option).

DV=prefix

specifies a prefix for creating a list of macro variables, each of which contains one dependent variable name. For example, if there are three dependent variables, y1–y3, and you specify macro(dv=Dep), then three macro variables, Dep1, Dep2, and Dep3, are created, containing Ty1, Ty2, and Ty3, respectively (or y1, y2, and y3 if you specify the REPLACE o-option). By default, no list is created.

IV=prefix

specifies a prefix for creating a list of macro variables, each of which contains one independent variable name. For example, if there are three independent variables, x1–x3, and you specify macro(iv=Ind), then three macro variables, Ind1, Ind2, and Ind3, are created, containing Tx1, Tx2, and Tx3, respectively (or x1, x2, and x3 if you specify the REPLACE o-option). By default, no list is created.

DE=prefix

specifies a prefix for creating a list of macro variables, each of which contains one dependent variable effect. This list shows the origin of each model term. Each effect consists of two or more parts, and each part consists of a value in 32 columns followed by a blank. For example, if you specify macro(de=d), then a macro variable d1 is created for identity(y). The d1 macro variable is shown next, wrapped onto two lines:

   4                                TY
   IDENTITY                         Y

The first part is the number of parts (4), the second part is the transformed variable name, the third part is the transformation, and the last part is the input variable name. By default, no list is created.

IE=prefix

specifies a prefix for creating a list of macro variables, each of which contains one independent variable effect. This list shows the origin of each model term. Each effect consists of two or more parts, and each part consists of a value in 32 columns followed by a blank. For example, if you specify macro(ie=I), then three macro variables, I1, I2, and I3, are created for class(x1 | x2) when both x1 and x2 have values of 1 and 2. These macro variables are shown next, with extra white space removed:

   5     Tx11     CLASS    x1   1
   5     Tx21     CLASS    x2   1
   8     Tx11x21  CLASS    x1   1      CLASS    x2   1

For CLASS variables, the formatted level appears after the variable name. The first two effects are the main effects, and the last is the interaction term. By default, no list is created.
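The keywords can be combined in a single MACRO specification. The following sketch renames the list and count macro variables and also requests one macro variable per coded variable. The macro variable names CodedVars and NCoded, the prefix Coded, and the data set names are hypothetical.

```sas
/* Hypothetical input: work.trial with classification variables x1 and x2 */
proc transreg data=work.trial design;
   model class(x1 | x2);
   /* IL= names the list, IN= names the count, IV= gives a per-variable prefix */
   output out=work.coded macro(il=CodedVars in=NCoded iv=Coded);
run;

/* Write the resulting macro variables to the log */
%put Number of coded variables: &NCoded;
%put Coded variable list: &CodedVars;
%put First coded variable: &Coded1;
```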

MEANS
MEA

outputs marginal means for CLASS variable expansions to the OUT= data set.

MEC

outputs multiple regression elliptical point model coordinates to the OUT= data set.

MPC

outputs multiple regression point model coordinates to the OUT= data set.

MQC

outputs multiple regression quadratic point model coordinates to the OUT= data set.

MRC

outputs multiple regression coefficients to the OUT= data set.

MREDUNDANCY
MRE

outputs multiple redundancy analysis coefficients to the OUT= data set.

NORESTOREMISSING
NORESTORE
NOR

specifies that missing values should not be restored when the OUT= data set is created. By default, the coded CLASS variable contains a row of missing values for observations in which the CLASS variable is missing. When you specify the NORESTOREMISSING o-option, these observations contain a row of zeros instead. This is useful when PROC TRANSREG is used to code experimental designs for discrete choice models and there is a constant alternative indicated by a missing value.
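A choice-design coding step that relies on this behavior might be sketched as follows. It assumes a hypothetical data set work.choice in which the constant alternative has a missing value for brand. The variables subj, set, and choice are also hypothetical.

```sas
/* Hypothetical input: one row per alternative; brand is missing for the constant alternative */
proc transreg data=work.choice design norestoremissing;
   model class(brand);
   output out=work.coded;
   id subj set choice;
run;
```

Under these assumptions, the rows for the constant alternative would contain zeros, rather than missing values, in the coded brand columns.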

NOSCORES
NOS

excludes original variables, transformed variables, predicted values, residuals, and scores from the OUT= data set. You can use the NOSCORES o-option with various other options to create an OUT= data set that contains only a coefficient partition (for example, a data set consisting entirely of coefficients and coordinates).
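For instance, the following sketch creates a data set that contains only the coefficient partition. The data set and variable names are hypothetical.

```sas
/* Hypothetical input: work.mydata with response y, classification g, and numeric x */
proc transreg data=work.mydata;
   model identity(y) = class(g) identity(x);
   /* NOSCORES drops the score partition; COEFFICIENTS fills the coefficient partition */
   output out=work.coefs noscores coefficients;
run;
```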

PREDICTED
PRE
P

outputs predicted values, which for METHOD=UNIVARIATE and METHOD=MORALS are the ordinary predicted values from the linear model, to the OUT= data set. The names of the predicted value variables are constructed from the PPREFIX= o-option (default P) and the original dependent variable names. When you specify the PPREFIX= o-option, the PREDICTED o-option is automatically specified for you.

PPREFIX=name
PDPREFIX=name
PDP=name

specifies a prefix for naming the dependent variable predicted values. The default is PPREFIX=P when you specify the PREDICTED o-option; otherwise, it is PPREFIX=A. When you specify the PPREFIX= o-option, the PREDICTED o-option is automatically specified for you. The PPREFIX= o-option is the same as the ADPREFIX= o-option.

RDPREFIX=name
RDP=name

specifies a prefix for naming the residual (dependent) variables in the OUT= data set. The default is RDPREFIX=R. When you specify the RDPREFIX= o-option, the RESIDUALS o-option is automatically specified for you.

REDUNDANCY<=STANDARDIZE | UNSTANDARDIZE>
RED<=STA | UNS>

outputs redundancy variables to the OUT= data set, either standardized or unstandardized. Specifying the REDUNDANCY o-option is the same as specifying REDUNDANCY=STANDARDIZE. The results of the REDUNDANCY o-option depend on the TSTANDARD= option. You must specify TSTANDARD=Z to get results based on standardized data. The TSTANDARD= option controls how the data that go into the redundancy analysis are scaled, and REDUNDANCY=STANDARDIZE|UNSTANDARDIZE controls how the redundancy variables are scaled. The REDUNDANCY o-option is automatically specified for you when you specify the METHOD=REDUNDANCY a-option. The RPREFIX= o-option specifies a prefix (default Red) for naming the redundancy variables.
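The interaction between TSTANDARD= and REDUNDANCY= can be sketched as follows, with the data standardized on input and the redundancy variables standardized on output. The data set and variable names are hypothetical.

```sas
/* Hypothetical input: work.mydata with variables y1-y3 and x1-x4 */
/* TSTANDARD=Z scales the data that go into the redundancy analysis */
proc transreg data=work.mydata method=redundancy tstandard=z;
   model identity(y1-y3) = identity(x1-x4);
   /* REDUNDANCY=STANDARDIZE scales the redundancy variables themselves */
   output out=work.red redundancy=standardize mredundancy;
run;
```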

REFERENCE=NONE | MISSING | ZERO
REF=NON | MIS | ZER

specifies how reference levels of CLASS variables are to be treated. The options are REFERENCE=NONE, the default, in which reference levels are suppressed; REFERENCE=MISSING, in which reference levels are displayed and output with missing values; and REFERENCE=ZERO, in which reference levels are displayed and output with zeros. You can specify the REFERENCE= option in the PROC TRANSREG, MODEL, or OUTPUT statement, and you can specify it independently for the OUT= data set and the displayed output. When you specify it in only one statement, it sets the option for both the displayed output and the OUT= data set.

REPLACE
REP

is equivalent to specifying both the DREPLACE and the IREPLACE o-options.

RESIDUALS
RES
R

outputs the differences between the transformed dependent variables and their predicted values. The names of the residual variables are constructed from the RDPREFIX= o-option (default R) and the original dependent variable names.

RPREFIX=name
RPR=name

provides a prefix for naming the redundancy variables. The default is RPREFIX=Red. When you specify the RPREFIX= o-option, the REDUNDANCY o-option is automatically specified for you.

TDPREFIX=name
TDP=name

specifies a prefix for naming the transformed dependent variables. By default, TDPREFIX=T. The TDPREFIX= o-option is ignored when you specify the DREPLACE o-option.

TIPREFIX=name
TIP=name

specifies a prefix for naming the transformed independent variables. By default, TIPREFIX=T. The TIPREFIX= o-option is ignored when you specify the IREPLACE o-option.
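As a brief sketch of the two transformed-variable prefixes, the following step uses different prefixes for dependent and independent variables. The prefixes Td and Ti and the data set and variable names are hypothetical. Given the naming rule described above, the transformed variables would be expected to be named Tdy, Tix1, and Tix2.

```sas
/* Hypothetical input: work.mydata with response y and predictors x1 and x2 */
proc transreg data=work.mydata;
   model monotone(y) = spline(x1 x2);
   /* Distinct prefixes keep transformed dependents and independents easy to tell apart */
   output out=work.trans tdprefix=Td tiprefix=Ti;
run;
```

Adding the REPLACE o-option to this statement would cause both prefixes to be ignored, because the transformed variables would then take the original variable names.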