The TRANSREG Procedure

OUTPUT Statement

OUTPUT OUT=SAS-data-set <o-options> ;

The OUTPUT statement creates a new SAS data set that contains coefficients, marginal means, and information about the original and transformed variables. The information about original and transformed variables composes the score partition of the data set; observations have _TYPE_='SCORE'. The coefficients and marginal means compose the coefficient partition of the data set; observations have _TYPE_='M COEFFI' or _TYPE_='MEAN'. Other values of _TYPE_ are possible; for details, see "_TYPE_ and _NAME_ Variables" later in this chapter. For details about data set structure, see the section Output Data Set. To specify the name of the output data set, use the OUT= option.

OUT=SAS-data-set: specifies the output data set for the data, transformed data, predicted values, residuals, scores, coefficients, and so on. When you use an OUTPUT statement but do not use the OUT= specification, PROC TRANSREG creates a data set and uses the DATAn convention. If you want to create a permanent SAS data set, you must specify a two-level name (see "SAS Files" in SAS Language Reference: Concepts and "Introduction to DATA Step Processing" in the Base SAS Procedures Guide for details).

To control the contents of the data set and variable names, use one or more of the o-options. You can also specify these options in the PROC TRANSREG statement.
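
As a minimal sketch of the statement in use, the following steps fit a simple linear model and write both partitions to a named data set. The data set names work.sample and work.results, the variables x and y, and the simulated values are assumptions made for illustration only.

```sas
/* Hypothetical example data: one response y and one predictor x */
data work.sample;
   do i = 1 to 20;
      x = i / 2;
      y = 3 + 2 * x + rannor(12345);
      output;
   end;
   drop i;
run;

/* Request predicted values, residuals, and coefficients in the OUT= data set */
proc transreg data=work.sample;
   model identity(y) = identity(x);
   output out=work.results predicted residuals coefficients;
run;

/* The _TYPE_ variable distinguishes score observations from coefficient observations */
proc print data=work.results;
run;
```

Because a one-level name in the WORK library is used here, the data set is temporary; a permanent data set would need a two-level name with a libref that you have assigned.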

Output Options (o-options)

The options listed in Table 93.5 are available in the OUTPUT statement. These options include the OUT= option and all of the o-options. Many of the statistics created in the OUTPUT statement are exactly the same as statistics created by PROC REG. More details are given in the sections Predicted and Residual Values, Model Fit and Diagnostic Statistics in Chapter 76, The REG Procedure, and Chapter 4, Introduction to Regression Procedures.

Table 93.5 Options Available in the OUTPUT Statement
Option	Description
Identify output data set
OUT=	Outputs data set
Predicted Values, Residuals, Scores
CANONICAL	Outputs canonical scores
CLI	Outputs individual confidence limits
CLM	Outputs mean confidence limits
DESIGN=	Specifies design matrix coding
DREPLACE	Replaces dependent variables
IREPLACE	Replaces independent variables
LEVERAGE	Outputs leverage
NORESTOREMISSING	Does not restore missing values
NOSCORES	Suppresses output of scores
PREDICTED	Outputs predicted values
REDUNDANCY=	Outputs redundancy variables
REPLACE	Replaces all variables
RESIDUALS	Outputs residuals
Output Data Set Coefficients
COEFFICIENTS	Outputs coefficients
COORDINATES=	Outputs ideal point coordinates
MEANS	Outputs marginal means
MREDUNDANCY	Outputs redundancy analysis coefficients
Output Data Set Variable Name Prefixes
ADPREFIX=	Specifies dependent variable approximations
AIPREFIX=	Specifies independent variable approximations
CDPREFIX=	Specifies canonical dependent variables
CILPREFIX=	Specifies conservative-individual-lower CL
CIPREFIX=	Specifies canonical independent variables
CIUPREFIX=	Specifies conservative-individual-upper CL
CMLPREFIX=	Specifies conservative-mean-lower CL
CMUPREFIX=	Specifies conservative-mean-upper CL
DEPENDENT=	Specifies METHOD=MORALS untransformed dependent
LILPREFIX=	Specifies liberal-individual-lower CL
LIUPREFIX=	Specifies liberal-individual-upper CL
LMLPREFIX=	Specifies liberal-mean-lower CL
LMUPREFIX=	Specifies liberal-mean-upper CL
RDPREFIX=	Specifies residuals
PPREFIX=	Specifies predicted values
RPREFIX=	Specifies redundancy variables
TDPREFIX=	Specifies transformed dependents
TIPREFIX=	Specifies transformed independents
Macro Variables
MACRO	Creates macro variables
Other Options
APPROXIMATIONS	Outputs dependent and independent approximations
CCC	Outputs canonical correlation coefficients
CEC	Outputs canonical elliptical point coordinates
CPC	Outputs canonical point coordinates
CQC	Outputs canonical quadratic point coordinates
DAPPROXIMATIONS	Outputs approximations to transformed dependents
IAPPROXIMATIONS	Outputs approximations to transformed independents
MEC	Outputs elliptical point coordinates
MPC	Outputs point coordinates
MQC	Outputs quadratic point coordinates
MRC	Outputs multiple regression coefficients

For the coefficients partition, the COEFFICIENTS, COORDINATES, and MEANS o-options provide the coefficients that are appropriate for your model. For more explicit control of the coefficient partition, use the options that control details and prefixes. The following list provides details about these options.

ADPREFIX=name

ADP=name

specifies a prefix for naming the dependent variable predicted values. The default is ADPREFIX=P when you specify the PREDICTED o-option; otherwise, it is ADPREFIX=A. When you specify the ADPREFIX= o-option, the PREDICTED o-option is automatically specified for you. The ADPREFIX= o-option is the same as the PPREFIX= o-option.

AIPREFIX=name

AIP=name

specifies a prefix for naming the independent variable approximations. The default is AIPREFIX=A. When you specify the AIPREFIX= o-option, the IAPPROXIMATIONS o-option is automatically specified for you.

APPROXIMATIONS

APPROX

APP

is equivalent to specifying both the DAPPROXIMATIONS and the IAPPROXIMATIONS o-options. If you specify METHOD=UNIVARIATE, then the APPROXIMATIONS o-option specifies only the DAPPROXIMATIONS o-option.

CANONICAL

CAN

outputs canonical variables to the OUT= data set. When you specify METHOD=CANALS, the CANONICAL o-option is automatically specified for you. The CDPREFIX= o-option specifies a prefix for naming the dependent canonical variables (default Cand), and the CIPREFIX= o-option specifies a prefix for naming the independent canonical variables (default Cani).

CCC

outputs canonical correlation coefficients to the OUT= data set.

CDPREFIX=name

CDP=name

provides a prefix for naming the canonical dependent variables. The default is CDPREFIX=Cand. When you specify the CDPREFIX= o-option, the CANONICAL o-option is automatically specified for you.

CEC

outputs canonical elliptical point model coordinates to the OUT= data set.

CILPREFIX=name

CIL=name

specifies a prefix for naming the conservative-individual-lower confidence limits. The default prefix is CIL. When you specify the CILPREFIX= o-option, the CLI o-option is automatically specified for you.

CIPREFIX=name

CIP=name

provides a prefix for naming the canonical independent variables. The default is CIPREFIX=Cani. When you specify the CIPREFIX= o-option, the CANONICAL o-option is automatically specified for you.

CIUPREFIX=name

CIU=name

specifies a prefix for naming the conservative-individual-upper confidence limits. The default prefix is CIU. When you specify the CIUPREFIX= o-option, the CLI o-option is automatically specified for you.

CLI

outputs individual confidence limits to the OUT= data set. The names of the confidence limits variables are constructed from the original dependent variable names and the prefixes specified in the following o-options: LILPREFIX= (default LIL for liberal individual lower), CILPREFIX= (default CIL for conservative individual lower), LIUPREFIX= (default LIU for liberal individual upper), and CIUPREFIX= (default CIU for conservative individual upper). When there are no monotonicity constraints, the liberal and conservative limits are the same.

CLM

outputs mean confidence limits to the OUT= data set. The names of the confidence limits variables are constructed from the original dependent variable names and the prefixes specified in the following o-options: LMLPREFIX= (default LML for liberal mean lower), CMLPREFIX= (default CML for conservative mean lower), LMUPREFIX= (default LMU for liberal mean upper), and CMUPREFIX= (default CMU for conservative mean upper). When there are no monotonicity constraints, the liberal and conservative limits are the same.
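
The following sketch requests both kinds of confidence limits along with predicted values. The data set work.curve and its variables are hypothetical. Because the model has no monotonicity constraints, the liberal and conservative limits are expected to match.

```sas
/* Hypothetical example data: a smooth curve plus noise */
data work.curve;
   do x = 1 to 30;
      y = log(x) + 0.2 * rannor(2718);
      output;
   end;
run;

/* CLI adds individual limits and CLM adds mean limits to the OUT= data set */
proc transreg data=work.curve;
   model identity(y) = spline(x);
   output out=work.limits predicted cli clm;
run;

/* List the variable names to see the default prefixes applied to y */
proc contents data=work.limits short;
run;
```

To change a name, specify the corresponding prefix o-option (for example, LMLPREFIX=), which also requests the matching limits.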

CMLPREFIX=name

CML=name

specifies a prefix for naming the conservative-mean-lower confidence limits. The default prefix is CML. When you specify the CMLPREFIX= o-option, the CLM o-option is automatically specified for you.

CMUPREFIX=name

CMU=name

specifies a prefix for naming the conservative-mean-upper confidence limits. The default prefix is CMU. When you specify the CMUPREFIX= o-option, the CLM o-option is automatically specified for you.

COEFFICIENTS

COE

outputs either multiple regression coefficients or raw canonical coefficients to the OUT= data set. If you specify METHOD=CANALS (in the MODEL or PROC TRANSREG statement), then the COEFFICIENTS o-option outputs the raw canonical coefficients for the first n canonical variables, where n is the value of the NCAN= a-option (specified in the MODEL or PROC TRANSREG statement). Otherwise, the COEFFICIENTS o-option includes multiple regression coefficients in the OUT= data set. In addition, when you specify the CLASS expansion for any independent variable, the COEFFICIENTS o-option also outputs marginal means.

COORDINATES<=n>

COO<=n>

outputs either ideal point or vector model coordinates for preference mapping to the OUT= data set. When METHOD=CANALS, these coordinates are computed from canonical coefficients; otherwise, the coordinates are computed from multiple regression coefficients. For details, see the section Point Models.

When ODS Graphics is enabled and vector model coordinates are requested, a plot is produced with points for each row and vectors for each column. If the vectors are plotted based on the actual computed coordinates, then the vectors are often short. A better graphical display is produced when the vectors are stretched. You can optionally change the absolute length of each vector by specifying COORDINATES=n, which multiplies all the vector coordinates by n. Usually, n is a value such as 2, 2.5, or 3. The default is 2.5. Specify COORDINATES=1 if you want to see the vectors without any stretching. The relative lengths of the different vectors are important and interpretable, and these are preserved by the stretching.

CPC

outputs canonical point model coordinates to the OUT= data set.

CQC

outputs canonical quadratic point model coordinates to the OUT= data set.

DAPPROXIMATIONS

DAP

outputs the approximations of the transformed dependent variables to the OUT= data set. These are the target values for the optimal transformations. With METHOD=UNIVARIATE and METHOD=MORALS, the dependent variable approximations are the ordinary predicted values from the linear model. The names of the approximation variables are constructed from the ADPREFIX= o-option (default A) and the original dependent variable names. For ordinary predicted values, use the PREDICTED o-option instead of the DAPPROXIMATIONS o-option, since the PREDICTED o-option uses a more relevant prefix ("P" instead of "A") and a more relevant variable label suffix ("Predicted Values" instead of "Approximations").

DESIGN<=n>

DES<=n>

specifies that your primary goal is design matrix coding, not analysis. Specifying the DESIGN o-option makes the procedure run faster. The DESIGN o-option sets the default method to UNIVARIATE and the default MAXITER= value to zero. It suppresses computing the regression coefficients, unless they are needed for some other option. Furthermore, when the DESIGN o-option is specified, the MODEL statement is not required to have an equal sign. When no MODEL statement equal sign is specified, all variables are considered independent variables, all options that require dependent variables are ignored, and the IREPLACE o-option is automatically specified for you.

You can use DESIGN=n for coding very large data sets, where n is the number of observations to code at one time. For example, to code a data set with a large number of observations, you can specify DESIGN=100 or DESIGN=1000 to process the data set in blocks of 100 or 1000 observations. If you specify the DESIGN o-option rather than DESIGN=n, PROC TRANSREG tries to process all observations at once, which might not work with very large data sets. Specify the NOZEROCONSTANT a-option with DESIGN=n to ensure that constant variables within blocks are not zeroed. See the sections Using the DESIGN Output Option and Discrete Choice Experiments: DESIGN, NORESTORE, NOZERO for more information about the DESIGN option.
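
A small sketch of block-wise coding follows. The data set work.profiles, its variables, and the block size of 3 are assumptions chosen only to keep the example short; a real application would use a much larger block size.

```sas
/* Hypothetical example data: six profiles with two classification factors */
data work.profiles;
   input brand $ price y;
   datalines;
A 1 5
A 2 3
B 1 4
B 2 2
C 1 6
C 2 1
;

/* Code three observations at a time; NOZEROCONSTANT keeps columns that */
/* happen to be constant within a block from being zeroed               */
proc transreg data=work.profiles design=3 nozeroconstant;
   model class(brand price / zero=none);
   id y;
   output out=work.coded;
run;

proc print data=work.coded;
run;
```

The MODEL statement has no equal sign, so brand and price are both treated as independent variables, and y is carried along through the ID statement.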

DEPENDENT=name

DEP=name

specifies the untransformed dependent variable for OUT= data sets with METHOD=MORALS when there is more than one dependent variable. The default is DEPENDENT=_DEPEND_.

DREPLACE

DRE

replaces the original dependent variables with the transformed dependent variables in the OUT= data set. The names of the transformed variables in the OUT= data set correspond to the names of the original dependent variables in the input data set. By default, both the original dependent variables and the transformed dependent variables (with names constructed from the TDPREFIX= (default T) o-option and the original dependent variable names) are included in the OUT= data set.

IAPPROXIMATIONS

IAP

outputs the approximations of the transformed independent variables to the OUT= data set. These are the target values for the optimal transformations. The names of the approximation variables are constructed from the AIPREFIX= o-option (default A) and the original independent variable names. When you specify the AIPREFIX= o-option, the IAPPROXIMATIONS o-option is automatically specified for you. The IAPPROXIMATIONS o-option is not valid when METHOD=UNIVARIATE.

IREPLACE

IRE

replaces the original independent variables with the transformed independent variables in the OUT= data set. The names of the transformed variables in the OUT= data set correspond to the names of the original independent variables in the input data set. By default, both the original independent variables and the transformed independent variables (with names constructed from the TIPREFIX= o-option (default T) and the original independent variable names) are included in the OUT= data set.

LEVERAGE<=name>

LEV<=name>

creates a variable with the specified name in the OUT= data set that contains leverages. Specifying the LEVERAGE o-option is equivalent to specifying LEVERAGE=Leverage.

LILPREFIX=name

LIL=name

specifies a prefix for naming the liberal-individual-lower confidence limits. The default prefix is LIL. When you specify the LILPREFIX= o-option, the CLI o-option is automatically specified for you.

LIUPREFIX=name

LIU=name

specifies a prefix for naming the liberal-individual-upper confidence limits. The default prefix is LIU. When you specify the LIUPREFIX= o-option, the CLI o-option is automatically specified for you.

LMLPREFIX=name

LML=name

specifies a prefix for naming the liberal-mean-lower confidence limits. The default prefix is LML. When you specify the LMLPREFIX= o-option, the CLM o-option is automatically specified for you.

LMUPREFIX=name

LMU=name

specifies a prefix for naming the liberal-mean-upper confidence limits. The default prefix is LMU. When you specify the LMUPREFIX= o-option, the CLM o-option is automatically specified for you.

MACRO(keyword=name...)

MAC(keyword=name...)

creates macro variables. Most of the options available within the MACRO o-option are rarely needed. By default, PROC TRANSREG creates a macro variable named _TrgInd with a complete list of independent variables created by the procedure. When PROC TRANSREG is being used for design matrix creation prior to running a procedure without a CLASS statement, this macro variable provides a convenient way to use the results from PROC TRANSREG. For example, a PROC LOGISTIC step that uses a design matrix coded by PROC TRANSREG can use the following MODEL statement:

model y=&_trgind;

PROC TRANSREG, also by default, creates a macro variable named _TrgIndN, which contains the number of variables in the _TrgInd list. These macro variables can be used in an ARRAY statement as follows:

array indvars[&_trgindn] &_trgind;

See the sections Using the DESIGN Output Option and Discrete Choice Experiments: DESIGN, NORESTORE, NOZERO for examples of using the default macro variables.
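
The two fragments above can be put together as in the following sketch. The data set work.trial, its variables, and the derived variable n_indicators are hypothetical.

```sas
/* Hypothetical example data: a three-level treatment and a binary response */
data work.trial;
   input trt $ y @@;
   datalines;
A 1 A 0 A 1 A 1 B 0 B 0 B 1 B 0 C 1 C 0 C 0 C 1
;

/* Code the CLASS variable; _TrgInd and _TrgIndN are created by default */
proc transreg data=work.trial design;
   model class(trt);
   id y;
   output out=work.coded;
run;

/* Write the default macro variables to the log */
%put NOTE: &_trgindn coded variables were created: &_trgind;

/* Use the list and the count together in an ARRAY statement */
data work.checked;
   set work.coded;
   array indvars[&_trgindn] &_trgind;
   n_indicators = sum(of indvars[*]);
run;

/* Use the list in a procedure that is run without a CLASS statement */
proc logistic data=work.coded;
   model y(event='1') = &_trgind;
run;
```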

The available keywords are as follows.

DN=name

specifies the name of a macro variable that contains the number of dependent variables. By default, a macro variable named _TrgDepN is created. This is the number of variables in the DL= list and the number of macro variables created by the DV= and DE= specifications.

IN=name

specifies the name of a macro variable that contains the number of independent variables. By default, a macro variable named _TrgIndN is created. This is the number of variables in the IL= list and the number of macro variables created by the IV= and IE= specifications.

DL=name

specifies the name of a macro variable that contains the list of the dependent variables. By default, a macro variable named _TrgDep is created. These are the variable names of the final transformed variables in the OUT= data set. For example, if there are three dependent variables, y1–y3, then _TrgDep contains, by default, Ty1 Ty2 Ty3 (or y1 y2 y3 if you specify the REPLACE o-option).

IL=name

specifies the name of a macro variable that contains the list of the independent variables. By default, a macro variable named _TrgInd is created. These are the variable names of the final transformed variables in the OUT= data set. For example, if there are three independent variables, x1–x3, then _TrgInd contains, by default, Tx1 Tx2 Tx3 (or x1 x2 x3 if you specify the REPLACE o-option).

DV=prefix

specifies a prefix for creating a list of macro variables, each of which contains one dependent variable name. For example, if there are three dependent variables, y1–y3, and you specify macro(dv=Dep), then three macro variables, Dep1, Dep2, and Dep3, are created, containing Ty1, Ty2, and Ty3, respectively (or y1, y2, and y3 if you specify the REPLACE o-option). By default, no list is created.

IV=prefix

specifies a prefix for creating a list of macro variables, each of which contains one independent variable name. For example, if there are three independent variables, x1–x3, and you specify macro(iv=Ind), then three macro variables, Ind1, Ind2, and Ind3, are created, containing Tx1, Tx2, and Tx3, respectively (or x1, x2, and x3 if you specify the REPLACE o-option). By default, no list is created.

DE=prefix

specifies a prefix for creating a list of macro variables, each of which contains one dependent variable effect. This list shows the origin of each model term. Each effect consists of two or more parts, and each part consists of a value in 32 columns followed by a blank. For example, if you specify macro(de=d), then a macro variable d1 is created for identity(y). The d1 macro variable is shown next, wrapped onto two lines:

   4                                TY
   IDENTITY                         Y

The first part is the number of parts (4), the second part is the transformed variable name, the third part is the transformation, and the last part is the input variable name. By default, no list is created.

IE=prefix

specifies a prefix for creating a list of macro variables, each of which contains one independent variable effect. This list shows the origin of each model term. Each effect consists of two or more parts, and each part consists of a value in 32 columns followed by a blank. For example, if you specify macro(ie=I), then three macro variables, I1, I2, and I3, are created for class(x1 | x2) when both x1 and x2 have values of 1 and 2. These macro variables are shown next, with extra white space removed:

   5     Tx11     CLASS    x1   1
   5     Tx21     CLASS    x2   1
   8     Tx11x21  CLASS    x1   1      CLASS    x2   1

For CLASS variables, the formatted level appears after the variable name. The first two effects are the main effects, and the last is the interaction term. By default, no list is created.
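
To inspect effect macro variables like these yourself, a sketch along the following lines might be used. The data set work.factorial is hypothetical, and the values written to the log include the blank padding described above.

```sas
/* Hypothetical example data: a 2 x 2 factorial in x1 and x2 */
data work.factorial;
   do x1 = 1 to 2;
      do x2 = 1 to 2;
         output;
      end;
   end;
run;

/* Request one macro variable per independent variable effect, prefixed with I */
proc transreg data=work.factorial design;
   model class(x1 | x2);
   output out=work.coded macro(ie=I);
run;

/* Write each effect description to the log */
%put I1 = &I1;
%put I2 = &I2;
%put I3 = &I3;
```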

MEANS

MEA

outputs marginal means for CLASS variable expansions to the OUT= data set.

MEC

outputs multiple regression elliptical point model coordinates to the OUT= data set.

MPC

outputs multiple regression point model coordinates to the OUT= data set.

MQC

outputs multiple regression quadratic point model coordinates to the OUT= data set.

MRC

outputs multiple regression coefficients to the OUT= data set.

MREDUNDANCY

MRE

outputs multiple redundancy analysis coefficients to the OUT= data set.

NORESTOREMISSING

NORESTORE

NOR

specifies that missing values should not be restored when the OUT= data set is created. By default, the coded CLASS variable contains a row of missing values for observations in which the CLASS variable is missing. When you specify the NORESTOREMISSING o-option, these observations contain a row of zeros instead. This is useful when PROC TRANSREG is used to code experimental designs for discrete choice models and there is a constant alternative indicated by a missing value.
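
The following sketch shows the kind of choice design this option is meant for. The data set work.alts is hypothetical; in each choice set, the third alternative is a constant alternative whose attributes are missing.

```sas
/* Hypothetical example data: two choice sets, each with a constant alternative */
data work.alts;
   input set brand $ price;
   datalines;
1 A 1
1 B 2
1 . .
2 A 2
2 B 1
2 . .
;

/* With NORESTOREMISSING, the constant alternatives are coded as rows of zeros */
proc transreg data=work.alts design norestoremissing;
   model class(brand price / zero=none);
   id set;
   output out=work.coded;
run;

proc print data=work.coded;
run;
```

Running the same step without NORESTOREMISSING should leave missing values in the coded columns for those observations, which makes the effect of the option easy to compare.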

NOSCORES

NOS

excludes original variables, transformed variables, predicted values, residuals, and scores from the OUT= data set. You can use the NOSCORES o-option with various other options to create an OUT= data set that contains only a coefficient partition (for example, a data set consisting entirely of coefficients and coordinates).
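
For example, the following sketch writes only a coefficient partition. The data set work.groups and its variables are hypothetical.

```sas
/* Hypothetical example data: a response measured in three groups */
data work.groups;
   input g $ y @@;
   datalines;
a 4 a 5 a 6 b 7 b 9 b 8 c 2 c 3 c 1
;

/* NOSCORES drops the score partition; COEFFICIENTS keeps the coefficients */
proc transreg data=work.groups;
   model identity(y) = class(g);
   output out=work.coefonly noscores coefficients;
run;

proc print data=work.coefonly;
run;
```

Because the model uses a CLASS expansion, the COEFFICIENTS o-option also outputs marginal means, so the result is expected to contain more than one value of _TYPE_.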

PREDICTED

PRE

P

outputs predicted values, which for METHOD=UNIVARIATE and METHOD=MORALS are the ordinary predicted values from the linear model, to the OUT= data set. The names of the predicted value variables are constructed from the PPREFIX= o-option (default P) and the original dependent variable names. When you specify the PPREFIX= o-option, the PREDICTED o-option is automatically specified for you.

PPREFIX=name

PDPREFIX=name

PDP=name

specifies a prefix for naming the dependent variable predicted values. The default is PPREFIX=P when you specify the PREDICTED o-option; otherwise, it is PPREFIX=A. When you specify the PPREFIX= o-option, the PREDICTED o-option is automatically specified for you. The PPREFIX= o-option is the same as the ADPREFIX= o-option.

RDPREFIX=name

RDP=name

specifies a prefix for naming the residual (dependent) variables in the OUT= data set. The default is RDPREFIX=R. When you specify the RDPREFIX= o-option, the RESIDUALS o-option is automatically specified for you.

REDUNDANCY<=STANDARDIZE | UNSTANDARDIZE>

RED<=STA | UNS>

outputs redundancy variables to the OUT= data set, either standardized or unstandardized. Specifying the REDUNDANCY o-option is the same as specifying REDUNDANCY=STANDARDIZE. The results of the REDUNDANCY o-option depend on the TSTANDARD= option. You must specify TSTANDARD=Z to get results based on standardized data. The TSTANDARD= option controls how the data that go into the redundancy analysis are scaled, and REDUNDANCY=STANDARDIZE|UNSTANDARDIZE controls how the redundancy variables are scaled. The REDUNDANCY o-option is automatically specified for you when you specify the METHOD=REDUNDANCY a-option. The RPREFIX= o-option specifies a prefix (default Red) for naming the redundancy variables.

REFERENCE=NONE | MISSING | ZERO

REF=NON | MIS | ZER

specifies how reference levels of CLASS variables are to be treated. The options are REFERENCE=NONE, the default, in which reference levels are suppressed; REFERENCE=MISSING, in which reference levels are displayed and output with missing values; and REFERENCE=ZERO, in which reference levels are displayed and output with zeros. You can specify the REFERENCE= option in the PROC TRANSREG, MODEL, or OUTPUT statement, and you can specify it independently for the OUT= data set and the displayed output. When you specify it in only one statement, it sets the option for both the displayed output and the OUT= data set.

REPLACE

REP

is equivalent to specifying both the DREPLACE and the IREPLACE o-options.
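
The effect on variable names can be seen by comparing two runs, as in the following sketch. The data set work.curve and the output data set names are hypothetical.

```sas
/* Hypothetical example data: a smooth curve plus noise */
data work.curve;
   do x = 1 to 30;
      y = log(x) + 0.2 * rannor(2718);
      output;
   end;
run;

/* Default: original variables are kept and transformed copies get the T prefix */
proc transreg data=work.curve;
   model identity(y) = spline(x);
   output out=work.kept;
run;

/* REPLACE: transformed variables take the original names x and y */
proc transreg data=work.curve;
   model identity(y) = spline(x);
   output out=work.replaced replace;
run;

/* Compare the variable names in the two data sets */
proc contents data=work.kept short;
run;

proc contents data=work.replaced short;
run;
```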

RESIDUALS

RES

R

outputs the differences between the transformed dependent variables and their predicted values. The names of the residual variables are constructed from the RDPREFIX= o-option (default R) and the original dependent variable names.

RPREFIX=name

RPR=name

provides a prefix for naming the redundancy variables. The default is RPREFIX=Red. When you specify the RPREFIX= o-option, the REDUNDANCY o-option is automatically specified for you.

TDPREFIX=name

TDP=name

specifies a prefix for naming the transformed dependent variables. By default, TDPREFIX=T. The TDPREFIX= o-option is ignored when you specify the DREPLACE o-option.

TIPREFIX=name

TIP=name

specifies a prefix for naming the transformed independent variables. By default, TIPREFIX=T. The TIPREFIX= o-option is ignored when you specify the IREPLACE o-option.