The TRANSREG Procedure
OUTPUT Statement
The OUTPUT statement creates a new SAS data set that contains coefficients, marginal means, and information about the original and transformed variables. The information about original and transformed variables composes the score partition of the data set; observations have _TYPE_='SCORE'. The coefficients and marginal means compose the coefficient partition of the data set; observations have _TYPE_='M COEFFI' or _TYPE_='MEAN'. Other values of _TYPE_ are possible; for details, see "_TYPE_ and _NAME_ Variables" later in this chapter. For details about data set structure, see the section "Output Data Set." To specify the name of the output data set, use the OUT= option.
OUT=SAS-data-set
specifies the output data set for the data, transformed data, predicted values, residuals, scores, coefficients, and so on. When you use an OUTPUT statement but do not use the OUT= specification, PROC TRANSREG creates a data set and uses the DATAn convention. If you want to create a permanent SAS data set, you must specify a two-level name (see "SAS Files" in SAS Language Reference: Concepts and "Introduction to DATA Step Processing" in the Base SAS Procedures Guide for details).
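As a minimal sketch of the two-level naming rule, the following step writes a permanent output data set. The libref `mylib`, its path, and the data set name `results` are hypothetical, and `sashelp.class` is used only as convenient sample data.

```sas
/* Hypothetical libref and path; substitute your own location. */
libname mylib '/path/to/permanent/library';

proc transreg data=sashelp.class;
   model identity(weight) = spline(height) class(sex);
   /* Two-level name: the data set is stored in the mylib library. */
   output out=mylib.results;
run;
```

With a one-level name such as `out=results`, the data set would be written to the temporary WORK library instead.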
To control the contents of the data set and variable names, use one or more of the o-options. You can also specify these options in the PROC TRANSREG statement.
The options listed in Table 90.5 are available in the OUTPUT statement. These options include the OUT= option and all of the o-options. Many of the statistics created in the OUTPUT statement are exactly the same as statistics created by PROC REG. More details are given in the sections "Predicted and Residual Values" and "Model Fit and Diagnostic Statistics" in Chapter 73, "The REG Procedure," and in Chapter 4, "Introduction to Regression Procedures."
Option | Description |
---|---|
Identify output data set | |
OUT= | outputs data set |
Predicted Values, Residuals, Scores | |
CANONICAL | outputs canonical scores |
CLI | outputs individual confidence limits |
CLM | outputs mean confidence limits |
DESIGN | specifies design matrix coding |
DREPLACE | replaces dependent variables |
IREPLACE | replaces independent variables |
LEVERAGE | outputs leverage |
NORESTOREMISSING | does not restore missing values |
NOSCORES | suppresses output of scores |
PREDICTED | outputs predicted values |
REDUNDANCY | outputs redundancy variables |
REPLACE | replaces all variables |
RESIDUALS | outputs residuals |
Output Data Set Coefficients | |
COEFFICIENTS | outputs coefficients |
COORDINATES | outputs ideal point coordinates |
MEANS | outputs marginal means |
MREDUNDANCY | outputs redundancy analysis coefficients |
Output Data Set Variable Name Prefixes | |
ADPREFIX= | specifies dependent variable approximations |
AIPREFIX= | specifies independent variable approximations |
CDPREFIX= | specifies canonical dependent variables |
CILPREFIX= | specifies conservative-individual-lower CL |
CIPREFIX= | specifies canonical independent variables |
CIUPREFIX= | specifies conservative-individual-upper CL |
CMLPREFIX= | specifies conservative-mean-lower CL |
CMUPREFIX= | specifies conservative-mean-upper CL |
DEPENDENT= | specifies METHOD=MORALS untransformed dependent |
LILPREFIX= | specifies liberal-individual-lower CL |
LIUPREFIX= | specifies liberal-individual-upper CL |
LMLPREFIX= | specifies liberal-mean-lower CL |
LMUPREFIX= | specifies liberal-mean-upper CL |
RDPREFIX= | specifies residuals |
PPREFIX= | specifies predicted values |
RPREFIX= | specifies redundancy variables |
TDPREFIX= | specifies transformed dependents |
TIPREFIX= | specifies transformed independents |
Macro Variables | |
MACRO | creates macro variables |
Other Options | |
APPROXIMATIONS | outputs dependent and independent approximations |
CCC | outputs canonical correlation coefficients |
CEC | outputs canonical elliptical point coordinates |
CPC | outputs canonical point coordinates |
CQC | outputs canonical quadratic point coordinates |
DAPPROXIMATIONS | outputs approximations to transformed dependents |
IAPPROXIMATIONS | outputs approximations to transformed independents |
MEC | outputs elliptical point coordinates |
MPC | outputs point coordinates |
MQC | outputs quadratic point coordinates |
MRC | outputs multiple regression coefficients |
For the coefficients partition, the COEFFICIENTS, COORDINATES, and MEANS o-options provide the coefficients that are appropriate for your model. For more explicit control of the coefficient partition, use the options that control details and prefixes. The following list provides details about these options.
ADPREFIX=
specifies a prefix for naming the dependent variable predicted values. The default is ADPREFIX=P when you specify the PREDICTED o-option; otherwise, it is ADPREFIX=A. When you specify the ADPREFIX= o-option, the PREDICTED o-option is automatically specified for you. The ADPREFIX= o-option is the same as the PPREFIX= o-option.
AIPREFIX=
specifies a prefix for naming the independent variable approximations. The default is AIPREFIX=A. When you specify the AIPREFIX= o-option, the IAPPROXIMATIONS o-option is automatically specified for you.
APPROXIMATIONS
is equivalent to specifying both the DAPPROXIMATIONS and the IAPPROXIMATIONS o-options. If you specify METHOD=UNIVARIATE, then the APPROXIMATIONS o-option specifies only the DAPPROXIMATIONS o-option.
CANONICAL
outputs canonical variables to the OUT= data set. When you specify METHOD=CANALS, the CANONICAL o-option is automatically specified for you. The CDPREFIX= o-option specifies a prefix for naming the dependent canonical variables (default Cand), and the CIPREFIX= o-option specifies a prefix for naming the independent canonical variables (default Cani).
CCC
outputs canonical correlation coefficients to the OUT= data set.
CDPREFIX=
provides a prefix for naming the canonical dependent variables. The default is CDPREFIX=Cand. When you specify the CDPREFIX= o-option, the CANONICAL o-option is automatically specified for you.
CEC
outputs canonical elliptical point model coordinates to the OUT= data set.
CILPREFIX=
specifies a prefix for naming the conservative-individual-lower confidence limits. The default prefix is CIL. When you specify the CILPREFIX= o-option, the CLI o-option is automatically specified for you.
CIPREFIX=
provides a prefix for naming the canonical independent variables. The default is CIPREFIX=Cani. When you specify the CIPREFIX= o-option, the CANONICAL o-option is automatically specified for you.
CIUPREFIX=
specifies a prefix for naming the conservative-individual-upper confidence limits. The default prefix is CIU. When you specify the CIUPREFIX= o-option, the CLI o-option is automatically specified for you.
CLI
outputs individual confidence limits to the OUT= data set. The names of the confidence limits variables are constructed from the original dependent variable names and the prefixes specified in the following o-options: LILPREFIX= (default LIL for liberal individual lower), CILPREFIX= (default CIL for conservative individual lower), LIUPREFIX= (default LIU for liberal individual upper), and CIUPREFIX= (default CIU for conservative individual upper). When there are no monotonicity constraints, the liberal and conservative limits are the same.
CLM
outputs mean confidence limits to the OUT= data set. The names of the confidence limits variables are constructed from the original dependent variable names and the prefixes specified in the following o-options: LMLPREFIX= (default LML for liberal mean lower), CMLPREFIX= (default CML for conservative mean lower), LMUPREFIX= (default LMU for liberal mean upper), and CMUPREFIX= (default CMU for conservative mean upper). When there are no monotonicity constraints, the liberal and conservative limits are the same.
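The following sketch requests both kinds of confidence limits in one step. It assumes the `sashelp.class` sample data and a hypothetical output data set named `limits`. Because the model has no monotonicity constraints, the liberal and conservative limits should coincide here.

```sas
proc transreg data=sashelp.class;
   model identity(weight) = spline(height);
   /* PREDICTED adds predicted values; CLI and CLM add the
      individual and mean confidence limits. */
   output out=limits predicted cli clm;
run;

/* List the variables that were added to the OUT= data set. */
proc contents data=limits;
run;
```

The PROC CONTENTS listing is a convenient way to see the names that were built from the default prefixes (LIL, CIL, LIU, CIU, LML, CML, LMU, CMU) and the dependent variable name.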
CMLPREFIX=
specifies a prefix for naming the conservative-mean-lower confidence limits. The default prefix is CML. When you specify the CMLPREFIX= o-option, the CLM o-option is automatically specified for you.
CMUPREFIX=
specifies a prefix for naming the conservative-mean-upper confidence limits. The default prefix is CMU. When you specify the CMUPREFIX= o-option, the CLM o-option is automatically specified for you.
COEFFICIENTS
outputs either multiple regression coefficients or raw canonical coefficients to the OUT= data set. If you specify METHOD=CANALS (in the MODEL or PROC TRANSREG statement), then the COEFFICIENTS o-option outputs the first n canonical variables, where n is the value of the NCAN= a-option (specified in the MODEL or PROC TRANSREG statement). Otherwise, the COEFFICIENTS o-option includes multiple regression coefficients in the OUT= data set. In addition, when you specify the CLASS expansion for any independent variable, the COEFFICIENTS o-option also outputs marginal means.
COORDINATES
outputs either ideal point or vector model coordinates for preference mapping to the OUT= data set. When METHOD=CANALS, these coordinates are computed from canonical coefficients; otherwise, the coordinates are computed from multiple regression coefficients. For details, see the section Point Models.
When ODS Graphics is enabled and vector model coordinates are requested, a plot is produced with points for each row and vectors for each column. If the vectors are plotted based on the actual computed coordinates, then often the vectors are short. A better graphical display is produced when the vectors are stretched. The absolute lengths of each vector can optionally be changed by specifying COORDINATES=n. Then the vector coordinates are all multiplied by n. Usually, n is a value such as 2, 2.5, or 3. The default is 2.5. Specify COORDINATES=1 if you want to see the vectors without any stretching. The relative lengths of the different vectors are important and interpretable, and these are preserved by the stretching.
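A small sketch of a vector-model request follows. The data set `prefs`, its variables, and its values are invented purely for illustration; `coords` is a hypothetical output data set name.

```sas
/* Hypothetical data: five products rated by three judges,
   with two attribute dimensions per product. */
data prefs;
   input product $ dim1 dim2 judge1-judge3;
   datalines;
A  1.2  0.3  7 4 6
B -0.5  1.1  3 8 5
C  0.4 -0.9  6 2 7
D -1.0 -0.6  2 5 3
E -0.1  0.1  5 5 5
;

ods graphics on;

proc transreg data=prefs;
   /* Vector model: each judge is regressed on the two dimensions. */
   model identity(judge1-judge3) = identity(dim1 dim2);
   /* COORDINATES=1 requests unstretched vectors (default is 2.5). */
   output out=coords coordinates=1;
   id product;
run;
```

Rerunning the step with `coordinates=3` would lengthen every vector by the same factor, so the relative lengths stay comparable.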
CPC
outputs canonical point model coordinates to the OUT= data set.
CQC
outputs canonical quadratic point model coordinates to the OUT= data set.
DAPPROXIMATIONS
outputs the approximations of the transformed dependent variables to the OUT= data set. These are the target values for the optimal transformations. With METHOD=UNIVARIATE and METHOD=MORALS, the dependent variable approximations are the ordinary predicted values from the linear model. The names of the approximation variables are constructed from the ADPREFIX= o-option (default A) and the original dependent variable names. For ordinary predicted values, use the PREDICTED o-option instead of the DAPPROXIMATIONS o-option, since the PREDICTED o-option uses a more relevant prefix ("P" instead of "A") and a more relevant variable label suffix ("Predicted Values" instead of "Approximations").
DESIGN
specifies that your primary goal is design matrix coding, not analysis. Specifying the DESIGN o-option makes the procedure run faster. The DESIGN o-option sets the default method to UNIVARIATE and the default MAXITER= value to zero. It suppresses computing the regression coefficients, unless they are needed for some other option. Furthermore, when the DESIGN o-option is specified, the MODEL statement is not required to have an equal sign. When no MODEL statement equal sign is specified, all variables are considered independent variables, all options that require dependent variables are ignored, and the IREPLACE o-option is automatically specified for you.
You can use DESIGN=n for coding very large data sets, where n is the number of observations to code at one time. For example, to code a data set with a large number of observations, you can specify DESIGN=100 or DESIGN=1000 to process the data set in blocks of 100 or 1000 observations. If you specify the DESIGN o-option rather than DESIGN=n, PROC TRANSREG tries to process all observations at once, which might not work with very large data sets. Specify the NOZEROCONSTANT a-option with DESIGN=n to ensure that constant variables within blocks are not zeroed. See the sections Using the DESIGN Output Option and Discrete Choice Experiments: DESIGN, NORESTORE, NOZERO for more information about the DESIGN option.
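The block-wise coding described above might look like the following sketch. The block size of 5 is chosen only because `sashelp.class` is tiny; a real application would use a much larger value. The output data set name `coded` is hypothetical.

```sas
proc transreg data=sashelp.class nozeroconstant;
   /* No equal sign: every variable is treated as independent. */
   model class(sex) identity(age height);
   /* Code the observations in blocks of 5 (illustrative block size). */
   output out=coded design=5;
run;
```

NOZEROCONSTANT matters here because a block of five observations can easily contain a single level of `sex`, which would otherwise be treated as a constant within that block.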
DEPENDENT=
specifies the untransformed dependent variable for OUT= data sets with METHOD=MORALS when there is more than one dependent variable. The default is DEPENDENT=_DEPEND_.
DREPLACE
replaces the original dependent variables with the transformed dependent variables in the OUT= data set. The names of the transformed variables in the OUT= data set correspond to the names of the original dependent variables in the input data set. By default, both the original dependent variables and the transformed dependent variables (with names constructed from the TDPREFIX= (default T) o-option and the original dependent variable names) are included in the OUT= data set.
IAPPROXIMATIONS
outputs the approximations of the transformed independent variables to the OUT= data set. These are the target values for the optimal transformations. The names of the approximation variables are constructed from the AIPREFIX= o-option (default A) and the original independent variable names. When you specify the AIPREFIX= o-option, the IAPPROXIMATIONS o-option is automatically specified for you. The IAPPROXIMATIONS o-option is not valid when METHOD=UNIVARIATE.
IREPLACE
replaces the original independent variables with the transformed independent variables in the OUT= data set. The names of the transformed variables in the OUT= data set correspond to the names of the original independent variables in the input data set. By default, both the original independent variables and the transformed independent variables (with names constructed from the TIPREFIX= o-option (default T) and the original independent variable names) are included in the OUT= data set.
LEVERAGE
creates a variable with the specified name in the OUT= data set that contains leverages. Specifying the LEVERAGE o-option is equivalent to specifying LEVERAGE=Leverage.
LILPREFIX=
specifies a prefix for naming the liberal-individual-lower confidence limits. The default prefix is LIL. When you specify the LILPREFIX= o-option, the CLI o-option is automatically specified for you.
LIUPREFIX=
specifies a prefix for naming the liberal-individual-upper confidence limits. The default prefix is LIU. When you specify the LIUPREFIX= o-option, the CLI o-option is automatically specified for you.
LMLPREFIX=
specifies a prefix for naming the liberal-mean-lower confidence limits. The default prefix is LML. When you specify the LMLPREFIX= o-option, the CLM o-option is automatically specified for you.
LMUPREFIX=
specifies a prefix for naming the liberal-mean-upper confidence limits. The default prefix is LMU. When you specify the LMUPREFIX= o-option, the CLM o-option is automatically specified for you.
MACRO
creates macro variables. Most of the options available within the MACRO o-option are rarely needed. By default, PROC TRANSREG creates a macro variable named _TrgInd with a complete list of independent variables created by the procedure. When PROC TRANSREG is being used for design matrix creation prior to running a procedure without a CLASS statement, this macro provides a convenient way to use the results from PROC TRANSREG. For example, a PROC LOGISTIC step that uses a design matrix coded by PROC TRANSREG can use the following MODEL statement:
model y=&_trgind;
PROC TRANSREG, also by default, creates a macro variable named _TrgIndN, which contains the number of variables in the _TrgInd list. These macro variables can be used in an ARRAY statement as follows:
array indvars[&_trgindn] &_trgind;
See the sections Using the DESIGN Output Option and Discrete Choice Experiments: DESIGN, NORESTORE, NOZERO for examples of using the default macro variables.
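As a hedged end-to-end sketch of the default macro variables, the following steps code a design matrix and then pass the coded variable list to a procedure without a CLASS statement. PROC REG stands in for the downstream procedure, `sashelp.class` is sample data, and `coded` is a hypothetical data set name.

```sas
proc transreg data=sashelp.class;
   /* No equal sign: all variables are coded as independent variables. */
   model class(sex) identity(height);
   id weight;                 /* carry the response into the OUT= data set */
   output out=coded design;
run;

/* Show the default macro variables in the log. */
%put NOTE: &_trgindn coded variables: &_trgind;

proc reg data=coded;
   model weight = &_trgind;
run;
quit;
```

The ID statement is used here only so that the response variable is available in the coded data set for the later step.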
The available keywords are as follows.
DN=
specifies the name of a macro variable that contains the number of dependent variables. By default, a macro variable named _TrgDepN is created. This is the number of variables in the DL= list and the number of macro variables created by the DV= and DE= specifications.
IN=
specifies the name of a macro variable that contains the number of independent variables. By default, a macro variable named _TrgIndN is created. This is the number of variables in the IL= list and the number of macro variables created by the IV= and IE= specifications.
DL=
specifies the name of a macro variable that contains the list of the dependent variables. By default, a macro variable named _TrgDep is created. These are the variable names of the final transformed variables in the OUT= data set. For example, if there are three dependent variables, y1–y3, then _TrgDep contains, by default, Ty1 Ty2 Ty3 (or y1 y2 y3 if you specify the REPLACE o-option).
IL=
specifies the name of a macro variable that contains the list of the independent variables. By default, a macro variable named _TrgInd is created. These are the variable names of the final transformed variables in the OUT= data set. For example, if there are three independent variables, x1–x3, then _TrgInd contains, by default, Tx1 Tx2 Tx3 (or x1 x2 x3 if you specify the REPLACE o-option).
DV=
specifies a prefix for creating a list of macro variables, each of which contains one dependent variable name. For example, if there are three dependent variables, y1–y3, and you specify macro(dv=Dep), then three macro variables, Dep1, Dep2, and Dep3, are created, containing Ty1, Ty2, and Ty3, respectively (or y1, y2, and y3 if you specify the REPLACE o-option). By default, no list is created.
IV=
specifies a prefix for creating a list of macro variables, each of which contains one independent variable name. For example, if there are three independent variables, x1–x3, and you specify macro(iv=Ind), then three macro variables, Ind1, Ind2, and Ind3, are created, containing Tx1, Tx2, and Tx3, respectively (or x1, x2, and x3 if you specify the REPLACE o-option). By default, no list is created.
DE=
specifies a prefix for creating a list of macro variables, each of which contains one dependent variable effect. This list shows the origin of each model term. Each effect consists of two or more parts, and each part consists of a value in 32 columns followed by a blank. For example, if you specify macro(de=d), then a macro variable d1 is created for identity(y). The d1 macro variable is shown next, with extra white space removed:
4 TY IDENTITY Y
The first part is the number of parts (4), the second part is the transformed variable name, the third part is the transformation, and the last part is the input variable name. By default, no list is created.
IE=
specifies a prefix for creating a list of macro variables, each of which contains one independent variable effect. This list shows the origin of each model term. Each effect consists of two or more parts, and each part consists of a value in 32 columns followed by a blank. For example, if you specify macro(ie=I), then three macro variables, I1, I2, and I3, are created for class(x1 | x2) when both x1 and x2 have values of 1 and 2. These macro variables are shown next, with extra white space removed:
5 Tx11 CLASS x1 1
5 Tx21 CLASS x2 1
8 Tx11x21 CLASS x1 1 CLASS x2 1
For CLASS variables, the formatted level appears after the variable name. The first two effects are the main effects, and the last is the interaction term. By default, no list is created.
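The keywords can be combined in a single MACRO specification, as in the following sketch. The macro variable names `xlist` and `nx` and the prefix `Ind` are hypothetical, as is the output data set name `coded`.

```sas
proc transreg data=sashelp.class;
   model identity(weight) = class(sex) identity(height);
   /* IL= and IN= rename the list and the count; IV= creates one
      macro variable per independent variable (Ind1, Ind2, ...). */
   output out=coded macro(il=xlist in=nx iv=Ind);
run;

/* Write the resulting macro variables to the log. */
%put xlist=&xlist;
%put nx=&nx;
%put Ind1=&Ind1;
```

Renaming is mainly useful when several PROC TRANSREG steps run in one session and each list needs to be kept rather than overwritten.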
MEANS
outputs marginal means for CLASS variable expansions to the OUT= data set.
MEC
outputs multiple regression elliptical point model coordinates to the OUT= data set.
MPC
outputs multiple regression point model coordinates to the OUT= data set.
MQC
outputs multiple regression quadratic point model coordinates to the OUT= data set.
MRC
outputs multiple regression coefficients to the OUT= data set.
MREDUNDANCY
outputs multiple redundancy analysis coefficients to the OUT= data set.
NORESTOREMISSING
specifies that missing values should not be restored when the OUT= data set is created. By default, the coded CLASS variable contains a row of missing values for observations in which the CLASS variable is missing. When you specify the NORESTOREMISSING o-option, these observations contain a row of zeros instead. This is useful when PROC TRANSREG is used to code experimental designs for discrete choice models and there is a constant alternative indicated by a missing value.
NOSCORES
excludes original variables, transformed variables, predicted values, residuals, and scores from the OUT= data set. You can use the NOSCORES o-option with various other options to create an OUT= data set that contains only a coefficient partition (for example, a data set consisting entirely of coefficients and coordinates).
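A coefficients-only data set of the kind described above can be requested as in this sketch, which again assumes `sashelp.class` as sample data and a hypothetical output data set named `coefs`.

```sas
proc transreg data=sashelp.class;
   model identity(weight) = class(sex) identity(height);
   /* COEFFICIENTS writes the coefficient partition;
      NOSCORES drops the score partition. */
   output out=coefs coefficients noscores;
run;

proc print data=coefs;
run;
```

The printed data set should contain no _TYPE_='SCORE' observations, which keeps it small enough to inspect directly.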
PREDICTED
outputs predicted values, which for METHOD=UNIVARIATE and METHOD=MORALS are the ordinary predicted values from the linear model, to the OUT= data set. The names of the predicted value variables are constructed from the PPREFIX= o-option (default P) and the original dependent variable names. When you specify the PPREFIX= o-option, the PREDICTED o-option is automatically specified for you.
PPREFIX=
specifies a prefix for naming the dependent variable predicted values. The default is PPREFIX=P when you specify the PREDICTED o-option; otherwise, it is PPREFIX=A. When you specify the PPREFIX= o-option, the PREDICTED o-option is automatically specified for you. The PPREFIX= o-option is the same as the ADPREFIX= o-option.
RDPREFIX=
specifies a prefix for naming the residual (dependent) variables in the OUT= data set. The default is RDPREFIX=R. When you specify the RDPREFIX= o-option, the RESIDUALS o-option is automatically specified for you.
REDUNDANCY
outputs redundancy variables to the OUT= data set, either standardized or unstandardized. Specifying the REDUNDANCY o-option is the same as specifying REDUNDANCY=STANDARDIZE. The results of the REDUNDANCY o-option depend on the TSTANDARD= option. You must specify TSTANDARD=Z to get results based on standardized data. The TSTANDARD= option controls how the data that go into the redundancy analysis are scaled, and REDUNDANCY=STANDARDIZE|UNSTANDARDIZE controls how the redundancy variables are scaled. The REDUNDANCY o-option is automatically specified for you when you specify the METHOD=REDUNDANCY a-option. The RPREFIX= o-option specifies a prefix (default Red) for naming the redundancy variables.
REFERENCE=
specifies how reference levels of CLASS variables are to be treated. The options are REFERENCE=NONE, the default, in which reference levels are suppressed; REFERENCE=MISSING, in which reference levels are displayed and output with missing values; and REFERENCE=ZERO, in which reference levels are displayed and output with zeros. You can specify the REFERENCE= option in the PROC TRANSREG, MODEL, or OUTPUT statement, and you can specify it independently for the OUT= data set and the displayed output. When you specify it in only one statement, it sets the option for both the displayed output and the OUT= data set.
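The sketch below sets REFERENCE= in the OUTPUT statement only. The sample data and the data set name `coefs` are assumptions, and the exact layout of the reference-level entries is best confirmed by printing the result.

```sas
proc transreg data=sashelp.class;
   model identity(weight) = class(sex) identity(height);
   /* Output the reference level of sex with zeros
      instead of suppressing it (the default, NONE). */
   output out=coefs coefficients noscores reference=zero;
run;

proc print data=coefs;
run;
```

Replacing `reference=zero` with `reference=missing` should show the same reference level with missing values instead.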
REPLACE
is equivalent to specifying both the DREPLACE and the IREPLACE o-options.
RESIDUALS
outputs the differences between the transformed dependent variables and their predicted values. The names of the residual variables are constructed from the RDPREFIX= o-option (default R) and the original dependent variable names.
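Because the prefix options imply the corresponding output options, predicted values and residuals can be requested by naming only the prefixes, as in this sketch. The prefixes `Pred` and `Res` and the data set name `fit` are hypothetical.

```sas
proc transreg data=sashelp.class;
   model identity(weight) = spline(height);
   /* PPREFIX= implies PREDICTED; RDPREFIX= implies RESIDUALS. */
   output out=fit pprefix=Pred rdprefix=Res;
run;

/* Confirm the names of the new variables. */
proc contents data=fit;
run;
```

With these prefixes the new variable names are built from `Pred` and `Res` plus the dependent variable name, rather than from the default P and R.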
RPREFIX=
provides a prefix for naming the redundancy variables. The default is RPREFIX=Red. When you specify the RPREFIX= o-option, the REDUNDANCY o-option is automatically specified for you.
TDPREFIX=
specifies a prefix for naming the transformed dependent variables. By default, TDPREFIX=T. The TDPREFIX= o-option is ignored when you specify the DREPLACE o-option.
TIPREFIX=
specifies a prefix for naming the transformed independent variables. By default, TIPREFIX=T. The TIPREFIX= o-option is ignored when you specify the IREPLACE o-option.
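The two naming schemes for transformed variables can be compared side by side with the following sketch. The prefixes `TD` and `TI` and the data set names `both` and `replaced` are hypothetical.

```sas
/* Default behavior: original variables are kept, and transformed
   copies are named from the prefixes and the original names. */
proc transreg data=sashelp.class;
   model identity(weight) = spline(height);
   output out=both tdprefix=TD tiprefix=TI;
run;

/* REPLACE: transformed variables take the original names,
   so the prefixes are not used. */
proc transreg data=sashelp.class;
   model identity(weight) = spline(height);
   output out=replaced replace;
run;
```

Distinct prefixes for dependent and independent variables are convenient when the same variable name could otherwise be confused across the two sides of the model.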
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.