The OUT= output data set can contain a great deal of information; however, in most cases, the output data set contains a small portion of the entire range of available information.
This section provides three brief examples, illustrating some typical OUT= output data sets. See the section Output Data Set Contents for a complete list of the contents of the OUT= data set.
The first example shows the output data set from a two-way ANOVA model. The following statements produce Figure 97.65:
title 'ANOVA Output Data Set Example';

data ReferenceCell;
   input y x1 $ x2 $;
   datalines;
11 a a
12 a a
10 a a
 4 a b
 5 a b
 3 a b
 5 b a
 6 b a
 4 b a
 2 b b
 3 b b
 1 b b
;
* Fit Reference Cell Two-Way ANOVA Model;
proc transreg data=ReferenceCell;
   model identity(y) = class(x1 | x2);
   output coefficients replace predicted residuals;
run;

* Print the Results;
proc print;
run;

proc contents position;
   ods select position;
run;
Figure 97.65: ANOVA Example Output Data Set Contents
ANOVA Output Data Set Example 
Obs  _TYPE_  _NAME_  y  Py  Ry  Intercept  x1a  x2a  x1ax2a  x1  x2 

1  SCORE  ROW1  11  11  0  1  1.0  1  1  a  a
2  SCORE  ROW2  12  11  1  1  1.0  1  1  a  a
3  SCORE  ROW3  10  11  -1  1  1.0  1  1  a  a
4  SCORE  ROW4  4  4  0  1  1.0  0  0  a  b
5  SCORE  ROW5  5  4  1  1  1.0  0  0  a  b
6  SCORE  ROW6  3  4  -1  1  1.0  0  0  a  b
7  SCORE  ROW7  5  5  0  1  0.0  1  0  b  a
8  SCORE  ROW8  6  5  1  1  0.0  1  0  b  a
9  SCORE  ROW9  4  5  -1  1  0.0  1  0  b  a
10  SCORE  ROW10  2  2  0  1  0.0  0  0  b  b
11  SCORE  ROW11  3  2  1  1  0.0  0  0  b  b
12  SCORE  ROW12  1  2  -1  1  0.0  0  0  b  b
13  M COEFFI  y  .  .  .  2  2.0  3  4  
14  MEAN  y  .  .  .  .  7.5  8  11 
ANOVA Output Data Set Example 
Variables in Creation Order  

#  Variable  Type  Len  Label 
1  _TYPE_  Char  8  
2  _NAME_  Char  32  
3  y  Num  8  
4  Py  Num  8  y Predicted Values 
5  Ry  Num  8  y Residuals 
6  Intercept  Num  8  Intercept 
7  x1a  Num  8  x1 a 
8  x2a  Num  8  x2 a 
9  x1ax2a  Num  8  x1 a * x2 a 
10  x1  Char  32  
11  x2  Char  32 
The _TYPE_ variable indicates the observation type: score, multiple regression coefficient (parameter estimates), and marginal means. The _NAME_ variable contains the default observation labels, “ROW1”, “ROW2”, and so on, and contains the dependent variable name (y) for the remaining observations. If you specify an ID statement, _NAME_ contains the values of the first ID variable for score observations. The y variable is the dependent variable, Py contains the predicted values, Ry contains the residuals, and the variables Intercept through x1ax2a contain the design matrix. The x1 and x2 variables are the original CLASS variables.
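Because the observation types are stacked in one data set, it is often convenient to separate them by _TYPE_. The following statements are a minimal sketch of one way to do that. They assume the ReferenceCell data set from this example; the names Results, Scores, Coefs, and Means are hypothetical and chosen only for illustration.

```sas
* Rerun the analysis, naming the output data set (Results is a hypothetical name);
proc transreg data=ReferenceCell;
   model identity(y) = class(x1 | x2);
   output out=Results coefficients replace predicted residuals;
run;

* Route each observation type to its own data set;
data Scores Coefs Means;
   set Results;
   if      _TYPE_ = 'SCORE'    then output Scores;
   else if _TYPE_ = 'M COEFFI' then output Coefs;
   else if _TYPE_ = 'MEAN'     then output Means;
run;
```

For the data set shown in Figure 97.65, this split would place the 12 score observations in Scores and one observation each in Coefs and Means.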
The next example shows the contents of the output data set from fitting a curve through a scatter plot. The following statements produce Figure 97.66:
title 'Output Data Set for Curve Fitting Example';

data a;
   do x = 1 to 100;
      y = log(x) + sin(x / 10) + normal(7);
      output;
   end;
run;

proc transreg;
   model identity(y) = spline(x / nknots=9);
   output predicted out=b;
run;

proc contents position;
   ods select position;
run;
Figure 97.66: Predicted Values Example Output Data Set Contents
Output Data Set for Curve Fitting Example 
Variables in Creation Order  

#  Variable  Type  Len  Label 
1  _TYPE_  Char  8  
2  _NAME_  Char  32  
3  y  Num  8  
4  Ty  Num  8  y Transformation 
5  Py  Num  8  y Predicted Values 
6  Intercept  Num  8  Intercept 
7  x  Num  8  
8  TIntercept  Num  8  Intercept Transformation 
9  Tx  Num  8  x Transformation 
The OUT= data set contains _TYPE_ and _NAME_ variables. Since no coefficients or coordinates are requested, all observations are _TYPE_='SCORE'. The y variable is the original dependent variable, Ty is the transformed dependent variable, Py contains the predicted values, x is the original independent variable, and Tx is the transformed independent variable. The data set also contains an Intercept and transformed intercept TIntercept variable. (In this case, the transformed intercept is the same as the intercept. However, if you specify the TSTANDARD= and ADDITIVE options, these are not always the same.)
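One common use of this data set is to overlay the fitted curve on the original scatter plot. The following statements are a sketch of that idea. They assume the data set b created by the preceding statements and that PROC SGPLOT is available in your installation; the plot title is illustrative only.

```sas
title 'Fitted Spline from the OUT= Data Set';

* All observations in b are _TYPE_='SCORE', so no subsetting is needed;
proc sgplot data=b;
   scatter x=x y=y;     * original data;
   series  x=x y=Py;    * predicted values from the spline fit;
run;
```

The observations in b are already ordered by x because the DATA step generates x in increasing order, so the SERIES statement draws the curve left to right without a prior sort.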
The following example shows the results from specifying METHOD=MORALS when there is more than one dependent variable:
title 'METHOD=MORALS Output Data Set Example';

data x;
   input y1 y2 x1 $ x2 $;
   datalines;
11 1 a a
10 4 b a
 5 2 a b
 5 9 b b
 4 3 c c
 3 6 b a
 1 8 a b
;
* Fit a Separate Model for Each Dependent Variable;
proc transreg data=x noprint solve;
   model spline(y1 y2) = opscore(x1 x2 / name=(n1 n2));
   output coefficients predicted residuals;
   id x1 x2;
run;

* Print the Results;
proc print;
run;

proc contents position;
   ods select position;
run;
These statements produce Figure 97.67.
Figure 97.67: METHOD=MORALS Rolled Output Data Set
METHOD=MORALS Output Data Set Example 
Obs  _DEPVAR_  _TYPE_  _NAME_  _DEPEND_  T_DEPEND_  P_DEPEND_  R_DEPEND_  Intercept  n1  n2  TIntercept  Tn1  Tn2  x1  x2 

1  Spline(y1)  SCORE  a  11  13.1600  11.1554  2.00464  1  0  0  1.0000  0.06711  -0.09384  a  a
2  Spline(y1)  SCORE  b  10  6.1931  6.8835  -0.69041  1  1  0  1.0000  1.51978  -0.09384  b  a
3  Spline(y1)  SCORE  a  5  2.4467  4.7140  -2.26724  1  0  1  1.0000  0.06711  1.32038  a  b
4  Spline(y1)  SCORE  b  5  2.4467  0.4421  2.00464  1  1  1  1.0000  1.51978  1.32038  b  b
5  Spline(y1)  SCORE  c  4  4.2076  4.2076  0.00000  1  2  2  1.0000  0.23932  1.32038  c  c
6  Spline(y1)  SCORE  b  3  5.5693  6.8835  -1.31422  1  1  0  1.0000  1.51978  -0.09384  b  a
7  Spline(y1)  SCORE  a  1  4.9766  4.7140  0.26261  1  0  1  1.0000  0.06711  1.32038  a  b
8  Spline(y1)  M COEFFI  y1  .  .  .  .  .  .  .  10.9253  -2.94071  -4.55475  y1  y1
9  Spline(y2)  SCORE  a  1  -0.5303  -0.5199  -0.01043  1  0  0  1.0000  0.03739  -0.09384  a  a
10  Spline(y2)  SCORE  b  4  5.5487  4.5689  0.97988  1  1  0  1.0000  1.51395  -0.09384  b  a
11  Spline(y2)  SCORE  a  2  3.8940  4.5575  -0.66347  1  0  1  1.0000  0.03739  1.32038  a  b
12  Spline(y2)  SCORE  b  9  9.6358  9.6462  -0.01043  1  1  1  1.0000  1.51395  1.32038  b  b
13  Spline(y2)  SCORE  c  3  5.6210  5.6210  0.00000  1  2  2  1.0000  0.34598  1.32038  c  c
14  Spline(y2)  SCORE  b  6  3.5994  4.5689  -0.96945  1  1  0  1.0000  1.51395  -0.09384  b  a
15  Spline(y2)  SCORE  a  8  5.2314  4.5575  0.67390  1  0  1  1.0000  0.03739  1.32038  a  b
16  Spline(y2)  M COEFFI  y2  .  .  .  .  .  .  .  -0.3119  3.44636  3.59024  y2  y2
METHOD=MORALS Output Data Set Example 
Variables in Creation Order  

#  Variable  Type  Len  Label 
1  _DEPVAR_  Char  42  Dependent Variable Transformation(Name) 
2  _TYPE_  Char  8  
3  _NAME_  Char  32  
4  _DEPEND_  Num  8  Dependent Variable 
5  T_DEPEND_  Num  8  Dependent Variable Transformation 
6  P_DEPEND_  Num  8  Dependent Variable Predicted Values 
7  R_DEPEND_  Num  8  Dependent Variable Residuals 
8  Intercept  Num  8  Intercept 
9  n1  Num  8  
10  n2  Num  8  
11  TIntercept  Num  8  Intercept Transformation 
12  Tn1  Num  8  n1 Transformation 
13  Tn2  Num  8  n2 Transformation 
14  x1  Char  32  
15  x2  Char  32 
If you specify METHOD=MORALS with multiple dependent variables, PROC TRANSREG performs separate univariate analyses and stacks the results in the OUT= data set. For this example, the results of the first analysis are in the partition designated by _DEPVAR_='Spline(y1)' and the results of the second analysis are in the partition designated by _DEPVAR_='Spline(y2)', which are the transformation and dependent variable names. Each partition has _TYPE_='SCORE' observations for the variables and a _TYPE_='M COEFFI' observation for the coefficients. In this example, an ID variable is specified, so the _NAME_ variable contains the formatted values of the first ID variable. Since both dependent variables have to go into the same column, the dependent variable is given a new name, _DEPEND_. The dependent variable transformation is named T_DEPEND_, the predicted values variable is named P_DEPEND_, and the residuals variable is named R_DEPEND_.
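In Figure 97.67, each residual agrees, up to rounding, with the transformed dependent variable minus the predicted value. The following statements sketch a quick check of that relationship on the stacked data set. They assume the data set x from this example; MoralsOut, Check, and Diff are hypothetical names introduced for illustration.

```sas
* Rerun the analysis, naming the output data set (MoralsOut is a hypothetical name);
proc transreg data=x noprint solve;
   model spline(y1 y2) = opscore(x1 x2 / name=(n1 n2));
   output out=MoralsOut coefficients predicted residuals;
   id x1 x2;
run;

* For score observations, compare R_DEPEND_ with T_DEPEND_ - P_DEPEND_;
data Check;
   set MoralsOut;
   where _TYPE_ = 'SCORE';
   Diff = (T_DEPEND_ - P_DEPEND_) - R_DEPEND_;
run;

* Diff should be zero, apart from floating-point error, in each partition;
proc means data=Check min max;
   class _DEPVAR_;
   var Diff;
run;
```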
The independent variables are character OPSCORE variables. By default, PROC TRANSREG replaces character OPSCORE variables with category numbers and discards the original character variables. To avoid this, the input variables are renamed from x1 and x2 to n1 and n2 and the original x1 and x2 are added to the data set as ID variables. The n1 and n2 variables contain the initial values for the OPSCORE transformations, and the Tn1 and Tn2 variables contain optimal scores. The data set also contains an Intercept and transformed intercept TIntercept variable. The regression coefficients are in the transformation columns, which also contain the variables to which they apply.
Table 97.6 summarizes the various matrices that can result from PROC TRANSREG processing and that appear in the OUT= data set. The exact contents of an OUT= data set depend on many options.
Table 97.6: PROC TRANSREG OUT= Data Set Contents
_TYPE_    Contents                                 Options, Default Prefix

SCORE     dependent variables                      DREPLACE not specified
SCORE     independent variables                    IREPLACE not specified
SCORE     transformed dependent variables          default, TDPREFIX=T
SCORE     transformed independent variables        default, TIPREFIX=T
SCORE     predicted values                         PREDICTED, PPREFIX=P
SCORE     residuals                                RESIDUALS, RDPREFIX=R
SCORE     leverage                                 LEVERAGE, LEVERAGE=Leverage
SCORE     lower individual confidence limits
SCORE     upper individual confidence limits
SCORE     lower mean confidence limits
SCORE     upper mean confidence limits
SCORE     dependent canonical variables
SCORE     independent canonical variables
SCORE     redundancy variables
SCORE     ID, CLASS, BSPLINE variables
SCORE     independent variables approximations
M COEFFI  multiple regression coefficients
C COEFFI  canonical coefficients
MEAN      marginal means
M REDUND  multiple redundancy coefficients
R REDUND  multiple redundancy coefficients
M POINT   point coordinates                        COORDINATES or MPC, POINT
M EPOINT  elliptical point coordinates             COORDINATES or MEC, EPOINT
M QPOINT  quadratic point coordinates              COORDINATES or MQC, QPOINT
C POINT   canonical point coordinates              COORDINATES or CPC, POINT
C EPOINT  canonical elliptical point coordinates   COORDINATES or CEC, EPOINT
C QPOINT  canonical quadratic point coordinates    COORDINATES or CQC, QPOINT
The independent and dependent variables are created from the original input data. Several potential differences exist between these variables and the actual input data. An intercept variable can be added, new variables can be added for POINT, EPOINT, QPOINT, CLASS, IDENTITY, PSPLINE, and BSPLINE variables, and category numbers are substituted for character OPSCORE variables. These matrices are not always what is input to the first iteration. After the expanded data set is stored for inclusion in the output data set, several things happen to the data before they are input to the first iteration: column means are substituted for missing values; zero-degree SPLINE and MSPLINE variables are transformed so that the iterative algorithms get step-function data as input, which conform to the zero-degree transformation family restrictions; and the nonoptimal transformations are performed.
When you specify METHOD=UNIVARIATE (in the MODEL or PROC TRANSREG statement), PROC TRANSREG can perform several analyses, one for each dependent variable. While each dependent variable can be transformed, their independent variables are not transformed. The OUT= data set optionally contains all of the _TYPE_='SCORE' observations, optionally followed by coefficients or coordinates.
When you specify METHOD=MORALS (in the MODEL or PROC TRANSREG statement), successive analyses are performed, one for each dependent variable. Each analysis transforms one dependent variable and the entire set of the independent variables. All information for the first dependent variable (scores then, optionally, coefficients) appears first. Then all information for the second dependent variable (scores then, optionally, coefficients) appears next. This arrangement is repeated for all dependent variables.
For METHOD=CANALS and METHOD=REDUNDANCY (specified in either the MODEL or PROC TRANSREG statement), one analysis is performed that simultaneously transforms all dependent and independent variables. The OUT= data set optionally contains all of the _TYPE_='SCORE' observations, optionally followed by coefficients or coordinates.
As shown in the preceding examples, some variables in the output data set directly correspond to input variables, and some are created. All original optimal and nonoptimal transformation variable names are unchanged.
The names of the POINT, QPOINT, and EPOINT expansion variables are also left unchanged, but new variables are created. When independent POINT variables are present, the sum-of-squares variable _ISSQ_ is added to the output data set. For each EPOINT and QPOINT variable, a new squared variable is created by appending “_2”. For example, Dim1 and Dim2 are expanded into Dim1, Dim2, Dim1_2, and Dim2_2. In addition, for each pair of QPOINT variables, a new cross-product variable is created by combining the two names (for example, Dim1Dim2).
The names of the CLASS variables are constructed from original variable names and levels. Lengths are controlled by the CPREFIX= a-option. For example, when x1 and x2 both have values of 'a' and 'b', CLASS(x1 | x2 / ZERO=NONE) creates x1 main-effect variable names x1a x1b, x2 main-effect variable names x2a x2b, and interaction variable names x1ax2a x1ax2b x1bx2a x1bx2b.
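One way to inspect the generated names without fitting a model is to code the design matrix only. The following statements are a sketch under the assumption that the ReferenceCell data set from the first example is available; Coded is a hypothetical data set name. The DESIGN option in the PROC TRANSREG statement requests coding without an analysis, so no dependent variable is needed in the MODEL statement.

```sas
* Code the CLASS variables only (Coded is a hypothetical name);
proc transreg data=ReferenceCell design;
   model class(x1 | x2 / zero=none);
   output out=Coded;
run;

* List the coded variable names and labels in creation order;
proc contents data=Coded position;
   ods select position;
run;
```

The PROC CONTENTS listing should include the main-effect and interaction names described above, each with a label built from the variable name and level.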
PROC TRANSREG then uses these variable names when creating the transformed, predicted, and residual variable names by affixing the relevant prefix and dropping extra characters if necessary.
When you specify METHOD=MORALS and only one dependent variable is present, the output data set is structured exactly as if METHOD=REDUNDANCY were specified (see the section Details for the CANALS and REDUNDANCY Methods). When more than one dependent variable is present, the dependent variables are output in the variable _DEPEND_, transformed dependent variables are output in the variable T_DEPEND_, predicted values are output in the variable P_DEPEND_, and residuals are output in the variable R_DEPEND_. You can partition the data set into BY groups, one per dependent variable, by referring to the character variable _DEPVAR_, which contains the original dependent variable names and transformations.
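The BY-group partitioning can be sketched as follows. This example assumes an OUT= data set named MoralsOut (a hypothetical name) that was created by specifying OUT=MoralsOut in the OUTPUT statement of a METHOD=MORALS analysis with more than one dependent variable and with independent variables named n1 and n2, as in the earlier example.

```sas
* Print the coefficient observation for each dependent variable;
proc print data=MoralsOut noobs;
   where _TYPE_ = 'M COEFFI';
   by _DEPVAR_ notsorted;           * one BY group per dependent variable;
   var _NAME_ TIntercept Tn1 Tn2;   * coefficients are in the transformation columns;
run;
```

The NOTSORTED option is used here so that the BY groups are processed in the order in which PROC TRANSREG stacked them, without requiring a prior sort.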
When the same name is generated from multiple variables in the OUT= data set, new names are created by appending '2', '3', or '4', and so on, until a unique name is created. For 32-character names, the last character is replaced with a numeric suffix until a unique name is created. For example, if there are two output variables that otherwise would be named x, then x and x2 are created instead. If there are two output variables that otherwise would be named ThisIsAThirtyTwoCharacterVarName, then ThisIsAThirtyTwoCharacterVarName and ThisIsAThirtyTwoCharacterVarNam2 are created instead.
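The naming rule can be illustrated with a short DATA step. This is only a sketch of the rule as described above, not the procedure's internal algorithm. The data set and variable names are hypothetical, the comparison is case-sensitive (unlike SAS variable names), and the handling of suffixes longer than one digit is an assumption.

```sas
data UniqueNames;
   length Requested Assigned $32 Digits $8;
   if _n_ = 1 then do;
      * Hash object that remembers the names assigned so far;
      declare hash used();
      used.defineKey('Assigned');
      used.defineData('Assigned');
      used.defineDone();
   end;
   input Requested $;
   Assigned = Requested;
   * While the candidate name is already taken, try suffixes 2, 3, 4, ...;
   do Suffix = 2 by 1 while (used.check() = 0);
      Digits  = left(put(Suffix, 8.));
      * Keep as much of the name as fits in 32 characters with the suffix;
      KeepLen = min(length(Requested), 32 - length(Digits));
      Assigned = cats(substr(Requested, 1, KeepLen), Digits);
   end;
   rc = used.add();
   drop Digits Suffix KeepLen rc;
   datalines;
x
x
ThisIsAThirtyTwoCharacterVarName
ThisIsAThirtyTwoCharacterVarName
;

proc print data=UniqueNames;
run;
```

With these four requested names, the sketch assigns x, x2, ThisIsAThirtyTwoCharacterVarName, and ThisIsAThirtyTwoCharacterVarNam2, which matches the examples in the preceding paragraph.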