Usage Note 23217: Saving the coded design matrix of a model to a data set
For a specified model, there are several procedures that allow you to save the design matrix to a data set. PROC LOGISTIC with the OUTDESIGN= and OUTDESIGNONLY options is the most flexible and convenient for models without random effects. Use PROC GLIMMIX with the OUTDESIGN= option if the model includes random effects and you want to save the design matrix for the random effects. See also this note on the OUTDESIGN= option in PROC GLIMMIX.
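As a minimal sketch of the random-effects case, the statements below save both the fixed-effects and random-effects design columns. The data set names toyblk and xz and the variables block and trt are hypothetical and do not appear elsewhere in this note. The sketch assumes the X and Z suboptions of OUTDESIGN= described in the GLIMMIX documentation.

```sas
/* Hypothetical toy data: 3 blocks, 2 treatments per block */
data toyblk;
   input block trt $ y;
   datalines;
1 A 10.1
1 B 12.3
2 A 9.4
2 B 11.8
3 A 10.9
3 B 13.0
;
run;

/* Assumed suboptions: X requests the fixed-effects columns, */
/* Z requests the random-effects columns                     */
proc glimmix data=toyblk outdesign(x z)=xz;
   class block trt;
   model y=trt;
   random block;
run;

/* Inspect the saved design columns */
proc print data=xz;
run;
```

The saved data set should hold the input variables together with the fixed-effects and random-effects design columns, so the columns for the random block effect can be inspected next to the data.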

GLMMOD or GLIMMIX: For models using GLM parameterization (also called indicator or dummy coding) of CLASS variables, you can use an ODS OUTPUT statement with PROC GLMMOD to save the design matrix to a data set. Alternatively, you can use the OUTDESIGN= option in PROC GLIMMIX. Modeling procedures such as GLM, MIXED, GLIMMIX, and others offer GLM parameterization. This is a less-than-full-rank coding method that creates k 0,1-coded design variables for a predictor with k levels. Specify the model in the MODEL statement and identify any categorical predictors in the CLASS statement. Note that GLMMOD and GLIMMIX offer only GLM parameterization of CLASS variables.
With GLMMOD, the data set contains the response variable and the coded design variables. The names of the coded design variables concatenate the variable name and the variable level separated by an underscore. With GLIMMIX, the data set contains all input data set variables (unless the NOVAR option is specified) and all design variables. The names of the coded design variables are _X1 (which contains 1 for all observations and represents the intercept), _X2, _X3, and so on. Use the ID statement to include the response variable (or other variables) in the data set.
The GLM statements below fit the indicated model. The GLMMOD and GLIMMIX statements that follow use the same design matrix as PROC GLM but also save it in a data set.
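/* Fit the model with PROC GLM */
proc glm data=a;
   class a b c;
   model y=a b c a*b;
run;
/* Save the same design matrix with PROC GLMMOD */
ods output designpoints=xmatrix;
proc glmmod data=a;
   class a b c;
   model y=a b c a*b;
run;
/* Or save it with PROC GLIMMIX, keeping the response through ID */
proc glimmix data=a outdesign=xmatrix;
   class a b c;
   model y=a b c a*b;
   id y;
run;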
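The effect of dropping the input variables and keeping only the ID variable can be sketched on a small example. The data set names toy and toyx are hypothetical. The sketch assumes the NOVAR suboption of OUTDESIGN=, which per the GLIMMIX documentation keeps input data set variables out of the saved data set.

```sas
/* Hypothetical toy data: one 3-level CLASS variable and a response */
data toy;
   input a $ y;
   datalines;
low 1.2
low 1.0
mid 2.3
mid 2.6
high 3.1
high 3.4
;
run;

/* Assumed suboption: NOVAR leaves out the input variables; */
/* the ID statement still carries y into the output         */
proc glimmix data=toy outdesign(novar)=toyx;
   class a;
   model y=a;
   id y;
run;

/* Inspect the intercept column and the indicator columns */
proc print data=toyx;
run;
```

Because GLM parameterization creates one 0,1 column per level, the printed data set should show an intercept column plus three indicator columns for a, and in each row the indicator columns should sum to the intercept column.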

LOGISTIC or GLMSELECT: For models that use GLM or other parameterizations, you can use the OUTDESIGN= option in the LOGISTIC or GLMSELECT procedure. These procedures can create design variables using any of several parameterizations, including GLM, reference, effects, polynomial, and others. Specify the model in the MODEL statement and identify any categorical predictors in the CLASS statement. Use the PARAM= option in the CLASS statement to select the parameterization.
With both procedures, the saved data set contains only the response variable and the coded design variables. To include other variables, specify them as predictors in the MODEL statement. The OUTDESIGNONLY option in PROC LOGISTIC suppresses the analysis, which makes the procedure particularly convenient for this purpose regardless of the distribution of the response. In PROC GLMSELECT, the SELECTION=NONE option requests that the procedure fit the specified model rather than use a model selection method.
For the most common parameterizations, the variable names that PROC LOGISTIC creates for the coded design variables concatenate the variable name and the variable level. The names that PROC GLMSELECT creates concatenate the variable name and the variable level separated by an underscore.
Each of the LOGISTIC and GLMSELECT steps below creates a data set containing the same design matrix as produced above by PROC GLMMOD (and as used internally by PROC GLM).
proc logistic data=a outdesign=xmatrix outdesignonly;
   class a b c / param=glm;
   model y=a b c a*b;
run;
proc glmselect data=a outdesign=xmatrix;
   class a b c / param=glm;
   model y=a b c a*b / selection=none;
run;
To use other coding methods, specify them as needed in the CLASS statement. For example, these statements use effects coding for the CLASS variables:
proc logistic data=a outdesign=xmatrix outdesignonly;
   class a b c / param=effect;
   model y=a b c a*b;
run;
See the LOGISTIC and GLMSELECT documentation for information about the various coding methods that are available.
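Reference coding follows the same pattern. The sketch below is illustrative only: the data set names toy and toyref are hypothetical, and REF=FIRST is just one possible choice of reference level.

```sas
/* Hypothetical toy data: one 3-level CLASS variable and a response */
data toy;
   input a $ y;
   datalines;
low 1.2
low 1.0
mid 2.3
mid 2.6
high 3.1
high 3.4
;
run;

/* Reference coding with the first ordered level as the reference; */
/* OUTDESIGNONLY suppresses the model fit                           */
proc logistic data=toy outdesign=toyref outdesignonly;
   class a / param=ref ref=first;
   model y=a;
run;

/* Inspect the coded design variables */
proc print data=toyref;
run;
```

Under reference coding a predictor with k levels yields k-1 design variables, so the three-level variable a should contribute two columns here instead of the three that PARAM=GLM would create.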

TRANSREG: For models using GLM, reference, or effects parameterization, you can use PROC TRANSREG. Specify any categorical variables in the CLASS expansion. Use the ZERO= option to select a reference category or, as below, ZERO=NONE to specify GLM parameterization. Specify any continuous predictors, the response, and any other variables that you want copied to the output data set in the ID statement. For example, the following statements create a data set containing the same design matrix as produced above by PROC GLMMOD (and as used internally by PROC GLM). Variable names created by PROC TRANSREG for the coded design variables concatenate the variable name and the variable level.
proc transreg data=a design;
   model class(a b c a*b / zero=none);
   id y;
   output out=xmatrix;
run;
Effects coding can be done as follows:
proc transreg data=a design;
   model class(a b c a*b / effects);
   id y;
   output out=xmatrix;
run;
Note that PROC TRANSREG automatically creates a macro variable, _trgind, which contains a list of variable names that it creates. You can use this macro variable in subsequent procedures to refer to the full model.
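One way the macro variable might be used is sketched below. The data set names toy and toyx are hypothetical, and PROC REG is only one example of a subsequent procedure. Effects coding is used so that the coded columns are not linearly dependent on the intercept.

```sas
/* Hypothetical toy data: one 3-level CLASS variable and a response */
data toy;
   input a $ y;
   datalines;
low 1.2
low 1.0
mid 2.3
mid 2.6
high 3.1
high 3.4
;
run;

/* Code the design with effects coding and keep y through ID */
proc transreg data=toy design;
   model class(a / effects);
   id y;
   output out=toyx;
run;

/* Show the list of design variable names stored in _trgind */
%put Design variables: &_trgind;

/* Refer to the full set of coded variables in a later procedure */
proc reg data=toyx;
   model y = &_trgind;
run;
quit;
```

Writing the predictors as &_trgind avoids typing the generated names by hand, which helps when the CLASS variables have many levels.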
Type:  Usage Note 
Priority:  low 

Date Modified:  2015-07-08 14:40:51
Date Created:  2003-04-09 14:46:05