Usage Note 23217: Saving the coded design matrix of a model in a data set

For a specified model, there are several procedures that allow you to save the columns of the design matrix as variables in a SAS data set. PROC LOGISTIC with the OUTDESIGN= and OUTDESIGNONLY options is the most efficient for models without random effects using any coding scheme (parameterization), but PROC GLMSELECT additionally provides macro variables containing the names of all design variables created in the OUTDESIGN= data set. These macro variables are convenient for use in subsequent analyses. Use PROC GLIMMIX with the OUTDESIGN= option if the model includes random effects and you want to save the design matrix for the random effects. See also SAS Note 40631 about the OUTDESIGN= option in PROC GLIMMIX.

  • GLMMOD or GLIMMIX: For models using GLM parameterization (also called indicator or dummy coding) of CLASS variables, you can use the OUTDESIGN= option in PROC GLMMOD or PROC GLIMMIX. Modeling procedures such as GLM, MIXED, GLIMMIX, and others use GLM parameterization. This is a non-full-rank coding method that creates k 0,1-coded design variables for a predictor with k levels.

    Specify your model in the MODEL statement and identify any categorical predictors in the CLASS statement. Note that GLMMOD and GLIMMIX offer only GLM parameterization of CLASS variables. The GLM step below fits the indicated model. The GLMMOD and GLIMMIX steps that follow build the same design matrix that PROC GLM uses and also save it in a data set.

    With GLMMOD, the data set contains the response variable and the coded design variables. By default, the coded design variables are named Col1, Col2, Col3, and so on. A prefix other than Col can be specified using the PREFIX= option.

    With GLIMMIX, the data set contains all input data set variables (unless OUTDESIGN(NOVAR) is specified) and all design variables. The names of the coded design variables are _X1, _X2, _X3, and so on for the fixed effects and _Z1, _Z2, _Z3, and so on for the random effects. A prefix other than _X or _Z can be specified using the OUTDESIGN(X=prefix Z=prefix) option. Use the ID statement to include the response variable (or other variables) in the data set. When only the design matrix is needed, it is most efficient to add the NOFIT option in the PROC GLIMMIX statement to prevent the procedure from fitting the model.

       proc glm data=a;
          class a b c;
          model y=a b c a*b;
          run;
    
       proc glmmod data=a outdesign=xmatrix;
          class a b c;
          model y=a b c a*b;
          run;
          
       proc glimmix data=a outdesign(novar)=xmatrix nofit;
          class a b c;
          model y=a b c a*b;
          id y;
          run;      
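
    To see which effect and CLASS levels each saved GLMMOD design column represents, the OUTPARM= option can be added. The following is a minimal sketch that assumes the same data set A as above; XPARMS is a hypothetical data set name chosen for illustration.

    ```sas
    /* Sketch: assumes data set A with response y and CLASS variables a, b, c. */
    /* XPARMS is a hypothetical name for the column-description data set.      */
    proc glmmod data=a outdesign=xmatrix outparm=xparms noprint;
       class a b c;
       model y=a b c a*b;
    run;

    /* One row per design column: column number, effect name, CLASS levels */
    proc print data=xparms noobs;
    run;
    ```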
    
    
  • LOGISTIC or GLMSELECT: For models that use GLM or other parameterizations, you can use the OUTDESIGN= option in the LOGISTIC or GLMSELECT procedure. These procedures can create design variables using any of several different parameterizations including GLM, reference, effects, polynomial, and others. Specify the model in the MODEL statement and identify any categorical predictors in the CLASS statement. Use the PARAM= option in the CLASS statement to select the parameterization.

    Each of the GLMSELECT and LOGISTIC steps below creates a data set containing the same design matrix as produced above by PROC GLMMOD (and as used internally by PROC GLM). With both procedures, the saved data set contains only the response variable and the coded design variables. To include other variables, specify them as predictors in the MODEL statement. Note that the OUTDESIGNONLY option in PROC LOGISTIC prevents the analysis, which makes it particularly efficient for this purpose regardless of the distribution of the response. The variable names created by PROC LOGISTIC for the coded design variables concatenate the variable name and the variable level for the most common parameterizations.

    In PROC GLMSELECT, the SELECTION=NONE option requests that the procedure fit the specified model rather than use a model selection method. By default, variable names created by PROC GLMSELECT for the coded design variables concatenate the variable name and the variable level separated by an underscore for the most common parameterizations. See the GLMSELECT documentation for the suboptions available in the OUTDESIGN= option. GLMSELECT automatically produces several macro variables that contain the names of the created design variables. The %PUT statement below displays the contents of those macro variables. See the description of the MAXMACRO= option in the GLMSELECT documentation for more details.

       proc logistic data=a outdesign=xmatrix outdesignonly;
          class a b c / param=glm;
          model y=a b c a*b;
          run;

       proc glmselect data=a outdesign=xmatrix;
          class a b c / param=glm;
          model y=a b c a*b / selection=none;
          run;
       %put _user_;
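
    The macro variables can then stand in for the list of design variables in a later step. The following is a sketch only: it assumes that _GLSMOD holds the names of the design columns written to the OUTDESIGN= data set, as described in the GLMSELECT documentation, and uses PROC REG purely as an illustration of a procedure without a CLASS statement.

    ```sas
    /* Sketch: assumes data set A with response y and CLASS variables a, b, c. */
    proc glmselect data=a outdesign=xmatrix;
       class a b c / param=glm;
       model y=a b c a*b / selection=none;
    run;

    /* Show the design-column names that GLMSELECT stored */
    %put Design columns: &_GLSMOD;

    /* Refer to all design columns through the macro variable */
    proc reg data=xmatrix;
       model y = &_GLSMOD;
    run;
    quit;
    ```

    Because GLM parameterization is not full rank, PROC REG is expected to report linear dependencies among these columns and set some estimates to zero.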
    
    To use other coding methods, specify them as needed in the CLASS statement. For example, these statements use effects coding for the CLASS variables:

       proc logistic data=a outdesign=xmatrix outdesignonly;
          class a b c / param=effect;
          model y=a b c a*b;
          run;
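
    Reference coding follows the same pattern. In this sketch, XREF is a hypothetical output data set name, and REF=FIRST is one possible choice that makes the first ordered level of each CLASS variable the reference level:

    ```sas
    /* Sketch: assumes data set A with response y and CLASS variables a, b, c. */
    /* XREF is a hypothetical name; REF=FIRST is an illustrative choice.       */
    proc logistic data=a outdesign=xref outdesignonly;
       class a b c / param=ref ref=first;
       model y=a b c a*b;
    run;
    ```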
    
    See the LOGISTIC and GLMSELECT documentation for information about the various coding methods that are available.

  • TRANSREG: For models using GLM, reference, or effects parameterization, you can use PROC TRANSREG. Specify any categorical variables in the CLASS expansion. Use the ZERO= option to select a reference category or, as below, ZERO=NONE to specify GLM parameterization. Specify any continuous predictors, the response, and any other variables that you want copied to the output data set in the ID statement. For example, the following statements create a data set that contains the same design matrix as produced above by PROC GLMMOD (and as used internally by PROC GLM). Variable names created by PROC TRANSREG for the coded design variables concatenate the variable name and the variable level:

       proc transreg data=a design;
          model class(a b c a*b / zero=none);
          id y;
          output out=xmatrix;
          run;
    
    Effects coding can be done as follows:
       proc transreg data=a design;
          model class(a b c a*b / effects);
          id y;
          output out=xmatrix;
          run;
    
    Note that PROC TRANSREG automatically creates a macro variable, _trgind, which contains a list of variable names that it creates. You can use this macro variable in subsequent procedures to refer to the full model.
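
    As a sketch of that usage, the statements below pass _trgind to a later step. PROC REG is chosen only for illustration, and the data set A is assumed to be the same one used above.

    ```sas
    /* Sketch: assumes data set A with response y and CLASS variables a, b, c. */
    proc transreg data=a design;
       model class(a b c a*b / zero=none);
       id y;
       output out=xmatrix;
    run;

    /* Show the design-variable names that TRANSREG stored */
    %put Design columns: &_trgind;

    /* Refer to the full model through the macro variable */
    proc reg data=xmatrix;
       model y = &_trgind;
    run;
    quit;
    ```

    With ZERO=NONE the coding is not full rank, so PROC REG is expected to report linear dependencies and set some estimates to zero.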


Operating System and Release Information

Product Family   Product    System   SAS Release
                                     Reported   Fixed*
SAS System       SAS/STAT   All      n/a
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.