Create Design Matrix
PROC MCMC does not support a CLASS statement; therefore, you need to construct the design matrix (with dummy or indicator variables) before you call the procedure. The best tool to use is the TRANSREG procedure (see Chapter 93, The TRANSREG Procedure). This procedure offers both indicator and effects coding methods. You can specify any categorical variables in the CLASS expansion and use the ZERO= option to select a reference category. You can also use the ID statement to copy any other data set variables (predictors, the responses, and so on) to the output data set.
For example, the following statements create a data set that contains two categorical variables (City and G), and two continuous variables (x and resp):
title 'Create Design Matrix';
data categorical;
   input City$ G$ x resp @@;
   datalines;
Chicago F 69.0 112.5   Chicago F 56.5  84.0
Chicago M 65.3  98.0   Chicago M 59.8  84.5
NewYork M 62.8 102.5   NewYork M 63.5 102.5
NewYork F 57.3  83.0   NewYork M 57.5  85.0
;
Suppose you are interested in creating a design matrix that uses dummy variable coding for the categorical variables City, G and their interaction City * G. You can use the following PROC TRANSREG statements:
proc transreg data=categorical design;
   model class(city g city*g / zero=last);
   id x resp;
   output out=input_mcmc(drop=_: Int:);
run;
The DESIGN option specifies that the primary goal is to code the design matrix. The MODEL statement names the variables of interest. The CLASS expansion in the MODEL statement expands those variables into a list of "dummy" variables. The ZERO=LAST option sets the last level of each categorical variable as the reference level. The ID statement includes x and resp in the OUT= data set. The OUTPUT statement creates a new data set, Input_MCMC, that stores the design matrix and the original variables from the input data set; the DROP= data set option removes the variables whose names begin with an underscore or with Int (such as the intercept column).
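The same call can be adapted for effects coding, which the procedure also offers. The following is a minimal sketch, assuming the EFFECTS option of the CLASS expansion as described in the PROC TRANSREG documentation; the output data set name Input_effects is hypothetical and does not appear elsewhere in this section:

```sas
/* Sketch: effects coding instead of indicator (dummy) coding.      */
/* Assumption: the EFFECTS option is specified after the slash in   */
/* the CLASS expansion; Input_effects is a hypothetical name.       */
proc transreg data=categorical design;
   model class(city g city*g / effects);
   id x resp;
   output out=input_effects(drop=_: Int:);
run;

/* Inspect the coded columns before using them in a model. */
proc print data=input_effects;
run;
```

Printing the result is a useful check, because the coded values and the reference level differ from those produced by indicator coding.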
A quick call of the PRINT procedure shows the output from the PROC TRANSREG call:
proc print data=input_mcmc;
run;
Figure 54.14 shows the design matrix that is generated by PROC TRANSREG. The Input_MCMC data set contains all the variables from the original Categorical data set, in addition to the corresponding dummy variables (CityChicago, GF, and CityChicagoGF) for the categorical variables.
Create Design Matrix
Obs | CityChicago | GF | CityChicagoGF | City | G | x | resp |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | Chicago | F | 69.0 | 112.5 |
2 | 1 | 1 | 1 | Chicago | F | 56.5 | 84.0 |
3 | 1 | 0 | 0 | Chicago | M | 65.3 | 98.0 |
4 | 1 | 0 | 0 | Chicago | M | 59.8 | 84.5 |
5 | 0 | 0 | 0 | NewYork | M | 62.8 | 102.5 |
6 | 0 | 0 | 0 | NewYork | M | 63.5 | 102.5 |
7 | 0 | 1 | 0 | NewYork | F | 57.3 | 83.0 |
8 | 0 | 0 | 0 | NewYork | M | 57.5 | 85.0 |
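To make the coding in the table concrete, the same three indicator columns can be reproduced by hand in a DATA step. This is only an illustrative sketch for this small example, not a replacement for PROC TRANSREG; the data set name Input_manual is hypothetical:

```sas
/* Sketch: build the indicator variables manually.                  */
/* A comparison such as (City = 'Chicago') evaluates to 1 when true */
/* and 0 when false, which matches ZERO=LAST coding here because    */
/* NewYork and M are the last levels of City and G.                 */
data input_manual;   /* hypothetical data set name */
   set categorical;
   CityChicago   = (City = 'Chicago');
   GF            = (G = 'F');
   CityChicagoGF = CityChicago * GF;   /* interaction indicator */
run;

proc print data=input_manual;
run;
```

Hand coding of this kind becomes tedious with many levels or interactions, which is where the CLASS expansion in PROC TRANSREG helps.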
You can now proceed to call PROC MCMC using this input data set Input_mcmc and the corresponding dummy variables.
PROC TRANSREG automatically creates a macro variable, &_TRGIND, which contains the list of names of the design variables that it creates. The %put &_trgind; statement prints the following:
CityChicago GF CityChicagoGF
The macro variable &_TRGIND can come in handy if you want to build a regression model; you can refer to &_TRGIND in the following way:
proc mcmc data=input_mcmc;
   array data[5] 1 &_trgind x;
   array beta[5] beta0-beta4;
   ...;
   call mult(beta, data, mu);
   ...;
The first ARRAY statement defines a one-dimensional array of length 5 that holds five values: a constant 1 and the variables CityChicago, GF, CityChicagoGF, and x. The second ARRAY statement defines the array beta, whose elements beta0 through beta4 are the model parameters. Later in the program, you can use the CALL MULT routine to calculate the regression mean and store the value in the symbol mu.
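For illustration, a complete program along these lines might look like the following sketch. It writes the regression mean out term by term instead of using arrays, so that the role of each dummy variable is visible. The normal likelihood, the priors, the initial values, and the NMC=, SEED=, and OUTPOST= settings are all assumptions made for this example and are not part of the original program:

```sas
/* Sketch: linear regression of resp on the design variables and x. */
/* All priors, initial values, and sampler settings below are       */
/* illustrative assumptions.                                        */
proc mcmc data=input_mcmc nmc=10000 seed=1 outpost=post;
   /* regression coefficients and error variance, with initial values */
   parms beta0 0 beta1 0 beta2 0 beta3 0 beta4 0;
   parms sigma2 1;

   /* diffuse priors (assumed) */
   prior beta0 beta1 beta2 beta3 beta4 ~ normal(mean = 0, var = 1e6);
   prior sigma2 ~ igamma(shape = 2, scale = 2);

   /* regression mean: intercept, dummy variables, and covariate x */
   mu = beta0 + beta1*CityChicago + beta2*GF
              + beta3*CityChicagoGF + beta4*x;

   model resp ~ normal(mu, var = sigma2);
run;
```

With only eight observations and six parameters, the posterior here depends heavily on the priors, so the sketch is meant to show program structure and not to support inference.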