PROC MCMC does not support a CLASS statement; therefore, you need to construct the appropriate design matrix (with dummy or indicator variables) before calling PROC MCMC. The best tool for this is the TRANSREG procedure (see Chapter 101: The TRANSREG Procedure), which offers both indicator and effects coding methods. You can specify any categorical variables in the CLASS expansion and use the ZERO= option to select a reference category. You can also use the ID statement to copy any other data set variables (predictors, responses, and so on) to the output data set.
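The difference between the two coding methods can be sketched outside SAS. The following Python sketch is an illustration of the standard definitions, not TRANSREG's implementation: indicator coding gives each non-reference level a 0/1 column, and effects coding additionally codes rows at the reference level as -1 in every column. The function names and the sample values (including the level Boston) are hypothetical, and the sketch assumes that levels are ordered alphabetically, so that a ZERO=LAST analogue picks the last sorted level as the reference.

```python
from typing import Dict, List, Sequence


def indicator_coding(values: Sequence[str], reference: str) -> Dict[str, List[int]]:
    """One 0/1 column per non-reference level; reference rows are all zeros."""
    levels = sorted(set(values))
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


def effects_coding(values: Sequence[str], reference: str) -> Dict[str, List[int]]:
    """Like indicator coding, but reference rows get -1 in every column."""
    levels = sorted(set(values))
    return {
        level: [1 if v == level else (-1 if v == reference else 0) for v in values]
        for level in levels
        if level != reference
    }


# Hypothetical sample with three levels.
city = ["Chicago", "Chicago", "NewYork", "Boston"]
ref = sorted(set(city))[-1]  # ZERO=LAST analogue: "NewYork"
print(indicator_coding(city, ref))  # {'Boston': [0, 0, 0, 1], 'Chicago': [1, 1, 0, 0]}
print(effects_coding(city, ref))    # {'Boston': [0, 0, -1, 1], 'Chicago': [1, 1, -1, 0]}
```

In both codings the reference level has no column of its own; the two schemes differ only in how the reference rows are represented.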
For example, the following statements create a data set that contains two categorical variables (City and G) and two continuous variables (x and resp):
title 'Create Design Matrix';
data categorical;
   input City$ G$ x resp @@;
   datalines;
Chicago F 69.0 112.5 Chicago F 56.5  84.0
Chicago M 65.3  98.0 Chicago M 59.8  84.5
NewYork M 62.8 102.5 NewYork M 63.5 102.5
NewYork F 57.3  83.0 NewYork M 57.5  85.0
;
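The @@ line-hold specifier in the INPUT statement lets one data line hold several observations: INPUT keeps reading values from the same line until the line is exhausted. As a rough Python analogue (a sketch, not how SAS reads data), the DATALINES block can be treated as a single stream of values cut into groups of four; the names DATALINES and read_with_line_hold are hypothetical, and the line breaks are arbitrary because only the token order matters.

```python
from typing import List, Tuple

# The example's data values, two observations per line.
DATALINES = """\
Chicago F 69.0 112.5 Chicago F 56.5 84.0
Chicago M 65.3 98.0 Chicago M 59.8 84.5
NewYork M 62.8 102.5 NewYork M 63.5 102.5
NewYork F 57.3 83.0 NewYork M 57.5 85.0
"""

Record = Tuple[str, str, float, float]  # (City, G, x, resp)


def read_with_line_hold(text: str) -> List[Record]:
    """Split all values into one stream and take four values per observation."""
    tokens = text.split()
    if len(tokens) % 4 != 0:
        raise ValueError("incomplete observation at end of data")
    records: List[Record] = []
    for i in range(0, len(tokens), 4):
        city, g, x, resp = tokens[i:i + 4]
        records.append((city, g, float(x), float(resp)))
    return records


for record in read_with_line_hold(DATALINES):
    print(record)
```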
Suppose you are interested in creating a design matrix that uses dummy variable coding for the categorical variables City and G and their interaction City*G. You can use the following PROC TRANSREG statements:
proc transreg data=categorical design;
   model class(city g city*g / zero=last);
   id x resp;
   output out=input_mcmc(drop=_: Int:);
run;
The DESIGN option specifies that the primary goal is to code the design matrix. The MODEL statement specifies the variables of interest, and the CLASS expansion in the MODEL statement converts them into a list of “dummy” variables. The ZERO=LAST option makes the last level of each variable (NewYork for City and M for G) the reference level, which receives no dummy variable. The ID statement includes x and resp in the OUT= data set. Finally, the OUTPUT statement creates a new data set, Input_MCMC, that stores the design matrix along with the variables from the original data set; the DROP= data set option removes the variables whose names begin with an underscore or with Int (such as the intercept).
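To see what this coding produces, the following Python sketch rebuilds the three dummy columns from the example data. It assumes alphabetical level ordering (so the last level, NewYork or M, is the reference, as with ZERO=LAST here) and forms the interaction column as the elementwise product of the main-effect dummies; the helper names are hypothetical, and the column names simply mirror the ones in this example rather than TRANSREG's general naming rules.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, str, float, float]  # (City, G, x, resp)

DATA: List[Row] = [
    ("Chicago", "F", 69.0, 112.5), ("Chicago", "F", 56.5, 84.0),
    ("Chicago", "M", 65.3, 98.0), ("Chicago", "M", 59.8, 84.5),
    ("NewYork", "M", 62.8, 102.5), ("NewYork", "M", 63.5, 102.5),
    ("NewYork", "F", 57.3, 83.0), ("NewYork", "M", 57.5, 85.0),
]


def dummies(values: Sequence[str], prefix: str) -> Dict[str, List[int]]:
    """Indicator coding; the last sorted level is the reference (no column)."""
    levels = sorted(set(values))
    return {
        prefix + level: [int(v == level) for v in values]
        for level in levels[:-1]  # drop the reference (last) level
    }


def design_matrix(rows: Sequence[Row]) -> Dict[str, List[int]]:
    """Main-effect dummies for City and G plus their interaction columns."""
    city = dummies([r[0] for r in rows], "City")
    g = dummies([r[1] for r in rows], "G")
    columns: Dict[str, List[int]] = {**city, **g}
    # Interaction: elementwise product of each pair of main-effect dummies.
    for c_name, c_col in city.items():
        for g_name, g_col in g.items():
            columns[c_name + g_name] = [a * b for a, b in zip(c_col, g_col)]
    return columns


for name, column in design_matrix(DATA).items():
    print(name, column)
# CityChicago [1, 1, 1, 1, 0, 0, 0, 0]
# GF [1, 1, 0, 0, 0, 0, 1, 0]
# CityChicagoGF [1, 1, 0, 0, 0, 0, 0, 0]
```

The product form is why the interaction dummy is 1 only for observations that are both in Chicago and female.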
A quick call to the PRINT procedure displays the data set that PROC TRANSREG created:
proc print data=input_mcmc;
run;
Figure 59.15 shows the design matrix that PROC TRANSREG generates. The Input_MCMC data set contains all the variables from the original Categorical data set, in addition to the corresponding dummy variables (CityChicago, GF, and CityChicagoGF) for the categorical variables.
Figure 59.15: Design Matrix Generated by PROC TRANSREG
Create Design Matrix

Obs | CityChicago | GF | CityChicagoGF | City | G | x | resp |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | Chicago | F | 69.0 | 112.5 |
2 | 1 | 1 | 1 | Chicago | F | 56.5 | 84.0 |
3 | 1 | 0 | 0 | Chicago | M | 65.3 | 98.0 |
4 | 1 | 0 | 0 | Chicago | M | 59.8 | 84.5 |
5 | 0 | 0 | 0 | NewYork | M | 62.8 | 102.5 |
6 | 0 | 0 | 0 | NewYork | M | 63.5 | 102.5 |
7 | 0 | 1 | 0 | NewYork | F | 57.3 | 83.0 |
8 | 0 | 0 | 0 | NewYork | M | 57.5 | 85.0 |
You can now call PROC MCMC with Input_MCMC as the input data set and use the dummy variables as predictors in the model.
PROC TRANSREG automatically creates a macro variable, &_TRGIND, which contains the list of variable names that it creates. The statement

%put &_trgind;

writes the following to the SAS log:

CityChicago GF CityChicagoGF
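A macro variable reference is resolved as text before the statement that contains it is compiled, so a statement such as array data[5] 1 &_trgind x; reaches the procedure with the variable names already in place. The following Python sketch imitates that text substitution under simplifying assumptions: the symbol table MACRO_VARS and the function resolve are hypothetical, names are matched case-insensitively, and details of the SAS macro processor (such as the optional trailing period after a reference) are ignored.

```python
import re
from typing import Dict

# Hypothetical symbol table holding the value that _TRGIND has in this example.
MACRO_VARS: Dict[str, str] = {"_TRGIND": "CityChicago GF CityChicagoGF"}


def resolve(statement: str, symbols: Dict[str, str]) -> str:
    """Replace each &name reference with its stored text."""
    def lookup(match: "re.Match[str]") -> str:
        return symbols[match.group(1).upper()]
    return re.sub(r"&(\w+)", lookup, statement)


resolved = resolve("array data[5] 1 &_trgind x;", MACRO_VARS)
print(resolved)  # array data[5] 1 CityChicago GF CityChicagoGF x;

# The resolved element list has five entries, matching the declared length [5].
elements = resolved.split("]")[1].rstrip(";").split()
print(len(elements))  # 5
```

Counting the resolved elements is a quick way to check that the array length in the ARRAY statement matches the number of names TRANSREG produced.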
The macro variable &_TRGIND can come in handy when you build a regression model, because you can refer to &_TRGIND instead of listing the dummy variables by name:
proc mcmc data=input_mcmc;
   array data[5] 1 &_trgind x;
   array beta[5] beta0-beta4;
   ...;
   call mult(beta, data, mu);
   ...;
The first ARRAY statement defines a one-dimensional array of length 5 that takes on five values: a constant 1 and the variables CityChicago, GF, CityChicagoGF, and x. The second ARRAY statement defines an array of the model parameters beta0 through beta4. Later in the program, you can use the CALL MULT routine to calculate the regression mean and store the value in the symbol mu; the product is equivalent to mu = beta0 + beta1*CityChicago + beta2*GF + beta3*CityChicagoGF + beta4*x.
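The regression mean that the CALL MULT step computes is the inner product of the coefficient array and one observation's covariate array. The following Python sketch shows that arithmetic for two rows of Figure 59.15 (observations 1 and 7); the function name regression_mean and the coefficient values in beta are hypothetical and chosen only for illustration, since in PROC MCMC the coefficients are sampled parameters.

```python
from typing import List, Sequence

# Covariates in the order of the first ARRAY statement:
# 1, CityChicago, GF, CityChicagoGF, x (values from Figure 59.15).
ROWS: List[List[float]] = [
    [1, 1, 1, 1, 69.0],  # observation 1: Chicago, F
    [1, 0, 1, 0, 57.3],  # observation 7: NewYork, F
]


def regression_mean(beta: Sequence[float], data: Sequence[float]) -> float:
    """Inner product: beta0*1 + beta1*CityChicago + ... + beta4*x."""
    if len(beta) != len(data):
        raise ValueError("beta and data must have the same length")
    return sum(b * d for b, d in zip(beta, data))


# Hypothetical coefficient values for beta0..beta4.
beta = [1.0, 2.0, 3.0, 4.0, 0.5]
for row in ROWS:
    print(regression_mean(beta, row))
```

For a New York observation the CityChicago and CityChicagoGF entries are 0, so beta1 and beta3 drop out of its mean; this is how the reference-level coding shows up in the model.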