The MCMC Procedure

Functions of Random-Effects Parameters

Subsections:

Indexing Subject Variable
Non-Indexing Subject Variable

When you specify a RANDOM statement in a program, PROC MCMC internally creates a random-effects parameter for every unique value in the SUBJECT= variable. You can calculate any transformations of these random-effects parameters by applying SAS functions to the effect, and you can use the transformed variable in the subsequent statements. For example, the following statements perform a logit transformation of an effect:

random u ~ normal(mu, var=s2) subject=students;
p = logistic(u);
...

The value of the variable p changes with u as the procedure steps through the input data set: for different unique values of the students variable, u takes on a different parameter value, and p changes accordingly.

To save all the transformed values in p to the OUTPOST= data set, you cannot just specify the MONITOR=(p) option in the PROC MCMC statement. With such a specification, PROC MCMC can save only one value of p (usually the value associated with the last observation in the data set); it cannot save all values. To output all transformed values, you must create an array to store every transformation and use the MONITOR= option to save the entire array to the OUTPOST= data set. The difficult part of the programming involves the creation of the correct array index to use in different types of the SUBJECT= variables. The rest of this section describes how to monitor functions of random-effects parameters in different situations.

Indexing Subject Variable

This subsection describes how to monitor transformation of an effect u when the students variable is an indexing subject variable. An indexing subject variable is an integer variable that takes value from one to the total number of unique subjects in a variable. In other words, the variable can be used as an index in a SAS array. The indexing subject variable does not need to be sorted for the example code in this section to work. An example of an indexing variable takes the values of (1 2 3 4 5 1 2 3 4 5), where the total number of observation is n=10 and the number of unique values is m=5.

The following statements create an indexing variable students in the data set a:

data a;
   input students @@;
   datalines;
   1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10
;

The following statements run a random-effects model without any response variables. There are only random-effects parameters in the model; the program calculates the logit transformation of each effect, saves the results to the OUTPOST= data set, and produces Figure 61.21:

proc mcmc data=a monitor=(p) diag=none stats=none outpost=a1
   plots=none seed=1;
   array p[10];
   random u ~ n(0, sd=1) subject=students;
   p[students] = logistic(u);
   model general(0);
run;

proc print data=a1(obs=3);
run;

The ARRAY statement creates an array p of size 10, which is the number of unique values in students. The p array stores all the transformed values. The RANDOM statement declares u to be an effect with the subject variable students. The P[STUDENTS] assignment statement calculates the logit transformations of u and saves them in appropriate array elements—this is why the students variable must be an indexing variable. Because the students variable used in the p[] array is also the subject variable in the RANDOM statement, PROC MCMC can match each random-effects parameter with the corresponding element in array p. The MONITOR= option monitors all elements in p and saves the output to the a1 data set. The a1 data set contains variables p1–p10. Figure 61.21 shows the first three observations of the OUTPOST= data set.

Figure 61.21: Monitor Functions of Random Effect u

Obs	Iteration	p1	p2	p3	p4	p5	p6	p7	p8	p9	p10	u_1	u_2	u_3	u_4	u_5	u_6	u_7	u_8	u_9	u_10	LogReff	LogPost
1	1	0.5050	0.7334	0.5375	0.5871	0.6862	0.4944	0.5740	0.5991	0.6812	0.8678	0.0198	1.0120	0.1504	0.3521	0.7826	-0.0225	0.2983	0.4019	0.7595	1.8819	-12.2658	-12.2658
2	2	0.5563	0.4254	0.7260	0.5280	0.4593	0.5797	0.5813	0.7360	0.9079	0.3351	0.2261	-0.3006	0.9742	0.1120	-0.1630	0.3213	0.3281	1.0253	2.2887	-0.6854	-13.2393	-13.2393
3	3	0.6132	0.8031	0.4735	0.3135	0.7047	0.5273	0.6761	0.2379	0.2938	0.3986	0.4607	1.4058	-0.1060	-0.7837	0.8698	0.1092	0.7361	-1.1641	-0.8772	-0.4112	-12.3984	-12.3984

The variable p1 is the logit transformation of the variable u_1, p2 is the logit transformation of the variable u_2, and so on.

The same idea works for a students variable that is unsorted. The following statements create an unsorted indexing variable students with repeated measures in each subject, fit the same model, and produce Figure 61.22:

data a;
   input students @@;
   datalines;
   1 1 1 3 5 3 4 5 3 1 5 5 4 4 2 2 2 2 4 3
;
proc mcmc data=a monitor=(p) diag=none stats=none outpost=a1
   plots=none seed=1;
   array p[5];
   random u ~ n(0, sd=1) subject=students;
   p[students] = logistic(u);
   model general(0);
run;

proc print data=a1(obs=3);
run;

Figure 61.22: Monitor Functions of Random Effect u When the students Variable is Unsorted

Obs	Iteration	p1	p2	p3	p4	p5	u_1	u_3	u_5	u_4	u_2	LogReff	LogPost
1	1	0.5050	0.6862	0.7334	0.5871	0.5375	0.0198	1.0120	0.1504	0.3521	0.7826	-5.4865	-5.4865
2	2	0.4944	0.8678	0.5740	0.6812	0.5991	-0.0225	0.2983	0.4019	0.7595	1.8819	-6.7793	-6.7793
3	3	0.5563	0.4593	0.4254	0.5280	0.7260	0.2261	-0.3006	0.9742	0.1120	-0.1630	-5.1595	-5.1595

There are five random-effects parameters in this example, and the array p also has five elements. The values p1–p5 are the transformations of u_1–u_5, respectively. The u variables are not sorted from u_1 to u_5 because PROC MCMC creates the names according to the order by which the subject variable appears in the input data set. Nevertheless, because students is an indexing variable, the first element p[1] stores the transformation that corresponds to students=1 (which is u_1), the second element p[2] stores the transformation that corresponds to students=2, and so on.

Non-Indexing Subject Variable

A non-indexing subject variable can take values of character literals (for example, names) or numerals (for example, ZIP code or a person’s weight). This section illustrates how to monitor functions of random-effects parameters in these situations.

Suppose you have unsorted character literals as the subject variable:

data a;
   input students$ @@;
   datalines;
   smith john john mary kay smith lizzy ben ben dylan
   ben toby abby mary kay kay lizzy ben dylan mary
;

A statement such as following does not work anymore because a character variable cannot be used as an array index:

p[students] = logistic(u);

In this situation, you usually need to do two things: (1) find out the number of unique values in the subject variable, and (2) create a numeric index variable that replaces the students array index. You can use the following statements to do the first task:

proc sql noprint;
   select count(distinct(students)) into :nuniq from a;
quit;
%put &nuniq;

The PROC SQL call counts the distinct values in the students variable and saves the count to the macro variable &nuniq. The macro variable is used later to specify the element size of the p array. In this example, the a data set contains 20 observations and 9 unique elements (the value of &nuinq).

The following statements create an Index variable in the a data set that is in the order by which the students names appear in the data set:

proc freq data=a order=data noprint;
   tables students / out=_f(keep=students);
run;

proc print data=_f;
run;

data a(drop=n);
   set a;
   do i = 1 to nobs until(students=n);
      set _f(keep=students rename=(students=n)) point=i nobs=nobs;
      Index = i;
   end;
run;

proc print data=a;
run;

The PROC FREQ call identifies the unique students names and saves them to the _f data set, which is displayed in Figure 61.23.

Figure 61.23: Unique Names in the Variable Students

Obs	students
1	smith
2	john
3	mary
4	kay
5	lizzy
6	ben
7	dylan
8	toby
9	abby

The DATA step steps through the a data set and creates an Index variable to match the order in which the students names appear in the data set. The new a data set^[35] is displayed in Figure 61.24.

Figure 61.24: New Index Variable in the a Data Set

Obs	students	Index
1	smith	1
2	john	2
3	john	2
4	mary	3
5	kay	4
6	smith	1
7	lizzy	5
8	ben	6
9	ben	6
10	dylan	7
11	ben	6
12	toby	8
13	abby	9
14	mary	3
15	kay	4
16	kay	4
17	lizzy	5
18	ben	6
19	dylan	7
20	mary	3

Student smith is the first subject, and his Index value is one. The same student appears again in the sixth observation, which is given the same Index value. Now, this Index variable can be used to index the p array, in a similar fashion as demonstrated in previous programs:

data _f;
   set _f;
   subj = compress('p_'||students);
run;
proc sql noprint;
   select subj into :pnames separated by ' ' from _f;
quit;
%put &pnames;

proc mcmc data=a monitor=(p) diag=none stats=none outpost=a1
   plots=none seed=1;
   array p[&nuniq] &pnames;
   random u ~ n(0, sd=1) subject=students;
   p[index] = logistic(u);
   model general(0);
run;

proc print data=a1(obs=3);
run;

The first part of the DATA step and the PROC SQL call create array names for the p array that match the subject names in the students variable. It is not necessary to include these steps for the PROC MCMC program to work, but it makes the output more readable. The first DATA step steps through the _f data set and creates a subj variable that concatenates the prefix characters p_ with the names of the students. The PROC SQL calls put all the subj values to a macro called &pnames, which looks like the following:

p_smith p_john p_mary p_kay p_lizzy p_ben p_dylan p_toby p_abby

In the PROC MCMC program, the ARRAY statement defines an p array with size &nuniq (9), and use the macro variable &pnames to name the array elements. The P[INDEX] assignment statement uses the Index variable to find the correct array element to store the transformation. Figure 61.25 displays the first few observations of the OUTPOST=a1 data set.

Figure 61.25: First Few Observations of the Outpost Data Set

Obs	Iteration	p_smith	p_john	p_mary	p_kay	p_lizzy	p_ben	p_dylan	p_toby	p_abby	u_smith	u_john	u_mary	u_kay	u_lizzy	u_ben	u_dylan	u_toby	u_abby	LogReff	LogPost
1	1	0.5050	0.7334	0.5375	0.5871	0.6862	0.4944	0.5740	0.5991	0.6812	0.0198	1.0120	0.1504	0.3521	0.7826	-0.0225	0.2983	0.4019	0.7595	-9.5761	-9.5761
2	2	0.8678	0.5563	0.4254	0.7260	0.5280	0.4593	0.5797	0.5813	0.7360	1.8819	0.2261	-0.3006	0.9742	0.1120	-0.1630	0.3213	0.3281	1.0253	-11.2370	-11.2370
3	3	0.9079	0.3351	0.6132	0.8031	0.4735	0.3135	0.7047	0.5273	0.6761	2.2887	-0.6854	0.4607	1.4058	-0.1060	-0.7837	0.8698	0.1092	0.7361	-13.1867	-13.1867

There are nine random-effects parameters (u_smith, u_john, and so on). There are nine elements of the p array (p_smith, p_john, and so on); each is the logit transformation of corresponding u elements.

You can use the same statements for subject variables that are numeric non-indexing variables. The following statements create a students variable that take large numbers that cannot be used as indices to an array. The rest of the program monitors functions of the effect u. The output is not displayed here.

data a;
   call streaminit(1);
   do i = 1 to 20;
      students = rand("poisson", 20);
      output;
   end;
   drop i;
run;

proc sql noprint;
   select count(distinct(students)) into :nuniq from a;
quit;
%put &nuniq;

proc freq data=a order=data noprint;
   tables students / out=_f(keep=students);
run;

data a(drop=n);
   set a;
   do i = 1 to nobs until(students=n);
      set _f(keep=students rename=(students=n)) point=i nobs=nobs;
      Index = i;
   end;
run;

data _f;
   set _f;
   subj = compress('p_'||students);
proc sql noprint;
   select subj into :pnames separated by ' ' from _f;
   quit;
%put &pnames;

proc mcmc data=a monitor=(p) diag=none stats=none outpost=a1
   plots=none seed=1;
   array p[&nuniq] &pnames;
   random u ~ n(0, sd=1) subject=students;
   p[index] = logistic(u);
   model general(0);
run;

proc print data=a1(obs=3);
run;

^[35]The programming code that creates and adds the Index variable to the data set a keeps all variables from the original data set and does not discard them.