When you specify a RANDOM statement in a program, PROC MCMC internally creates a random-effects parameter for every unique value in the SUBJECT= variable. You can calculate any transformations of these random-effects parameters by applying SAS functions to the effect, and you can use the transformed variable in the subsequent statements. For example, the following statements perform a logit transformation of an effect:
random u ~ normal(mu, var=s2) subject=students; p = logistic(u); ...
The value of the variable p
changes with u
as the procedure steps through the input data set: for different unique values of the students
variable, u
takes on a different parameter value, and p
changes accordingly.
To save all the transformed values in p
to the OUTPOST=
data set, you cannot just specify the MONITOR=(p)
option in the PROC MCMC statement. With such a specification, PROC MCMC can save only one value of p
(usually the value associated with the last observation in the data set); it cannot save all values. To output all transformed
values, you must create an array to store every transformation and use the MONITOR=
option to save the entire array to the OUTPOST=
data set. The difficult part of the programming involves the creation of the correct array index to use in different types
of the SUBJECT=
variables. The rest of this section describes how to monitor functions of random-effects parameters in different situations.
This subsection describes how to monitor transformation of an effect u
when the students
variable is an indexing subject variable. An indexing subject variable is an integer variable that takes value from one to
the total number of unique subjects in a variable. In other words, the variable can be used as an index in a SAS array. The
indexing subject variable does not need to be sorted for the example code in this section to work. An example of an indexing
variable takes the values of (1 2 3 4 5 1 2 3 4 5
), where the total number of observation is n=10 and the number of unique values is m=5.
The following statements create an indexing variable students
in the data set a
:
data a; input students @@; datalines; 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 ;
The following statements run a random-effects model without any response variables. There are only random-effects parameters in the model; the program calculates the logit transformation of each effect, saves the results to the OUTPOST= data set, and produces Figure 73.22:
proc mcmc data=a monitor=(p) diag=none stats=none outpost=a1 plots=none seed=1; array p[10]; random u ~ n(0, sd=1) subject=students; p[students] = logistic(u); model general(0); run; proc print data=a1(obs=3); run;
The ARRAY
statement creates an array p
of size 10, which is the number of unique values in students
. The p
array stores all the transformed values. The RANDOM
statement declares u
to be an effect with the subject variable students
. The P[STUDENTS] assignment statement calculates the logit transformations of u
and saves them in appropriate array elements—this is why the students
variable must be an indexing variable. Because the students
variable used in the p[]
array is also the subject variable in the RANDOM
statement, PROC MCMC can match each random-effects parameter with the corresponding element in array p
. The MONITOR=
option monitors all elements in p
and saves the output to the a1
data set. The a1
data set contains variables p1
–p10
. Figure 73.22 shows the first three observations of the OUTPOST=
data set.
Figure 73.22: Monitor Functions of Random Effect u
Obs | Iteration | p1 | p2 | p3 | p4 | p5 | p6 | p7 | p8 | p9 | p10 | u_1 | u_2 | u_3 | u_4 | u_5 | u_6 | u_7 | u_8 | u_9 | u_10 | LogReff | LogLike | LogPost |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 0.5050 | 0.7334 | 0.5375 | 0.5871 | 0.6862 | 0.4944 | 0.5740 | 0.5991 | 0.6812 | 0.8678 | 0.0198 | 1.0120 | 0.1504 | 0.3521 | 0.7826 | -0.0225 | 0.2983 | 0.4019 | 0.7595 | 1.8819 | -12.2658 | 0 | -12.2658 |
2 | 2 | 0.5563 | 0.4254 | 0.7260 | 0.5280 | 0.4593 | 0.5797 | 0.5813 | 0.7360 | 0.9079 | 0.3351 | 0.2261 | -0.3006 | 0.9742 | 0.1120 | -0.1630 | 0.3213 | 0.3281 | 1.0253 | 2.2887 | -0.6854 | -13.2393 | 0 | -13.2393 |
3 | 3 | 0.6132 | 0.8031 | 0.4735 | 0.3135 | 0.7047 | 0.5273 | 0.6761 | 0.2379 | 0.2938 | 0.3986 | 0.4607 | 1.4058 | -0.1060 | -0.7837 | 0.8698 | 0.1092 | 0.7361 | -1.1641 | -0.8772 | -0.4112 | -12.3984 | 0 | -12.3984 |
The variable p1
is the logit transformation of the variable u_1
, p2
is the logit transformation of the variable u_2
, and so on.
The same idea works for a students
variable that is unsorted. The following statements create an unsorted indexing variable students
with repeated measures in each subject, fit the same model, and produce Figure 73.23:
data a; input students @@; datalines; 1 1 1 3 5 3 4 5 3 1 5 5 4 4 2 2 2 2 4 3 ; proc mcmc data=a monitor=(p) diag=none stats=none outpost=a1 plots=none seed=1; array p[5]; random u ~ n(0, sd=1) subject=students; p[students] = logistic(u); model general(0); run; proc print data=a1(obs=3); run;
Figure 73.23: Monitor Functions of Random Effect u
When the students
Variable is Unsorted
Obs | Iteration | p1 | p2 | p3 | p4 | p5 | u_1 | u_3 | u_5 | u_4 | u_2 | LogReff | LogLike | LogPost |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 0.5050 | 0.6862 | 0.7334 | 0.5871 | 0.5375 | 0.0198 | 1.0120 | 0.1504 | 0.3521 | 0.7826 | -5.4865 | 0 | -5.4865 |
2 | 2 | 0.4944 | 0.8678 | 0.5740 | 0.6812 | 0.5991 | -0.0225 | 0.2983 | 0.4019 | 0.7595 | 1.8819 | -6.7793 | 0 | -6.7793 |
3 | 3 | 0.5563 | 0.4593 | 0.4254 | 0.5280 | 0.7260 | 0.2261 | -0.3006 | 0.9742 | 0.1120 | -0.1630 | -5.1595 | 0 | -5.1595 |
There are five random-effects parameters in this example, and the array p
also has five elements. The values p1
–p5
are the transformations of u_1
–u_5
, respectively. The u
variables are not sorted from u_1
to u_5
because PROC MCMC creates the names according to the order by which the subject variable appears in the input data set. Nevertheless,
because students
is an indexing variable, the first element p[1]
stores the transformation that corresponds to students=1
(which is u_1
), the second element p[2]
stores the transformation that corresponds to students=2
, and so on.
A non-indexing subject variable can take values of character literals (for example, names) or numerals (for example, ZIP code or a person’s weight). This section illustrates how to monitor functions of random-effects parameters in these situations.
Suppose you have unsorted character literals as the subject variable:
data a; input students$ @@; datalines; smith john john mary kay smith lizzy ben ben dylan ben toby abby mary kay kay lizzy ben dylan mary ;
A statement such as following does not work anymore because a character variable cannot be used as an array index:
p[students] = logistic(u);
In this situation, you usually need to do two things: (1) find out the number of unique values in the subject variable, and
(2) create a numeric index variable that replaces the students
array index. You can use the following statements to do the first task:
proc sql noprint; select count(distinct(students)) into :nuniq from a; quit; %put &nuniq;
The PROC SQL call counts the distinct values in the students
variable and saves the count to the macro variable &nuniq
. The macro variable is used later to specify the element size of the p
array. In this example, the a
data set contains 20 observations and 9 unique elements (the value of &nuinq
).
The following statements create an Index
variable in the a
data set that is in the order by which the students
names appear in the data set:
proc freq data=a order=data noprint; tables students / out=_f(keep=students); run; proc print data=_f; run; data a(drop=n); set a; do i = 1 to nobs until(students=n); set _f(keep=students rename=(students=n)) point=i nobs=nobs; Index = i; end; run; proc print data=a; run;
The PROC FREQ call identifies the unique students
names and saves them to the _f
data set, which is displayed in Figure 73.24.
Figure 73.24: Unique Names in the Variable Students
The DATA step steps through the a
data set and creates an Index
variable to match the order in which the students
names appear in the data set. The new a
data set^{[36]} is displayed in Figure 73.25.
Figure 73.25: New Index
Variable in the a
Data Set
Student smith
is the first subject, and his Index
value is one. The same student appears again in the sixth observation, which is given the same Index
value. Now, this Index
variable can be used to index the p
array, in a similar fashion as demonstrated in previous programs:
data _f; set _f; subj = compress('p_'||students); run; proc sql noprint; select subj into :pnames separated by ' ' from _f; quit; %put &pnames; proc mcmc data=a monitor=(p) diag=none stats=none outpost=a1 plots=none seed=1; array p[&nuniq] &pnames; random u ~ n(0, sd=1) subject=students; p[index] = logistic(u); model general(0); run; proc print data=a1(obs=3); run;
The first part of the DATA step and the PROC SQL call create array names for the p
array that match the subject names in the students
variable. It is not necessary to include these steps for the PROC MCMC program to work, but it makes the output more readable.
The first DATA step steps through the _f
data set and creates a subj
variable that concatenates the prefix characters p_
with the names of the students
. The PROC SQL calls put all the subj
values to a macro called &pnames
, which looks like the following:
p_smith p_john p_mary p_kay p_lizzy p_ben p_dylan p_toby p_abby
In the PROC MCMC program, the ARRAY
statement defines an p
array with size &nuniq
(9), and use the macro variable &pnames
to name the array elements. The P[INDEX] assignment statement uses the Index
variable to find the correct array element to store the transformation. Figure 73.26 displays the first few observations of the OUTPOST=a1
data set.
Figure 73.26: First Few Observations of the Outpost Data Set
Obs | Iteration | p_smith | p_john | p_mary | p_kay | p_lizzy | p_ben | p_dylan | p_toby | p_abby | u_smith | u_john | u_mary | u_kay | u_lizzy | u_ben | u_dylan | u_toby | u_abby | LogReff | LogLike | LogPost |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 0.5050 | 0.7334 | 0.5375 | 0.5871 | 0.6862 | 0.4944 | 0.5740 | 0.5991 | 0.6812 | 0.0198 | 1.0120 | 0.1504 | 0.3521 | 0.7826 | -0.0225 | 0.2983 | 0.4019 | 0.7595 | -9.5761 | 0 | -9.5761 |
2 | 2 | 0.8678 | 0.5563 | 0.4254 | 0.7260 | 0.5280 | 0.4593 | 0.5797 | 0.5813 | 0.7360 | 1.8819 | 0.2261 | -0.3006 | 0.9742 | 0.1120 | -0.1630 | 0.3213 | 0.3281 | 1.0253 | -11.2370 | 0 | -11.2370 |
3 | 3 | 0.9079 | 0.3351 | 0.6132 | 0.8031 | 0.4735 | 0.3135 | 0.7047 | 0.5273 | 0.6761 | 2.2887 | -0.6854 | 0.4607 | 1.4058 | -0.1060 | -0.7837 | 0.8698 | 0.1092 | 0.7361 | -13.1867 | 0 | -13.1867 |
There are nine random-effects parameters (u_smith
, u_john
, and so on). There are nine elements of the p
array (p_smith
, p_john
, and so on); each is the logit transformation of corresponding u
elements.
You can use the same statements for subject variables that are numeric non-indexing variables. The following statements create
a students
variable that take large numbers that cannot be used as indices to an array. The rest of the program monitors functions of
the effect u
. The output is not displayed here.
data a; call streaminit(1); do i = 1 to 20; students = rand("poisson", 20); output; end; drop i; run; proc sql noprint; select count(distinct(students)) into :nuniq from a; quit; %put &nuniq; proc freq data=a order=data noprint; tables students / out=_f(keep=students); run;
data a(drop=n); set a; do i = 1 to nobs until(students=n); set _f(keep=students rename=(students=n)) point=i nobs=nobs; Index = i; end; run; data _f; set _f; subj = compress('p_'||students); proc sql noprint; select subj into :pnames separated by ' ' from _f; quit; %put &pnames; proc mcmc data=a monitor=(p) diag=none stats=none outpost=a1 plots=none seed=1; array p[&nuniq] &pnames; random u ~ n(0, sd=1) subject=students; p[index] = logistic(u); model general(0); run; proc print data=a1(obs=3); run;
^{[36] }The programming code that creates and adds the Index
variable to the data set a
keeps all variables from the original data set and does not discard them.