
Searching for Resolvable Block Designs with PROC OPTEX

Introduction

A block design is resolvable when its blocks can be partitioned into groups in such a way that each treatment occurs equally often in each group. If each treatment occurs $\alpha $ times in each such group of blocks, then each group is said to contain an $\alpha $-replicate of the treatments, or simply a replicate if $\alpha =1$. Resolvability is a desirable feature of a design because it ensures orthogonality between treatments and nuisance factors that might be of concern. For example, in sequential experimentation, where replicates correspond to time periods, resolvability is used to mitigate time effects. Resolvability can likewise be useful in multisite experiments and in experiments where multiple individuals handle experimental runs (Morgan and Reck, 2007). Resolvability is often dictated by how an experiment must be performed, perhaps because treatments are available in equal-sized replicate batches that must be used up before another batch can be supplied.
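The definition can be checked mechanically. The following language-neutral Python sketch is only an illustration (replicate_multiplicity is a hypothetical helper, not part of SAS or of the %RBDEval macro). It takes a design as a list of replicates, each a list of blocks, and returns $\alpha$ if every treatment occurs exactly $\alpha$ times in every replicate, or None otherwise. The example data is the four-treatment design shown in Table 1.

```python
from collections import Counter


def replicate_multiplicity(replicates):
    """Return alpha if every treatment occurs exactly alpha (> 0) times in
    every replicate, otherwise None.

    `replicates` is a list of replicates; each replicate is a list of
    blocks, and each block is a list of treatment labels.
    """
    treatments = {t for rep in replicates for block in rep for t in block}
    alpha = None
    for rep in replicates:
        counts = Counter(t for block in rep for t in block)
        for t in treatments:
            if alpha is None:
                alpha = counts[t]
            if counts[t] == 0 or counts[t] != alpha:
                return None
    return alpha


# Table 1: three replicates, each made of two blocks of size 2.
table1 = [
    [[1, 2], [3, 4]],
    [[1, 3], [2, 4]],
    [[1, 4], [2, 3]],
]
print(replicate_multiplicity(table1))  # 1: each group is a single replicate
```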

All You Really Need to Know

Suppose you are a food researcher who is studying wine preferences. You want your panel of 20 wine experts to conduct taste tests of 50 varieties of wine. Obviously, they cannot taste all 50 wines in the same session, at least not if you want to trust the ratings of the last dozen or so. So you propose to present the wines to them in three sessions of 17, 17, and 16 wines apiece. You have two goals for the design:

  1. Each expert should judge each wine once.

  2. Each pair of wines should occur in the same session in as balanced a way as possible.

The following SAS statements invoke the OPTEX procedure in SAS/QC software to construct a good experimental design for this study. You create two data sets: Wines, which lists the wine varieties that you are studying, and Setup, which describes the expert/session layout that you have decided on. The PROC OPTEX syntax assigns these data sets to the "treatment" model and the "block" model, respectively, and the PRIOR= option in the second MODEL statement is the essential feature that enables the resulting arrangement to satisfy both goals. In this design the subjects (experts) play the role of replicates, and the sessions play the role of blocks within replicates. The %RBDEval macro, which is documented in Appendix: The %RBDEval Macro Syntax, evaluates the resulting design for efficiency with respect to replicates and blocks within replicates. A D-efficiency of 100% with respect to replicates indicates that the design is resolvable; the D-efficiency with respect to blocks within replicates indicates how well balanced the design is (see the section "Optimality Criteria" in the chapter "The OPTEX Procedure" of the SAS/QC User's Guide for a discussion of D-optimality and related criteria).

data Wines;
   do Wine = 1 to 50;
      output;
   end;
run;

data Setup;
   do Subject = 1 to 20;
      do Session = 1 to  3;
         if (Session < 3) then nPlot = 17;
         else                  nPlot = 16;
         do Plot = 1 to nPlot;
            output;
         end;
      end;
   end;
   drop nPlot;
run;
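
proc optex data=Wines coding=orthcan seed=16899;
   class Wine;
   model Wine;                 /* treatment model: one effect per wine */
   blocks design=Setup;        /* block structure from the Setup data set */
   class Subject Session;
   /* Subjects act as replicates (prior precision 0); sessions within
      subjects receive a prior precision of 10 */
   model Subject, Session(Subject) / prior=0,10;
   output out=Design;
run;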
%RBDEval(Design,Wine,Subject,Session,nCheck=10);

Figure 1 shows that the best design found has a treatment D-efficiency of 96.58.

Figure 1: Design Criteria for Replicates and Blocks within Replicates

The OPTEX Procedure

Design Number  Treatment D-Efficiency  Treatment A-Efficiency
1 96.5760 96.5530
2 96.5759 96.5529
3 96.5756 96.5523
4 96.5756 96.5521
5 96.5751 96.5513
6 96.5750 96.5510
7 96.5749 96.5508
8 96.5748 96.5506
9 96.5745 96.5501
10 96.5744 96.5498


Figure 2 shows that the resulting design has a D-efficiency of 100% with respect to subjects (the replicates), so the design is resolvable and satisfies the first goal exactly. It also shows that, with respect to sessions within subjects (the blocks within replicates), the design has a raw D-efficiency of 95.88 and a relative D-efficiency of 99.98. The relative efficiency measures compare the raw efficiency measures to the efficiency measures of the best design that PROC OPTEX can find by using the number of search iterations specified in the nCheck= argument. The first row compares the raw measures to those of the best design found that contains replicates only; the second row compares them to those of the best design found that contains blocks only. By construction, the design in this example cannot be completely balanced: each expert contributes 136 + 136 + 120 = 392 within-session pairs of wines, and the resulting 20 × 392 = 7,840 pairs cannot be spread evenly over the 1,225 possible pairs of wines, because 7,840/1,225 = 6.4 is not an integer. Therefore, the second goal can be satisfied only approximately. However, PROC OPTEX finds a design that is very close to balanced, as the relative efficiency of 99.98 indicates.

Figure 2: Design Efficiency Evaluation

Wine Efficiency for Subject and Session-within-Subject
Evaluation  Raw D  Raw A  Relative D  Relative A
Subject 100.0000 100.0000 100.0000 100.0000
Session(Subject) 95.8849 95.8516 99.9773 99.9548
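The counting argument behind the claim that the wine design cannot be completely balanced is easy to check. The following Python sketch is an illustration only (mean_concurrence is a hypothetical helper, not part of SAS). It assumes, as in the Setup data set, that every expert tastes every wine exactly once, split over sessions of the given sizes.

```python
from fractions import Fraction
from math import comb


def mean_concurrence(n_treatments, session_sizes, n_subjects):
    """Average number of sessions in which a given pair of treatments
    appears together, assuming every subject sees every treatment once."""
    if sum(session_sizes) != n_treatments:
        raise ValueError("each subject must see every treatment exactly once")
    # A session of size k contributes k*(k-1)/2 within-session pairs.
    pairs_per_subject = sum(comb(k, 2) for k in session_sizes)
    total_pairs = n_subjects * pairs_per_subject
    return Fraction(total_pairs, comb(n_treatments, 2))


lam = mean_concurrence(50, [17, 17, 16], 20)
print(lam)  # 32/5, that is, 6.4 sessions per pair on average
```

Complete balance requires every pair to share a session the same whole number of times, so a non-integer average rules it out. For comparison, mean_concurrence(4, [2, 2], 3) returns 1, consistent with every pair of treatments in Table 1 sharing exactly one block.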


The remainder of this paper reviews the OPTEX procedure and explains why this method works.

Resolvable Designs

Table 1 shows a resolvable block design for four treatments in six blocks of size 2; the pairs of blocks in each row of the table make up a single replicate of the treatments. Because the blocks of this design consist of all pairs of treatments, it is also balanced, and therefore it is a resolvable balanced incomplete block design.

Table 1: A Resolvable Balanced Incomplete Block Design

Replicate   Block 1   Block 2
1           [1 2]     [3 4]
2           [1 3]     [2 4]
3           [1 4]     [2 3]
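Balance can be checked by counting how many blocks contain each pair of treatments. The following Python sketch is illustrative (concurrence is a hypothetical helper, and it assumes a binary design in which no treatment appears twice in the same block). It confirms that every pair of treatments in Table 1 shares exactly one block.

```python
from collections import Counter
from itertools import combinations


def concurrence(blocks):
    """Count the blocks that contain each unordered pair of treatments.
    Assumes no treatment is repeated within a block."""
    counts = Counter()
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts


# The six blocks of Table 1, listed replicate by replicate.
table1_blocks = [[1, 2], [3, 4], [1, 3], [2, 4], [1, 4], [2, 3]]
lam = concurrence(table1_blocks)
all_pairs = list(combinations(range(1, 5), 2))
print([lam[p] for p in all_pairs])  # [1, 1, 1, 1, 1, 1]
```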


Resolvable balanced incomplete block designs are D-optimal with respect to two different kinds of blocks: the primary blocks and the replicates, considered as blocks. To confirm this, the following statements use the %RBDEval macro to evaluate the arrangement of treatments that is shown in Table 1, both as a design for six blocks of size 2 and as a design for three blocks of size 4.

The following DATA steps create two data sets: Candidates and Design. Candidates contains the candidate points for the design, and Design contains a set of points for the design that is to be evaluated for D-efficiency.

data Candidates;
   do Treatment = 1 to 4;
      output;
   end;
run;
data Design;
   do Replicate = 1 to 3;
      do Block = 1 to 2;
         do Plot = 1 to 2;
            input Treatment @@;
            output;
         end;
      end;
   end;
datalines;
1 2   3 4
1 3   2 4
1 4   2 3
;
%RBDEval(Design,Treatment,Replicate,Block,nCheck=10);

Figure 3 displays the results.

Figure 3: Design Efficiency Evaluation

Treatment Efficiency for Replicate and Block-within-Replicate
Evaluation  Raw D  Raw A  Raw BD  Relative D  Relative A  Relative BD
Replicate 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000
Block(Replicate) 66.6667 66.6667 100.0000 100.0000 100.0000 100.0000


As claimed, the design has 100% block design (BD) D-efficiency with respect to both block structures. The design also has 100% treatment D-efficiency with respect to replicates because each replicate contains each treatment once.
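The raw value of 66.6667 for Block(Replicate) is not a defect of the design. For an equireplicate design in blocks of size k, a standard result is that treatment efficiency relative to an unblocked design cannot exceed the efficiency factor $E = v(k-1)/(k(v-1))$, and a balanced incomplete block design attains it. The following Python sketch (bibd_efficiency_factor is a hypothetical helper) evaluates $E$ exactly for this example and for the 15-treatment example later in the paper.

```python
from fractions import Fraction


def bibd_efficiency_factor(v, k):
    """Efficiency factor E = v(k-1) / (k(v-1)) of a balanced incomplete
    block design with v treatments in blocks of size k."""
    return Fraction(v * (k - 1), k * (v - 1))


for v, k in [(4, 2), (15, 3)]:
    e = bibd_efficiency_factor(v, k)
    print(v, k, e, round(100 * float(e), 4))
```

The two values, 2/3 (66.6667) and 5/7 (71.4286), agree with the raw Block(Replicate) D-efficiencies reported for these designs, which is why their BD efficiencies are 100%.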

Searching for a Resolvable Design

You can use PROC OPTEX to search for good resolvable block designs, but doing so requires a little trick. The DESIGN= option in the BLOCKS statement enables you to specify a block structure that involves both replicates and blocks within replicates, but this alone is not enough. To see why, consider the following example. It sets up a candidate set of 15 treatments and a block structure data set that consists of seven replicates of five blocks of size 3, and then uses PROC OPTEX to find the best way to assign the 15 treatments to that block structure, with effects for both replicates and blocks within replicates.

The following SAS statements create two data sets, Candidates and BlockStructure. Candidates contains the candidate points for the design, and BlockStructure contains a block structure that involves both replicates and blocks within replicates.


data Candidates;
   do Treatment = 1 to 15;
      output;
   end;
run;
data BlockStructure;
   do Replicate = 1 to 7;
      do Block = 1 to 5;
         do Plot = 1 to 3;
            output;
         end;
      end;
   end;
run;

The following statements use PROC OPTEX to search for a good design for the candidate points that are stored in the data set Candidates by using the blocking structure that is contained in the data set BlockStructure:

proc optex data=Candidates coding=orthcan seed=492069001;
   class Treatment;
   model Treatment;
   blocks design=BlockStructure;
   class Replicate Block;
   model Replicate Block(Replicate);
   output out=Design;
   ods select BlockDesignEfficiencies;
run;

The DATA= option in the PROC OPTEX statement specifies that the data set Candidates contains the candidate points for the design. The SEED= option specifies the integer that initializes the pseudorandom number generator, which makes the search reproducible. The first CLASS statement refers to the data set Candidates and specifies Treatment as a classification variable. The first MODEL statement is the treatment model; it specifies that the model consists of the main effect of Treatment. The DESIGN= option in the BLOCKS statement specifies that BlockStructure contains the fixed covariates for the model, which in this case are the replicates and the blocks. The CLASS statement that follows the BLOCKS statement refers to BlockStructure and specifies Replicate and Block as classification variables. Similarly, the MODEL statement that follows the BLOCKS statement defines the model for the fixed covariates; it specifies fixed effects for Replicate and for Block nested within Replicate. The OUTPUT statement saves the best design in the data set named in the OUT= option, Design. The ODS SELECT statement restricts the displayed output to the table of block design efficiencies.

Figure 4 shows that the best design found has a treatment D-efficiency of 71.43.

Figure 4: Design Criteria for Replicates and Blocks within Replicates

The OPTEX Procedure

Design Number  Treatment D-Efficiency  Treatment A-Efficiency
1 71.4286 71.4286
2 71.4286 71.4286
3 71.4286 71.4286
4 71.3371 71.2444
5 71.3371 71.2444
6 71.3371 71.2444
7 71.3371 71.2444
8 71.3371 71.2444
9 71.3371 71.2444
10 71.2004 70.9710


The following statement calls the %RBDEval macro, which evaluates the design:


%RBDEval(Design,Treatment,Replicate,Block,nCheck=10);

Figure 5 shows the resulting efficiency measures for the two blocking structures. The design has a block design D-efficiency of 97.33 with respect to replicates; because this is less than 100%, the design is not resolvable. Its relative block design D-efficiency with respect to blocks within replicates is 100%, indicating that the design achieves balance with respect to blocks within replicates.

Figure 5: Design Criteria for Replicates and Blocks within Replicates

Treatment Efficiency for Replicate and Block-within-Replicate
Evaluation  Raw D  Raw A  Raw BD  Relative D  Relative A  Relative BD
Replicate 97.3299 97.2418 97.3299 97.3299 97.2418 97.3299
Block(Replicate) 71.4286 71.4286 100.0000 100.0000 100.0000 100.0000


The following PROC FREQ step confirms that some treatments occur more than once in some replicates (treatment 5, for example, occurs three times in replicate 7), while other treatments are missing from some replicates entirely:

proc freq data=Design;
   table Treatment*Replicate / norow nocol nopct nocum;
run;

The resulting coincidence counts are shown in Figure 6.

Figure 6: Treatment-by-Replicate Coincidence Counts for Optimal Design

The FREQ Procedure

Table of Treatment by Replicate (cell values are frequencies)

                 Replicate
Treatment    1   2   3   4   5   6   7   Total
1            1   0   0   2   2   1   1       7
2            2   1   1   0   1   1   1       7
3            1   1   0   2   1   1   1       7
4            0   2   1   1   1   1   1       7
5            1   0   2   1   0   0   3       7
6            2   1   2   0   1   1   0       7
7            1   1   1   0   2   1   1       7
8            0   2   1   1   1   1   1       7
9            1   1   1   1   1   1   1       7
10           2   0   1   1   1   1   1       7
11           0   2   1   1   1   1   1       7
12           2   1   1   1   1   1   0       7
13           0   1   2   1   1   1   1       7
14           1   1   1   1   0   2   1       7
15           1   1   0   2   1   1   1       7
Total       15  15  15  15  15  15  15     105
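The counts in Figure 6 can also be checked directly. The following Python sketch transcribes the table and lists the treatment/replicate cells that differ from 1; in a resolvable design with one replicate per column, every cell would equal 1.

```python
# Treatment-by-replicate counts transcribed from Figure 6:
# rows are treatments 1-15, columns are replicates 1-7.
counts = [
    [1, 0, 0, 2, 2, 1, 1],
    [2, 1, 1, 0, 1, 1, 1],
    [1, 1, 0, 2, 1, 1, 1],
    [0, 2, 1, 1, 1, 1, 1],
    [1, 0, 2, 1, 0, 0, 3],
    [2, 1, 2, 0, 1, 1, 0],
    [1, 1, 1, 0, 2, 1, 1],
    [0, 2, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [2, 0, 1, 1, 1, 1, 1],
    [0, 2, 1, 1, 1, 1, 1],
    [2, 1, 1, 1, 1, 1, 0],
    [0, 1, 2, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 2, 1],
    [1, 1, 0, 2, 1, 1, 1],
]

# (treatment, replicate) cells that break resolvability.
off_cells = [(t + 1, r + 1)
             for t, row in enumerate(counts)
             for r, c in enumerate(row) if c != 1]
# Treatments that do appear exactly once in every replicate.
once_everywhere = [t + 1 for t, row in enumerate(counts)
                   if all(c == 1 for c in row)]
print(len(off_cells), once_everywhere)  # 35 [9]
```

Only treatment 9 is spread evenly over the replicates; the margins still total 7 per treatment and 15 per replicate, so the block structure itself is respected.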


This optimal blocking approach fails to find a resolvable design because PROC OPTEX optimizes only one definition of block D-efficiency, $|X'AX|$, at a time, and the blocking model used here defines the matrix $A$ based only on the effect of blocks within replicates. (Because blocks are nested within replicates, a model with effects for Replicate and Block(Replicate) spans the same space as a model with effects for blocks alone.) Call this matrix $A_B$, and call the corresponding matrix for replicates considered as blocks $A_R$. Technically, $A_R$ and $A_B$ are the projectors onto the residual spaces of the models with effects only for replicates and only for blocks within replicates, respectively. To find a design that is both resolvable and balanced, you need to maximize the determinant of an information matrix that combines these two separate information matrices:

\[  D^\alpha = \left| \alpha X'A_R X + (1-\alpha ) X'A_B X \right|, \qquad 0 < \alpha < 1  \]

It can be shown that if a resolvable design exists, it maximizes such a criterion.
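The criterion can be evaluated numerically for the design in Table 1. The following Python sketch is an illustration, not PROC OPTEX's computation: it uses the textbook treatment information matrix $C = R - N K^{-1} N^T$ (replications, treatment-by-block incidence, block sizes) in place of $X'AX$. With an orthonormal contrast coding of $X$, the two give the same determinant up to a coding-dependent constant factor; that scaling is an assumption here. The helper names are hypothetical.

```python
from fractions import Fraction


def info_matrix(blocks, v):
    """Treatment information matrix C = R - N K^{-1} N^T for treatments
    1..v arranged in `blocks` (each block is a list of treatment labels)."""
    C = [[Fraction(0)] * v for _ in range(v)]
    for block in blocks:
        k = len(block)
        for t in block:
            C[t - 1][t - 1] += 1                   # replication term R
            for u in block:
                C[t - 1][u - 1] -= Fraction(1, k)  # -N K^{-1} N^T term
    return C


def det(M):
    """Determinant by exact Gaussian elimination with row swaps."""
    M = [row[:] for row in M]
    n, d = len(M), Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if M[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            M[i], M[pivot] = M[pivot], M[i]
            d = -d
        d *= M[i][i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= f * M[i][c]
    return d


def d_alpha(replicates, blocks, v, alpha):
    """|alpha*C_R + (1 - alpha)*C_B| on treatment contrasts. Both matrices
    send the all-ones vector to 0, so adding J/v (eigenvalue 1 in that
    direction) leaves the product of the contrast eigenvalues unchanged."""
    CR, CB = info_matrix(replicates, v), info_matrix(blocks, v)
    M = [[alpha * CR[i][j] + (1 - alpha) * CB[i][j] + Fraction(1, v)
          for j in range(v)] for i in range(v)]
    return det(M)


blocks = [[1, 2], [3, 4], [1, 3], [2, 4], [1, 4], [2, 3]]  # Table 1
resolvable = [[1, 2, 3, 4]] * 3                            # its replicates
regrouped = [[1, 2, 1, 3], [3, 4, 2, 4], [1, 4, 2, 3]]     # same blocks, not resolvable
half = Fraction(1, 2)
print(d_alpha(resolvable, blocks, 4, half))  # 125/8
print(d_alpha(regrouped, blocks, 4, half))   # strictly smaller
```

For Table 1 the criterion equals $(2+\alpha)^3$: 8 at $\alpha = 0$ and 27 at $\alpha = 1$, and $(8/27)^{1/3} = 2/3$ matches the raw Block(Replicate) D-efficiency in Figure 3. Regrouping the same blocks into non-resolvable "replicates" leaves $C_B$ unchanged but lowers $C_R$, so $D^\alpha$ drops.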

This is where the trick comes in. The PRIOR= option in the MODEL statement can be used with the BLOCKS model as well as with the treatment model. For a block design in which all blocks have the same size and all replicates have the same size, a block model of the form

   class Replicate Block;
   model Replicate, Block(Replicate) / prior=0,π;

where $\pi > 0$ is a numeric prior precision, defines a block D-efficiency of the form $D^\alpha $ shown earlier, with

\[  \alpha = \frac{\pi }{N + \pi }  \]

where $N$ is the size of the design. That is, a resolvable block design, if one exists, is Bayes optimal in the sense of DuMouchel and Jones (1994). Intuitively, by claiming a certain amount of prior information about blocks within replicates, you free PROC OPTEX to search for a design that also contains information about replicates.
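The mapping from the prior precision $\pi$ to the weight $\alpha$ is simple arithmetic. The following Python sketch (prior_to_alpha is a hypothetical helper) evaluates it for the 15-treatment example below, where $N = 7 \times 5 \times 3 = 105$, and shows how small and large priors push $\alpha$ toward 0 or 1. It assumes, as the formula does, equal-sized blocks and equal-sized replicates.

```python
from fractions import Fraction


def prior_to_alpha(prior, n_runs):
    """alpha = pi / (N + pi): the weight on the replicate information
    matrix implied by prior precision pi on Block(Replicate)."""
    prior = Fraction(prior)
    return prior / (n_runs + prior)


n_runs = 7 * 5 * 3                   # seven replicates of five blocks of size 3
alpha = prior_to_alpha(100, n_runs)  # PRIOR=0,100
print(alpha)                         # 20/41, roughly 0.49

# Small priors let blocks within replicates dominate (alpha near 0);
# large priors let replicates dominate (alpha near 1).
for prior in (1, 10, 100, 1000, 10000):
    print(prior, round(float(prior_to_alpha(prior, n_runs)), 3))
```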

The following statements exploit this method to find a resolvable balanced incomplete block design for 15 treatments in seven replicates of five blocks of size 3, by claiming about 100 observations’ worth of prior information about the blocks-within-replicates effect:

proc optex data=Candidates coding=orthcan seed=607104001;
   class Treatment;
   model Treatment;
   blocks design=BlockStructure niter=10000 keep=10;
   class Replicate Block;
   model Replicate, Block(Replicate) / prior=0,100;
   output out=Design;
   ods select BlockDesignEfficiencies;
run;
%RBDEval(Design,Treatment,Replicate,Block,nCheck=10);

Notice the large value of the NITER= option in the BLOCKS statement. Combinatorial designs of any sort can be difficult for PROC OPTEX to find, so the more iterations you allow, the better your odds of finding the theoretical optimum. On the other hand, PROC OPTEX generally finds a good design, even if not the best possible one. When you use the PRIOR= option, you use commas in the MODEL statement to separate the effects into groups; all effects in a group share the same prior precision, and the values in the PRIOR= option apply to the groups in order. This example has two groups, Replicate and Block(Replicate): the PRIOR= option specifies a prior precision of 0 for Replicate and a prior precision of 100 for Block(Replicate).

Figure 7 shows that the best design found by using the prior information has a treatment D-efficiency rating of 96.27.

Figure 7: Design Criteria for Bayesian Design

The OPTEX Procedure

Design Number  Treatment D-Efficiency  Treatment A-Efficiency
1 96.2733 96.2733
2 96.2733 96.2733
3 96.2716 96.2699
4 96.2710 96.2687
5 96.2710 96.2687
6 96.2710 96.2687
7 96.2710 96.2687
8 96.2710 96.2687
9 96.2710 96.2687
10 96.2710 96.2687


Figure 8 shows that the design achieves a 100% block design D-efficiency with respect to replicates, indicating that the design is resolvable. The design also achieves a 100% block design D-efficiency with respect to blocks within replicates, indicating that the design is balanced.

Figure 8: Optimal Design Evaluated for Replicates

Treatment Efficiency for Replicate and Block-within-Replicate
Evaluation  Raw D  Raw A  Raw BD  Relative D  Relative A  Relative BD
Replicate 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000
Block(Replicate) 71.4286 71.4286 100.0000 100.0000 100.0000 100.0000


A question that naturally arises when you use this approach is: what value of $\pi $ should you use for the nonzero prior on blocks within replicates? Mathematically, a resolvable design is optimal for any positive value of $\pi $, but if you set $\pi $ too small or too large, one or the other of the component information matrices dominates the efficiency criterion. There does not seem to be any definitive advice on this point other than to try several values. You want a prior high enough to ensure resolvability, but not so high that PROC OPTEX fails to find the most efficient resolvable design for the situation. You might, for example, start with $\pi = N/10$, one-tenth the size of the design. If the resulting design is not resolvable, increase $\pi $ to, for example, $C\cdot N/10$, where $2 \leq C \leq 10$. When PROC OPTEX finds a resolvable design, it is usually highly efficient compared to the most efficient possible resolvable design for the specific situation. Nevertheless, you can try halving $\pi $ to see whether PROC OPTEX can find a resolvable design that has higher efficiency.
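Substituting $\pi = C \cdot N/10$ into $\alpha = \pi/(N+\pi)$ gives $\alpha = C/(10 + C)$, independent of $N$. So the suggested starting point corresponds to $\alpha = 1/11$, and the range $2 \leq C \leq 10$ corresponds to $\alpha$ between 1/6 and 1/2. The following Python sketch tabulates a few candidates. The multipliers 2, 5, and 10 are illustrative choices from that range, prior_schedule is a hypothetical helper, and the formula assumes equal-sized blocks and replicates.

```python
from fractions import Fraction


def prior_schedule(n_runs, multipliers=(1, 2, 5, 10)):
    """Candidate prior precisions pi = C * N / 10 and the weight
    alpha = pi / (N + pi) that each implies. C = 1 is the suggested
    starting point; the other multipliers are illustrative."""
    schedule = []
    for c in multipliers:
        prior = Fraction(c * n_runs, 10)
        alpha = prior / (n_runs + prior)
        schedule.append((c, prior, alpha))
    return schedule


for c, prior, alpha in prior_schedule(105):
    print(f"C={c:2d}  pi={float(prior):7.1f}  alpha={alpha}")
```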

Contributors

Cliff Pereira and Luna Sun of Oregon State University provided the applications that motivated this work and a practical review of the methodology. Randy Tobias, the developer of PROC OPTEX and the %RBDEval macro, provided the optimal design expertise.

Appendix: The %RBDEval Macro Syntax

The %RBDEval macro uses PROC OPTEX to evaluate an experimental design for efficiency with respect to replicates and blocks within replicates. When you call %RBDEval, you must specify a SAS data set that contains the design to be evaluated, and three variables that identify the treatments, replicates, and blocks, respectively.

%RBDEval Macro Syntax

%RBDEval(Design, vName, rName, bName, nCheck=)

Required Arguments

Design SAS-data-set

specifies a SAS data set that contains the design.

vName variable

specifies the variable that indexes the treatments.

rName variable

specifies the variable that indexes the replicates.

bName variable <number>

specifies the variable that indexes the blocks.

Optional Argument

nCheck=number

requests searches for an optimal replicates-only design and an optimal blocks-only design, which are used to compute the relative efficiencies of the design, and specifies the number of times to repeat the search from different initial designs. The default value is 0, which indicates that no search is performed and that the relative efficiencies are not computed.

The following example uses the design from Table 1 and demonstrates how to use the %RBDEval macro, first without the nCheck= argument and then with it:

data Design;
   do Replicate = 1 to 3;
      do Block = 1 to 2;
         do Plot = 1 to 2;
            input Treatment @@;
            output;
         end;
      end;
   end;
datalines;
1 2   3 4
1 3   2 4
1 4   2 3
;
%RBDEval(Design,Treatment,Replicate,Block);

Figure 9 shows that when you do not specify the nCheck= argument, %RBDEval displays only the raw efficiency measures and, if appropriate, the block design (BD) efficiency measures. The block design measures are displayed if, as in this case, the replicates and blocks are balanced, meaning that every replicate contains the same number of blocks and every block contains the same number of plots. The block design efficiency measures compare the raw D-efficiency measures to those that optimal balanced designs would theoretically have, if such designs exist.

Figure 9: Design Efficiency Evaluation

Treatment Efficiency for Replicate and Block-within-Replicate
Evaluation D A BD
Replicate 100.0000 100.0000 100.0000
Block(Replicate) 66.6667 66.6667 100.0000
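The equal-size condition that governs whether BD measures appear is easy to test. The following Python sketch (has_equal_structure is a hypothetical helper, not the macro's actual logic) takes one (replicate, block) label pair per plot. It reports True for the Table 1 layout and False for the wine study, whose sessions hold 17, 17, and 16 wines; that result is consistent with Figure 2, which shows no BD columns.

```python
from collections import Counter


def has_equal_structure(plots):
    """True if every replicate has the same number of blocks and every block
    has the same number of plots. `plots` holds one (replicate, block) label
    pair per plot; block labels may be reused across replicates."""
    plots_per_block = Counter(plots)
    blocks_per_rep = Counter(rep for rep, _ in plots_per_block)
    return (len(set(plots_per_block.values())) == 1
            and len(set(blocks_per_rep.values())) == 1)


# Table 1: 3 replicates x 2 blocks x 2 plots.
table1_plots = [(rep, blk) for rep in range(1, 4)
                for blk in range(1, 3) for _ in range(2)]
# Wine study: 20 subjects x sessions of 17, 17, and 16 wines.
wine_plots = [(subj, sess) for subj in range(1, 21)
              for sess, size in zip((1, 2, 3), (17, 17, 16))
              for _ in range(size)]
print(has_equal_structure(table1_plots), has_equal_structure(wine_plots))  # True False
```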


The following statement calls the %RBDEval macro and specifies the nCheck= argument:

%RBDEval(Design,Treatment,Replicate,Block,nCheck=10);

Figure 10 shows that when you do specify the nCheck= argument (with a nonzero value for the argument), the %RBDEval macro displays both the raw and relative efficiency measures. The relative efficiency measures compare the raw efficiency measures to the efficiency measures from the best designs that PROC OPTEX can find by using the number of search iterations that you specify in the nCheck= argument. The first row compares the raw measures to those for the best design found that contains replicates only. The second row compares the raw measures to the best design found that contains blocks only.

Figure 10: Design Efficiency Evaluation

Treatment Efficiency for Replicate and Block-within-Replicate
Evaluation  Raw D  Raw A  Raw BD  Relative D  Relative A  Relative BD
Replicate 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000
Block(Replicate) 66.6667 66.6667 100.0000 100.0000 100.0000 100.0000


References

  • DuMouchel, W. and Jones, B. (1994), “A Simple Bayesian Modification of D-Optimal Designs to Reduce Dependence on an Assumed Model,” Technometrics, 36, 37–47.

  • Morgan, J. P. and Reck, B. H. (2007), “Resolvable Designs with Large Blocks,” Annals of Statistics, 35, 747–771.