Output Data Sets
To understand how PROC PLAN creates output data sets, you need to look at how the procedure represents a plan. A plan is a list of values for all the factors, the values being chosen according to the factor-selection requests you specify. For example, consider the plan produced by the following statements:
    proc plan seed=12345;
       factors a=3 b=2;
    run;
The plan as displayed by PROC PLAN is shown in Figure 67.6.
Factor | Select | Levels | Order |
---|---|---|---|
a | 3 | 3 | Random |
b | 2 | 2 | Random |
a | b | |
---|---|---|
2 | 2 | 1 |
1 | 1 | 2 |
3 | 2 | 1 |
The first cell of the plan has a=2 and b=2, the second has a=2 and b=1, the third has a=1 and b=1, and so on. If you output the plan to a data set with the OUTPUT statement, by default the output data set contains, for each factor, a numeric variable with that factor’s name; the values of this variable are the numbers of the successive levels selected for the factor in the plan. For example, the following statements produce Figure 67.7.
    proc plan seed=12345;
       factors a=3 b=2;
       output out=out;
    proc print data=out;
    run;
Obs | a | b |
---|---|---|
1 | 2 | 2 |
2 | 2 | 1 |
3 | 1 | 1 |
4 | 1 | 2 |
5 | 3 | 2 |
6 | 3 | 1 |
Alternatively, you can specify the values that are output for a factor with the CVALS= or NVALS= option. Also, you can specify that the internal values be associated with the output values in random order with the RANDOM option. See the section OUTPUT Statement.
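As a rough illustration of the CVALS= and NVALS= options, the following sketch writes character labels for a and user-chosen numbers for b instead of the default level numbers. The data set name labeled and the particular strings and numbers are made up for this example. Without the RANDOM option, the first listed value is assumed to correspond to internal level 1, the second to level 2, and so on.

```sas
/* Sketch only: the data set name and the listed values are hypothetical. */
proc plan seed=12345;
   factors a=3 b=2;
   output out=labeled
          a cvals=('low' 'medium' 'high')   /* character values for levels 1-3 of a */
          b nvals=(10 20);                  /* numeric values for levels 1-2 of b   */
run;

proc print data=labeled;
run;
```

Adding RANDOM after a value list (for example, b nvals=(10 20) random) would ask PROC PLAN to pair the internal levels with the listed values in random order rather than in the order given.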
If you also specify an input data set (DATA=), each factor is associated with a variable in the DATA= data set. This occurs either implicitly by the factor and variable having the same name or explicitly as described in the specifications for the OUTPUT statement. In this case, the values of the variables corresponding to the factors are first read and then interpreted as describing the position of a cell in the plan. Then the respective values taken by the factors at that position are assigned to the variables in the OUT= data set. For example, consider the data set defined by the following statements.
    data in;
       input a b;
       datalines;
    1 1
    2 1
    3 1
    ;
Suppose you specify this data set as an input data set for the OUTPUT statement.
    proc plan seed=12345;
       factors a=3 b=2;
       output out=out data=in;
    proc print data=out;
    run;
PROC PLAN interprets the first observation as referring to the cell in the first row and column of the plan, since a=1 and b=1; likewise, the second observation is interpreted as the cell in the second row and first column, and the third observation as the cell in the third row and first column. In the output data set, a and b have the values they have in the plan at these positions, as shown in Figure 67.8.
Obs | a | b |
---|---|---|
1 | 2 | 2 |
2 | 1 | 1 |
3 | 3 | 2 |
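One way to see the position lookup is to reproduce it by hand in a DATA step. The following sketch hard-codes the plan from Figure 67.6 in two temporary arrays and treats the input values of a and b as a row and a column index into that plan. It only illustrates the interpretation described above and is not how PROC PLAN is implemented; the names lookup, plan_a, plan_b, row, and col are hypothetical.

```sas
/* Sketch only: mimics the DATA= lookup for the plan in Figure 67.6. */
data lookup;
   /* Level of a selected in each row of the plan */
   array plan_a[3]   _temporary_ (2 1 3);
   /* Levels of b in each row, filled row by row: (2 1), (1 2), (2 1) */
   array plan_b[3,2] _temporary_ (2 1  1 2  2 1);

   /* Read the input values as a cell position (row, column) */
   set in(rename=(a=row b=col));

   /* Assign the values the factors take at that position */
   a = plan_a[row];
   b = plan_b[row, col];
   drop row col;
run;

proc print data=lookup;
run;
```

For the three observations in the data set in, this lookup yields (a, b) = (2, 2), (1, 1), and (3, 2), which agrees with Figure 67.8.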
When the factors are random, specifying an input data set in this way randomizes the input data set in the same manner as the plan (see the sections Randomizing Designs and Randomly Assigning Subjects to Treatments).
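A minimal sketch of this use follows. It builds a small unrandomized layout and lets PROC PLAN replace each unit number with the value found at that position in a random plan. The data set names, the variable names Unit and Treatment, and the seed are all chosen for illustration, and the sketch assumes that variables not named as factors (here Treatment) are carried into the OUT= data set unchanged.

```sas
/* Sketch only: names and seed are hypothetical. */
data Unrandomized;
   do Unit = 1 to 6;
      Treatment = ceil(Unit / 3);   /* units 1-3 get treatment 1, units 4-6 get 2 */
      output;
   end;
run;

proc plan seed=27371;
   factors Unit=6;                              /* a random ordering of the 6 units */
   output data=Unrandomized out=Randomized;     /* Unit is matched by name          */
run;

/* List the assignment in unit order */
proc sort data=Randomized;
   by Unit;
run;

proc print data=Randomized;
run;
```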