| Role | Description |
|---|---|
| Stratify by | specifies the variables that partition the input table into mutually exclusive subsets, which are known as strata. Each stratum is defined by one combination of values of the strata variables, and each stratum is sampled separately. The complete sample is the union of the samples that are taken from all the strata. |
Note: If you do not assign any variables to this role, then the entire input table is treated as a single stratum.
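A minimal Python sketch of the stratification logic described above: rows are grouped by the tuple of their strata-variable values, each group is sampled independently, and the per-stratum samples are combined. It assumes the input table is a list of dictionaries, applies one sampling fraction to every stratum, and rounds half up; the function name, its parameters, and the rounding rule are hypothetical and do not describe how SAS implements the task. The seed handling only imitates the behavior described for the Random seed option.

```python
import random
import time
from collections import defaultdict


def stratified_sample(rows, strata_vars, fraction, seed=0):
    """Draw a simple random sample of `fraction` of the rows in each stratum.

    rows        -- list of dicts (hypothetical representation of the input table)
    strata_vars -- column names that define the strata; empty list -> one stratum
    fraction    -- sampling rate applied within every stratum, between 0 and 1
    seed        -- positive value gives a reproducible sample; zero or negative
                   falls back to a clock-based seed (imitates the described option)
    """
    rng = random.Random(seed if seed > 0 else time.time_ns())

    # Partition the table: each distinct tuple of strata values is one stratum.
    strata = defaultdict(list)
    for row in rows:
        key = tuple(row[v] for v in strata_vars)  # () when there are no strata vars
        strata[key].append(row)

    # Sample each stratum separately, then take the union of the samples.
    sample = []
    for members in strata.values():
        k = int(fraction * len(members) + 0.5)  # round half up (assumption)
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical data: 100 rows for each GENDER x VOTED combination.
rows = [{"GENDER": g, "VOTED": v} for g in "MF" for v in "YN" for _ in range(100)]
s = stratified_sample(rows, ["GENDER", "VOTED"], 0.10, seed=42)
print(len(s))  # 40: 10 rows from each of the four strata
```

With no strata variables, every row maps to the empty tuple, so the whole table is sampled as a single stratum, matching the note above.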
You can allocate the total sample size among the strata in proportion to the size of each stratum. For example, suppose that the variable GENDER has the possible values M and F, and the variable VOTED has the possible values Y and N. If you assign both GENDER and VOTED to the Stratify by role, then the input table is partitioned into four strata: males who voted, males who did not vote, females who voted, and females who did not vote.
The input table contains 20,000 rows, and the values are distributed as follows:
Therefore, the proportion of males who voted is 7,000/20,000 = 0.35, or 35%. The proportions in the sample should reflect the proportions of the strata in the input table. For example, if your sample table contains 100 rows, then 35 of them (35%) must be selected from the stratum of males who voted.
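The proportional-allocation arithmetic above can be checked with a short Python sketch that computes the stratum share N_h/N and the allocated rows n_h = n · N_h / N using exact fractions. Only the 7,000-of-20,000 figure comes from the example; the function names are hypothetical. The second helper shows that applying one percentage to every stratum (as the Sample by option does when strata are assigned) gives the same proportional result, and it also reproduces the 3%-of-400 calculation given for that option.

```python
from fractions import Fraction


def proportional_allocation(stratum_rows, table_rows, sample_size):
    """Return (stratum share N_h/N, rows to draw n_h = n * N_h / N).

    Exact fractions avoid floating-point noise. If n_h is not a whole
    number, some rounding rule is still needed (not specified here).
    """
    share = Fraction(stratum_rows, table_rows)
    return share, share * sample_size


def rows_for_percentage(percent, rows):
    """Sample size for `percent` (an int or a Fraction) of `rows`."""
    return Fraction(percent, 100) * rows


share, n_h = proportional_allocation(7_000, 20_000, 100)
print(float(share), n_h)                           # 0.35 35
print(rows_for_percentage(Fraction(1, 2), 7_000))  # 35  (0.5% of one stratum)
print(rows_for_percentage(3, 400))                 # 12  (Sample by example)
```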
| Option Name | Description |
|---|---|
| Methods | |
| Sample by | specifies the sample size as either a number of rows or a percentage of the input rows. For example, if you specify 3% of rows and there are 400 input rows, then the resulting sample has 12 rows. Note: If you assign variables to the Stratify by role, then the sample size that you specify here applies to each stratum rather than to the entire input table. |
| Random seed | specifies the initial seed for random number generation. If you set this value to zero or a negative number, then a seed that is based on the system clock is used to produce the sample. |
| Ignore case of character stratification values | distinguishes values of the stratification variables that share the same normalized value when you perform stratified sampling. For example, if a stratification variable has three distinct values, “A”, “B”, and “b”, and you want to treat “B” and “b” as different levels, then you need to select this option. Otherwise, “B” and “b” are treated as the same level. The task normalizes a value as follows: |