Role | Description |
---|---|
Stratify by
|
specifies the variables used to partition the input table into mutually
exclusive subsets, known as strata. Each stratum is defined by one combination
of values of the strata variables, and each stratum is sampled separately.
The complete sample is the union of the samples taken from all the strata.
Note: If you do not assign any
variables to this role, then the entire input table is treated as
a single stratum.
You can allocate the total sample size among the strata in proportion to
the size of each stratum.
For example, the variable GENDER has possible values of M and F, and the
variable VOTED has possible values of Y and N. If you assign both GENDER
and VOTED to the Stratify by role, then the input table is partitioned into
four strata: males who voted, males who did not vote, females who voted,
and females who did not vote.
The input table contains
20,000 rows, and the values are distributed as follows:
With 7,000 males who voted among the 20,000 rows, the proportion of males
who voted is 7,000/20,000 = 0.35, or 35%. The proportions in the sample
should reflect the proportions of the strata in the input table. For example,
if your sample table contains 100 observations, then 35 of them (35%) must
be selected from the males-who-voted stratum.
|
Ignore case of character stratification values
|
distinguishes values of a stratification variable that share the same
normalized value when you perform stratified sampling. For example, if a
stratification variable has three distinct values, “A”, “B”, and “b”, and
you want to treat “B” and “b” as different levels, select this option.
Otherwise, “B” and “b” are treated as the same level. The task normalizes
a value as follows:
|
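
The proportional allocation described for the Stratify by role can be sketched in a few lines of Python. This is an illustration only, not the task's actual implementation: the function name `proportional_allocation` is hypothetical, and the only stratum count taken from the text is the 7,000 males who voted; the other three counts are assumed so that the total is 20,000. Remainders are handled with a largest-remainder rule, which is also an assumption, so that the allocations always sum to the requested sample size.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to each stratum's size.

    Uses integer arithmetic plus a largest-remainder rule (an assumed
    rounding policy) so the per-stratum counts sum to sample_size.
    """
    total = sum(stratum_sizes.values())
    alloc, remainders = {}, {}
    for key, n in stratum_sizes.items():
        # Whole observations for this stratum, and the leftover fraction
        # expressed as an integer remainder of the division by total.
        alloc[key], remainders[key] = divmod(sample_size * n, total)
    leftover = sample_size - sum(alloc.values())
    # Hand out any remaining observations to the largest remainders first.
    for key in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[key] += 1
    return alloc


# Strata keyed by (GENDER, VOTED). Only the ("M", "Y") count comes from the
# text; the other three counts are hypothetical.
strata = {
    ("M", "Y"): 7000,
    ("M", "N"): 3000,
    ("F", "Y"): 6000,
    ("F", "N"): 4000,
}

allocation = proportional_allocation(strata, 100)
for key, count in allocation.items():
    print(key, count)
```

Under these assumed counts, the males-who-voted stratum receives 35 of the 100 sample observations, matching the 35% proportion in the input table.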