Role
|
Description
|
---|---|
Stratify by
|
specifies the variables
to use to partition the input table into mutually exclusive, nonoverlapping
subsets that are known as strata. Each stratum is defined by a set
of values of the strata variables, and each stratum is sampled separately.
The complete sample is the union of the samples that are taken from
all the strata.
Note: If you do not assign any
variables to this role, then the entire input table is treated as
a single stratum.
You can allocate the
total sample size among the strata in proportion to the size of each
stratum. For example, the variable GENDER has possible values of M
and F, and the variable VOTED has possible values of Y and N. If you
assign both GENDER and VOTED to the Stratify by role,
then the input table is partitioned into four strata: males who voted,
males who did not vote, females who voted, and females who did not
vote.
The input table contains
20,000 rows, and the values are distributed as follows:
Therefore, the proportion
of males who voted is 7,000/20,000 = 0.35, or 35%. The proportions in
the sample should reflect the proportions of the strata in the input
table. For example, if your sample table contains 100 observations,
then 35 of those observations (35%) must be selected from the stratum
of males who voted to reflect the proportions in the input table.
|
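The proportional allocation described above can be sketched in a few lines of Python. This is only an illustration, not the task's actual implementation: the function name `proportional_allocation` is hypothetical, the largest-remainder rounding is an assumption, and only the count of 7,000 males who voted comes from the text. The other three stratum counts are made-up placeholders that sum to 20,000.

```python
def proportional_allocation(stratum_sizes, total_sample_size):
    """Split total_sample_size across strata in proportion to stratum size."""
    population = sum(stratum_sizes.values())
    # Exact (possibly fractional) share of the sample for each stratum.
    exact = {key: total_sample_size * size / population
             for key, size in stratum_sizes.items()}
    # Assumption: round down first, then give any leftover rows to the
    # strata with the largest fractional remainders.
    allocation = {key: int(share) for key, share in exact.items()}
    leftover = total_sample_size - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda key: exact[key] - allocation[key],
                          reverse=True)
    for key in by_remainder[:leftover]:
        allocation[key] += 1
    return allocation


# Keys are (GENDER, VOTED). Only the 7,000 figure is from the text;
# the remaining counts are hypothetical.
stratum_sizes = {
    ("M", "Y"): 7000,
    ("M", "N"): 3000,   # hypothetical
    ("F", "Y"): 6000,   # hypothetical
    ("F", "N"): 4000,   # hypothetical
}

allocation = proportional_allocation(stratum_sizes, 100)
print(allocation[("M", "Y")])  # 35
```

With a sample of 100 observations, the stratum of males who voted receives 35 rows, which matches the 35% share in the input table.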
Option Name
|
Description
|
---|---|
Methods
|
|
Sample by
|
specifies the sample
size in the desired number of rows or in the desired percentage of
input rows. For example, if you specify 3% of rows and there are 400
input rows, then the resulting sample has 12 rows.
Note: If you assign variables to
the Stratify by role, then the sample size
specification that you make here applies to each stratum rather than
to the entire input table.
|
Random seed
|
specifies the initial
seed for the generation of random numbers. If you set this value to
zero or a negative number, then a seed that is based on the system
clock is used to produce the sample.
|
Ignore case of character stratification values
|
distinguishes stratified
variables that share the same normalized value when you perform stratified
sampling. For example, if a target has three distinct values, “A”,
“B”, and “b”, and you want to treat “B”
and “b” as different levels, you need to select this
option. Otherwise, “B” and “b” are treated
as the same level. The task normalizes a value as follows:
|
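As an illustration of the Sample by and Random seed options described earlier in this table, the following Python sketch draws a sample whose size is given either as a row count or as a percentage, applies that size to each stratum separately, and falls back to a clock-based seed when the seed is zero or negative. All function names here are hypothetical, and the rounding of fractional row counts is an assumption. The sketch does not reproduce the task's value normalization.

```python
import random
import time
from collections import defaultdict


def rows_to_sample(n_rows, count=None, percent=None):
    """Resolve a sample size from either a row count or a percentage."""
    if (count is None) == (percent is None):
        raise ValueError("specify exactly one of count or percent")
    if count is not None:
        return min(count, n_rows)
    # Assumption: fractional row counts are rounded to the nearest integer.
    return round(n_rows * percent / 100)


def make_rng(seed):
    """Zero or negative seeds are replaced by a clock-based seed."""
    if seed <= 0:
        seed = time.time_ns()
    return random.Random(seed)


def stratified_sample(rows, strata_keys, seed, count=None, percent=None):
    """Sample each stratum separately and return the union of the samples."""
    rng = make_rng(seed)
    strata = defaultdict(list)
    for row in rows:
        # With no strata_keys, every row gets the key () and the whole
        # table is treated as a single stratum.
        strata[tuple(row[key] for key in strata_keys)].append(row)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        # The size specification applies to each stratum, not the whole table.
        k = rows_to_sample(len(members), count=count, percent=percent)
        sample.extend(rng.sample(members, k))
    return sample


# 400 hypothetical input rows, 100 in each GENDER/VOTED combination.
rows = [{"ID": i, "GENDER": "MF"[i % 2], "VOTED": "YN"[(i // 2) % 2]}
        for i in range(400)]

whole_table = stratified_sample(rows, [], seed=42, percent=3)
per_stratum = stratified_sample(rows, ["GENDER", "VOTED"], seed=42, percent=3)
print(len(whole_table), len(per_stratum))  # 12 12
```

Both calls return 12 rows here, but for different reasons: without stratification, 3% of 400 rows is 12, and with stratification, 3% of each 100-row stratum is 3 rows across four strata. Reusing the same positive seed reproduces the same sample.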