Role
|
Description
|
---|---|
Roles
|
|
Output columns
|
specifies the variables to include in the output table. By default, all variables
are included in the output table. However, you can select
the variables to include in the output.
|
Strata columns
|
specifies the variables to use to partition the input table into mutually exclusive,
nonoverlapping subsets that are known as strata. Each stratum is defined by a set of values of the strata variables, and each stratum is sampled separately. The complete sample is the union of the
samples that are taken from all the strata.
Note: If you do not assign any
variables to this role, then the entire input table is treated as
a single stratum.
This example shows how the total sample size among the strata in proportion to the
size of the stratum. For this example, the
variable GENDER has possible values of M and F, and the variable VOTED has possible
values of Y and N. If you assign both GENDER and VOTED to the Strata columns role, then the input table is partitioned into four strata: males who voted, males who did not vote, females who voted, and
females who did not vote.
The input table contains
20,000 rows, and the values are distributed as follows:
Therefore, the proportion of males who voted is 7,000/20,000=0.35 or 35%. The proportions
in the sample should reflect the proportions of the strata in the input table. For
example, if your sample table contains 100 observations, then 35% of the values in the sample must be selected from the males who voted stratum
to reflect the proportions in the input table.
|
Output Data Set
|
|
Data set
name
|
specifies the name of the output data set.
|
Option Name
|
Description
|
---|---|
Sample size and Sample
percent
|
specifies the sample size in the desired number of rows or in the desired percentage
of input rows. For example,
if you specify 3% of rows and there are 400 input rows, then the resulting sample
has 12 rows.
Note: If you assign variables to
the Strata columns role, then the sample
size specification that you make here applies to each stratum rather
than to the entire input table.
|
Sample method
|
specifies the method
to use when sampling the data. Here are the valid values:
Simple (no duplicates)
specifies the simple
method when sampling the input data. When a row is selected, it is
removed from eligibility for subsequent selections. This makes it
impossible to select the same row more than once.
Unrestricted (duplicates allowed)
specifies the unrestricted method when sampling the input data. When a row is selected,
it remains eligible for subsequent selections. This makes it possible to select the
same row more than once. You can specify how multiple selections of the same row are
recorded in the output table.
You can choose from
the following options:
Show each observations once in output (exclude duplicates)
a row that is selected n times occurs in the sample once. In the output, the NumberHits variable (which is
calculated automatically by the Select Random Sample task) lists the number of times
that the observation occurred in the input table.
Show all observations in output (include duplicates)
a row that is selected n times
occurs in the sample n times.
|
Random seed
number
|
specifies the initial seed for the generation of random numbers. If you do not specify a random seed number, then a seed that is based on the system clock will be used to produce the sample.
|
Generate
a sample selection summary
|
generates a summary table that includes the seed that was used to produce the sample.
By specifying this same seed later with the
same input table, you can reproduce the same sample.
|