Role
|
Description
|
---|---|
Output columns
|
specifies the variables
to include in the output table. By default, all variables are included,
but you can select a subset of the variables instead.
|
Strata columns
|
specifies the variables
to use to partition the input table into mutually exclusive, nonoverlapping
subsets that are known as strata. Each stratum is defined by a set
of values of the strata variables, and each stratum is sampled separately.
The complete sample is the union of the samples that are taken from
all the strata.
Note: If you do not assign any
variables to this role, then the entire input table is treated as
a single stratum.
You can allocate the
total sample size among the strata in proportion to the size of each
stratum. For example, the variable GENDER has possible values of M
and F, and the variable VOTED has possible values of Y and N. If you
assign both GENDER and VOTED to the Strata columns role,
then the input table is partitioned into four strata: males who voted,
males who did not vote, females who voted, and females who did not
vote.
The input table contains
20,000 rows, and the values are distributed as follows:
Therefore, the proportion
of males who voted is 7,000/20,000 = 0.35, or 35%. The proportions in
the sample should reflect the proportions of the strata in the input
table. For example, if your sample table contains 100 observations,
then 35 of them (35%) must be selected from the stratum of males
who voted to reflect the proportions in the input table.
|
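The proportional allocation described for the Strata columns role can be sketched in Python. This is only an illustration of the arithmetic, not the code that the Random Sample task runs. Only the count of 7,000 males who voted (out of 20,000 rows) comes from the text above; the other three stratum counts, the function names `proportional_allocation` and `stratified_sample`, and the seed value are hypothetical.

```python
import random
from collections import defaultdict


def proportional_allocation(stratum_sizes, total_sample_size):
    """Split total_sample_size across strata in proportion to stratum size."""
    population = sum(stratum_sizes.values())
    return {
        stratum: round(total_sample_size * size / population)
        for stratum, size in stratum_sizes.items()
    }


def stratified_sample(rows, strata_key, allocation, seed=None):
    """Sample each stratum separately, then return the union of the samples."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[strata_key(row)].append(row)
    sample = []
    for stratum, n in allocation.items():
        # Draw n rows from this stratum without replacement.
        sample.extend(rng.sample(groups[stratum], n))
    return sample


# Only the 7,000 "males who voted" count comes from the text.
# The other three counts are hypothetical and chosen to total 20,000.
stratum_sizes = {
    ("M", "Y"): 7000,
    ("M", "N"): 3000,  # hypothetical
    ("F", "Y"): 6000,  # hypothetical
    ("F", "N"): 4000,  # hypothetical
}

# Build a toy input table with one row per member of each stratum.
rows = [
    {"GENDER": gender, "VOTED": voted, "id": i}
    for (gender, voted), size in stratum_sizes.items()
    for i in range(size)
]

allocation = proportional_allocation(stratum_sizes, 100)
sample = stratified_sample(
    rows, lambda r: (r["GENDER"], r["VOTED"]), allocation, seed=2024
)
print(allocation[("M", "Y")], len(sample))
```

With these counts, the males-who-voted stratum receives 35 of the 100 sampled rows. In general, rounding each stratum's share independently can make the allocations sum to slightly more or less than the requested total, so a real implementation would need a rule for distributing the remainder.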
Option Name
|
Description
|
---|---|
Sample size
|
specifies the sample
size as either a number of rows or a percentage of the
input rows. For example, if you specify 3% of rows and there are 400
input rows, then the resulting sample has 12 rows.
Note: If you assign variables to
the Strata columns role, then the sample
size specification that you make here applies to each stratum rather
than to the entire input table.
|
Sample method
|
specifies the method
to use when sampling the data. Here are the valid values:
Simple (no duplicates)
specifies simple random
sampling (sampling without replacement). When a row is selected, it is
removed from eligibility for subsequent selections. This makes it
impossible to select the same row more than once.
Unrestricted (duplicates allowed)
specifies unrestricted random
sampling (sampling with replacement). When a row is selected, it remains
eligible for subsequent selections. This makes it possible to select
the same row more than once. You can specify how multiple selections
of the same row are recorded in the output table.
You can choose from
the following options:
Show each observation once in output (exclude duplicates)
a row that is selected n times
occurs in the sample once. In the output, the NumberHits variable
(which is calculated automatically by the Random Sample task) lists
the number of times that the row was selected.
Show all observations in output (include duplicates)
a row that is selected n times
occurs in the sample n times.
|
Location of output data set
|
specifies the name and
location for the output data. By default, the data is saved to the
Work library.
|
Random seed number
|
specifies the initial
seed for the generation of random numbers. If you do not specify a
random seed number, then a seed that is based on the system clock
is used to produce the sample.
|
Generate a sample selection summary
|
generates a summary
table that includes the seed that was used to produce the sample.
By specifying this same seed later with the same input table, you
can reproduce the same sample.
|
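The options above can be illustrated with a short Python sketch that contrasts the two sampling methods, the two ways of recording duplicates, and the effect of a fixed seed. It is not the task's implementation: Python's random number generator differs from the one SAS uses, so the same seed does not select the same rows in both. The helper names, the seed value 12345, and the rule for rounding a percentage to a row count are assumptions made for the example.

```python
import random
from collections import Counter


def sample_size_from_percent(n_rows, percent):
    """Convert a percentage of input rows to a row count (rounding is assumed)."""
    return round(n_rows * percent / 100)


def simple_sample(rows, n, seed=None):
    """Simple (no duplicates): a selected row is not eligible again."""
    rng = random.Random(seed)
    return rng.sample(rows, n)


def unrestricted_sample(rows, n, seed=None):
    """Unrestricted (duplicates allowed): a selected row stays eligible."""
    rng = random.Random(seed)
    return rng.choices(rows, k=n)


def collapse_duplicates(sample):
    """Show each row once, with NumberHits = number of times it was selected."""
    hits = Counter(sample)
    return [{"row": row, "NumberHits": count} for row, count in hits.items()]


rows = list(range(1, 401))                  # 400 input rows, identified by number
n = sample_size_from_percent(len(rows), 3)  # 3% of 400 rows

simple = simple_sample(rows, n, seed=12345)
unrestricted = unrestricted_sample(rows, n, seed=12345)  # include duplicates
collapsed = collapse_duplicates(unrestricted)            # exclude duplicates

# Reusing the same seed with the same input reproduces the same sample.
repeat = simple_sample(rows, n, seed=12345)
print(n, simple == repeat)
```

In the collapsed form, the NumberHits values always sum to the requested sample size, which is how the two ways of recording duplicates carry the same information.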