Assigning Data to Roles

To run the Random Sampling task, you must select an input data set. To filter the input data source, click the Filter icon.
If you want to perform stratified sampling, you must assign a column to the Stratify by role. Otherwise, the Stratify by role is optional.
Role
Description
Stratify by
specifies the variables to use to partition the input table into mutually exclusive, nonoverlapping subsets that are known as strata. Each stratum is defined by a set of values of the strata variables, and each stratum is sampled separately. The complete sample is the union of the samples that are taken from all the strata.
Note: If you do not assign any variables to this role, then the entire input table is treated as a single stratum.
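The partition-then-sample procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the task's implementation: it assumes the input table is a list of dictionaries, and the names partition_into_strata and stratified_sample, along with the sizes mapping, are hypothetical. Each stratum is sampled separately with simple random sampling without replacement, and the per-stratum samples are combined into one sample.

```python
import random
from collections import defaultdict


def partition_into_strata(rows, strata_vars):
    """Group rows by the tuple of their stratification values (hypothetical helper).

    With an empty strata_vars list every row gets the key (), so the whole
    table forms a single stratum.
    """
    strata = defaultdict(list)
    for row in rows:
        key = tuple(row[var] for var in strata_vars)
        strata[key].append(row)
    return dict(strata)


def stratified_sample(rows, strata_vars, sizes, seed=None):
    """Sample each stratum separately and return the union (hypothetical helper).

    sizes maps a stratum key to the number of rows to draw from that stratum;
    strata missing from sizes contribute no rows.
    """
    rng = random.Random(seed)
    sample = []
    for key, members in partition_into_strata(rows, strata_vars).items():
        # Simple random sampling without replacement within one stratum.
        sample.extend(rng.sample(members, sizes.get(key, 0)))
    return sample


# Example: a tiny table covering the four GENDER x VOTED combinations.
rows = [
    {"GENDER": g, "VOTED": v}
    for g, v, n in [("M", "Y", 7), ("M", "N", 4), ("F", "Y", 5), ("F", "N", 4)]
    for _ in range(n)
]
print(sorted(partition_into_strata(rows, ["GENDER", "VOTED"])))
# [('F', 'N'), ('F', 'Y'), ('M', 'N'), ('M', 'Y')]
print(len(partition_into_strata(rows, [])))  # 1: no variables, one stratum
```

Keying each stratum by a tuple of values makes the "no variables assigned" case fall out naturally: every row shares the empty key, so the whole table is one stratum.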
You can allocate the total sample size among the strata in proportion to the size of each stratum. For example, suppose that the variable GENDER has the possible values M and F, and the variable VOTED has the possible values Y and N. If you assign both GENDER and VOTED to the Stratify by role, then the input table is partitioned into four strata: males who voted, males who did not vote, females who voted, and females who did not vote.
Suppose that the input table contains 20,000 rows, distributed as follows:
  • 7,000 males who voted
  • 4,000 males who did not vote
  • 5,000 females who voted
  • 4,000 females who did not vote
Therefore, the proportion of males who voted is 7,000/20,000 = 0.35, or 35%. With proportional allocation, the proportions in the sample reflect the proportions of the strata in the input table. For example, if your sample table contains 100 rows, then 35 of them (35%) should be selected from the stratum of males who voted.
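The proportional-allocation arithmetic can be checked with a short Python sketch. The function name proportional_allocation is hypothetical, and the largest-remainder rule for distributing leftover rows when the shares are not whole numbers is an assumption for illustration; the task's own rounding rule may differ. For the 20,000-row example, every share is a whole number, so no rounding is needed.

```python
def proportional_allocation(stratum_sizes, total_sample_size):
    """Split total_sample_size across strata in proportion to stratum size.

    Hypothetical helper. Integer division avoids floating-point error; any
    rows left over after rounding down go to the strata with the largest
    remainders (an assumed rounding rule, not necessarily the task's).
    """
    population = sum(stratum_sizes.values())
    alloc, remainders = {}, {}
    for key, size in stratum_sizes.items():
        alloc[key], remainders[key] = divmod(total_sample_size * size, population)
    leftover = total_sample_size - sum(alloc.values())
    for key in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[key] += 1
    return alloc


counts = {("M", "Y"): 7000, ("M", "N"): 4000, ("F", "Y"): 5000, ("F", "N"): 4000}
print(proportional_allocation(counts, 100))
# {('M', 'Y'): 35, ('M', 'N'): 20, ('F', 'Y'): 25, ('F', 'N'): 20}
```

Each stratum's share is total_sample_size × (stratum size / population size), so the males-who-voted stratum receives 100 × 7,000/20,000 = 35 rows.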
Ignore case of character stratification values
distinguishes stratification values that share the same normalized value when you perform stratified sampling. For example, if a stratification variable has three distinct values, “A”, “B”, and “b”, and you want to treat “B” and “b” as different levels, select this option. Otherwise, “B” and “b” are treated as the same level. The task normalizes a value as follows:
  1. Leading blanks are removed.
  2. The value is truncated to 32 characters.
  3. Letters are changed from lowercase to uppercase.
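The three normalization steps listed above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that "blanks" means space characters. It shows only how the listed steps map values such as “B” and “b” to the same normalized value; it does not model the option itself.

```python
def normalize_level(value):
    """Apply the three normalization steps listed above (hypothetical helper)."""
    value = value.lstrip(" ")  # 1. Remove leading blanks.
    value = value[:32]         # 2. Truncate to 32 characters.
    return value.upper()       # 3. Change lowercase letters to uppercase.


def normalized_levels(values):
    """Return the distinct levels that remain after normalization, sorted."""
    return sorted({normalize_level(v) for v in values})


print(normalized_levels(["A", "B", "b"]))  # ['A', 'B']
print(normalize_level("  b") == normalize_level("B"))  # True
```

After normalization, the three raw values collapse to two levels, which is why “B” and “b” fall into the same stratum unless they are distinguished.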