The SURVEYSELECT Procedure

Example 95.1 Replicated Sampling

This example uses the Customers data set from the section Getting Started: SURVEYSELECT Procedure. The data set Customers contains an Internet service provider’s current subscribers, and the service provider wants to select a sample from this population for a customer satisfaction survey.

This example illustrates replicated sampling, which selects multiple samples from the survey population according to the same design. You can use replicated sampling to provide a simple method of variance estimation, or to evaluate variable nonsampling errors such as interviewer differences. For information about replicated sampling, see Lohr (2010), Wolter (2007), Kish (1965, 1987), and Kalton (1983).

This design includes four replicates, each with a sample size of 50 customers. The sampling frame is stratified by State and sorted by Type and Usage within strata. Customers are selected by sequential random sampling with equal probability within strata. The following PROC SURVEYSELECT statements select a probability sample of customers from the Customers data set by using this design:

title1 'Customer Satisfaction Survey';
title2 'Replicated Sampling';
proc surveyselect data=Customers method=seq n=(8 12 20 10)
                  reps=4 seed=40070 ranuni out=SampleRep;
   strata State;
   control Type Usage;
run;

The STRATA statement names the stratification variable State. The CONTROL statement names the control variables Type and Usage.

In the PROC SURVEYSELECT statement, the METHOD=SEQ option requests sequential random sampling. The REPS= option specifies four replicates of this sample. The N=(8 12 20 10) option lists the stratum sample sizes for each replicate, in the same order as the strata appear in the Customers data set, which has been sorted by State. The sample size of 8 customers corresponds to the first stratum, State = 'AL'; the sample size of 12 corresponds to the next stratum, State = 'FL'; and so on.
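Because the N= list is matched to strata purely by position, it can help to confirm the stratum order before you select the sample. The following is a minimal sketch, not part of the original example, that lists the values of State in the order in which they occur in Customers, together with the number of customers in each stratum:

```sas
/* List the strata in data order, with their frame sizes, to confirm */
/* which stratum each value in N=(8 12 20 10) applies to.            */
proc freq data=Customers order=data;
   tables State / nocum nopercent;
run;
```

If Customers is sorted by State as described, the first row of the table should be 'AL' and the second 'FL', matching the sample sizes 8 and 12.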

The SEED= option specifies 40070 as the initial seed for random number generation. The RANUNI option requests random number generation by the RANUNI generator, which PROC SURVEYSELECT used in releases prior to SAS/STAT 12.1. (Beginning in SAS/STAT 12.1, PROC SURVEYSELECT uses the Mersenne-Twister random number generator by default.) You can specify the RANUNI option with the SEED= option to reproduce samples that PROC SURVEYSELECT selected in releases prior to SAS/STAT 12.1. To reproduce a sample by using the RANUNI and SEED= options, you must also specify the same input data set and sample selection parameters.
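One way to check reproducibility is to rerun the selection with the same input data set, design, seed, and generator, and then compare the two output data sets. The following sketch assumes that SampleRep has already been created by the preceding statements; the data set name SampleRep2 is hypothetical and is introduced only for this check:

```sas
/* Second run with identical data, design, seed, and generator. */
/* NOPRINT suppresses the summary table for this check run.     */
proc surveyselect data=Customers method=seq n=(8 12 20 10)
                  reps=4 seed=40070 ranuni out=SampleRep2 noprint;
   strata State;
   control Type Usage;
run;

/* Compare the two samples; any differences are reported. */
proc compare base=SampleRep compare=SampleRep2;
run;
```

If you omit the RANUNI option from the second run, the procedure uses its default generator, and you should expect PROC COMPARE to report differences even though the seed is the same.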

Output 95.1.1 displays the output from PROC SURVEYSELECT, which summarizes the sample selection. A total of 200 customers is selected in four replicates of 50 customers each. PROC SURVEYSELECT selects each replicate by using sequential random sampling within strata determined by State. The sampling frame Customers is sorted by the control variables Type and Usage within strata, according to hierarchic serpentine sorting. The output data set SampleRep contains the sample.

Output 95.1.1: Sample Selection Summary

Customer Satisfaction Survey
Replicated Sampling

The SURVEYSELECT Procedure

Selection Method        Sequential Random Sampling
                        With Equal Probability
Strata Variable         State
Control Variables       Type
                        Usage
Control Sorting         Serpentine

Input Data Set          CUSTOMERS
Random Number Seed      40070
Number of Strata        4
Number of Replicates    4
Total Sample Size       200
Output Data Set         SAMPLEREP
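As a quick check on the summary, you can cross-tabulate the output data set by replicate and stratum. This sketch, which is not part of the original example, assumes that SampleRep contains the variables Replicate and State, as the listing of the sample indicates:

```sas
/* Count sampled customers in each replicate-by-stratum cell. */
proc freq data=SampleRep;
   tables Replicate*State / norow nocol nopercent;
run;
```

Under the design described here, each of the four replicates should show the stratum counts given in N=(8 12 20 10), for 50 customers per replicate and 200 in total.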


The following PROC PRINT statements display the selected customers for the first stratum, State = 'AL', from the output data set SampleRep:

title1 'Customer Satisfaction Survey';
title2 'Sample Selected by Replicated Design';
title3 '(First Stratum)';
proc print data=SampleRep;
   where State = 'AL';
run;

Output 95.1.2 displays the 32 sample customers of the first stratum (State = 'AL'), 8 from each of the four replicates, from the output data set SampleRep, which includes the entire sample of 200 customers. The variable SelectionProb contains the selection probability, and SamplingWeight contains the sampling weight. Because customers are selected with equal probability within strata in this design, all customers in the same stratum have the same selection probability. These selection probabilities and sampling weights apply to a single replicate, and the variable Replicate contains the sample replicate number.
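With equal-probability selection within a stratum, the selection probability is the stratum sample size divided by the stratum frame size, and the sampling weight is its reciprocal. The following DATA step is a small worked check of the values shown for State = 'AL'. The frame size of 1,944 is not stated in the text; it is inferred here from the per-replicate sample size of 8 and the displayed weight of 243. The variable names are hypothetical.

```sas
data _null_;
   SampSize  = 8;                     /* per-replicate sample size for 'AL', from N= */
   Weight    = 243;                   /* SamplingWeight displayed for 'AL'           */
   FrameSize = SampSize * Weight;     /* implied stratum frame size (inferred)       */
   Prob      = SampSize / FrameSize;  /* selection probability = 1 / Weight          */
   put FrameSize= Prob= 11.9;
run;
```

The product 8 × 243 gives 1,944, and 8/1,944 = 1/243 ≈ 0.004115226, which agrees with the SelectionProb value in the listing.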

Output 95.1.2: Customer Sample (First Stratum)

Customer Satisfaction Survey
Sample Selected by Replicated Design
(First Stratum)

Obs  State  Replicate  CustomerID   Type  Usage  SelectionProb  SamplingWeight
  1  AL             1  882-37-7496  New     572     .004115226             243
  2  AL             1  581-32-5534  New     863     .004115226             243
  3  AL             1  980-29-2898  Old     571     .004115226             243
  4  AL             1  172-56-4743  Old     128     .004115226             243
  5  AL             1  998-55-5227  Old      35     .004115226             243
  6  AL             1  625-44-3396  New      60     .004115226             243
  7  AL             1  627-48-2509  New     114     .004115226             243
  8  AL             1  257-66-6558  New     172     .004115226             243
  9  AL             2  622-83-1680  New      22     .004115226             243
 10  AL             2  343-57-1186  New      53     .004115226             243
 11  AL             2  976-05-3796  New     110     .004115226             243
 12  AL             2  859-74-0652  New     303     .004115226             243
 13  AL             2  476-48-1066  New     839     .004115226             243
 14  AL             2  109-27-8914  Old    2102     .004115226             243
 15  AL             2  743-25-0298  Old     376     .004115226             243
 16  AL             2  722-08-2215  Old     105     .004115226             243
 17  AL             3  668-57-7696  New     200     .004115226             243
 18  AL             3  300-72-0129  New     471     .004115226             243
 19  AL             3  073-60-0765  New     656     .004115226             243
 20  AL             3  526-87-0258  Old     672     .004115226             243
 21  AL             3  726-61-0387  Old     150     .004115226             243
 22  AL             3  632-29-9020  Old      51     .004115226             243
 23  AL             3  417-17-8378  New      56     .004115226             243
 24  AL             3  091-26-2366  New      93     .004115226             243
 25  AL             4  336-04-1288  New     419     .004115226             243
 26  AL             4  827-04-7407  New     650     .004115226             243
 27  AL             4  317-70-6496  Old     452     .004115226             243
 28  AL             4  002-38-4582  Old     206     .004115226             243
 29  AL             4  181-83-3990  Old      33     .004115226             243
 30  AL             4  675-34-7393  New      47     .004115226             243
 31  AL             4  228-07-6671  New      65     .004115226             243
 32  AL             4  298-46-2434  New     161     .004115226             243
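
The use of replicates for simple variance estimation, mentioned at the start of this example, can be sketched as follows. This is an illustration under stated assumptions rather than a method given in the original example: it treats the four replicates as independent samples, uses the frame variable Usage only as a stand-in for a survey response, and introduces the hypothetical names RepMeans and MeanUsage.

```sas
/* Step 1: weighted mean of Usage within each replicate. */
proc means data=SampleRep noprint nway;
   class Replicate;
   var Usage;
   weight SamplingWeight;
   output out=RepMeans mean=MeanUsage;
run;

/* Step 2: the mean of the replicate estimates is the overall estimate, */
/* and the standard error of those estimates across replicates serves   */
/* as a simple replicated-sampling standard error.                      */
proc means data=RepMeans n mean stderr;
   var MeanUsage;
run;
```

With only four replicates, the variance estimate rests on three degrees of freedom, so it is best read as a rough indication of precision.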