This example uses the Customers data set from the section Getting Started: SURVEYSELECT Procedure. The data set Customers contains an Internet service provider’s current subscribers, and the service provider wants to select a sample from this population for a customer satisfaction survey.
This example illustrates replicated sampling, which selects multiple samples from the survey population according to the same design. You can use replicated sampling to provide a simple method of variance estimation, or to evaluate variable nonsampling errors such as interviewer differences. See Lohr (2010), Wolter (2007), Kish (1965, 1987), and Kalton (1983) for information about replicated sampling.
This design includes four replicates, each with a sample size of 50 customers. The sampling frame is stratified by State and sorted by Type and Usage within strata. Customers are selected by sequential random sampling with equal probability within strata. The following PROC SURVEYSELECT statements select a probability sample of customers from the Customers data set by using this design:
   title1 'Customer Satisfaction Survey';
   title2 'Replicated Sampling';
   proc surveyselect data=Customers
         method=seq n=(8 12 20 10)
         reps=4 seed=40070 out=SampleRep;
      strata State;
      control Type Usage;
   run;
The STRATA statement names the stratification variable State. The CONTROL statement names the control variables Type and Usage. In the PROC SURVEYSELECT statement, the METHOD=SEQ option requests sequential random sampling, and the REPS=4 option specifies four replicates of this sample. The N=(8 12 20 10) option lists the stratum sample sizes for each replicate, which sum to the 50 customers per replicate. The sizes are listed in the same order as the strata appear in the Customers data set, which has been sorted by State. The sample size 8 corresponds to the first stratum, State = 'AL', the sample size 12 corresponds to the next stratum, State = 'FL', and so on. The SEED=40070 option specifies 40070 as the initial seed for random number generation.
Output 91.1.1 displays the output from PROC SURVEYSELECT, which summarizes the sample selection. A total of 200 customers is selected in four replicates. PROC SURVEYSELECT selects each replicate by using sequential random sampling within strata determined by State. The sampling frame Customers is sorted by the control variables Type and Usage within strata, according to hierarchic serpentine sorting. The output data set SampleRep contains the sample.
|Customer Satisfaction Survey|
|Selection Method|Sequential Random Sampling With Equal Probability|
|Input Data Set|CUSTOMERS|
|Random Number Seed|40070|
|Number of Strata|4|
|Number of Replicates|4|
|Total Sample Size|200|
|Output Data Set|SAMPLEREP|
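As a quick check on the design, the selected customers can be tabulated by replicate and stratum. The following is a minimal sketch, not part of the original example. It assumes only that the output data set SampleRep identifies each replicate by the variable Replicate and retains the stratification variable State:

```sas
/* Cross-tabulate replicate number by stratum to confirm the allocation. */
/* NOROW, NOCOL, and NOPERCENT suppress percentages so that only the     */
/* cell counts are displayed.                                            */
proc freq data=SampleRep;
   tables Replicate*State / norow nocol nopercent;
run;
```

If the design was applied as specified, each replicate row should show the stratum sample sizes from the N= option, with a row total of 50 and a grand total of 200.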
The following PROC PRINT statements display the selected customers for the first stratum, State = 'AL', from the output data set SampleRep:
   title1 'Customer Satisfaction Survey';
   title2 'Sample Selected by Replicated Design';
   title3 '(First Stratum)';
   proc print data=SampleRep;
      where State = 'AL';
   run;
Output 91.1.2 displays the 32 sample customers of the first stratum (State = 'AL'), eight from each of the four replicates, from the output data set SampleRep, which includes the entire sample of 200 customers. The variable SelectionProb contains the selection probability, and SamplingWeight contains the sampling weight. Because customers are selected with equal probability within strata in this design, all customers in the same stratum have the same selection probability. These selection probabilities and sampling weights apply to a single replicate, and the variable Replicate contains the sample replicate number.
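Because the weights apply to a single replicate, each replicate yields its own estimate, and the spread of those estimates gives a simple variance estimate. The following sketch illustrates the idea and is not part of the original example. It assumes that Usage is a numeric variable and uses it as a stand-in analysis variable; in practice you would analyze a response collected in the survey. The names RepTotals and EstTotal are chosen here for illustration only.

```sas
/* Step 1: estimate the population total of Usage separately within  */
/* each replicate by summing the weighted values.                    */
proc sql;
   create table RepTotals as
   select Replicate,
          sum(SamplingWeight * Usage) as EstTotal
   from SampleRep
   group by Replicate;
quit;

/* Step 2: treat the four replicate estimates as observations.       */
/* MEAN is the combined estimate, and STDERR is the square root of   */
/* the sum of squared deviations divided by R*(R-1), where R = 4.    */
proc means data=RepTotals n mean stderr;
   var EstTotal;
run;
```

With only four replicates, the variance estimate has three degrees of freedom, so it is simple to compute but not very stable.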
|Customer Satisfaction Survey|
|Sample Selected by Replicated Design|